Threaded Mode | Linear Mode

komame · (This post was last modified: 09-15-2023 01:24 PM by komame.)

Hello Tyann,

(09-14-2023 06:05 PM)komame Wrote: I also have some ideas, but in my case, when the first letters of the word are provided, the algorithm is already so fast that there's not much to gain (maybe a few dozen milliseconds). It's definitely worse when there's a "?" at the beginning. Perhaps in such a case, there's still room for optimization.

I just managed to introduce additional optimizations for pattern searches with "?" at the beginning (speedup depends on the case but ranges from 5% to 50%). In the process, I also found a mistake I accidentally made while implementing one of the previous optimizations, which caused incorrect searches for 2-letter words, and I fixed that as well. I replaced the file with the new version 1.00 beta7 in the original post: https://www.hpmuseum.org/forum/thread-20...#pid177326

Best regards,
Piotr

komame · 09-17-2023, 08:09 AM

Hello,

Version 1.00 beta8 released!
https://www.hpmuseum.org/forum/thread-20...#pid177326

This is almost the final version now, as I'm running out of ideas for optimizations. The current results are as follows (measurements only for G2):

??e??r??? --> 386 words found in 5.6 - 7.6s (previously 9 - 11s)
??e??r??s --> 130 words found in 5.7 - 7.7s (previously 7.4 - 8.7s)
??e?????s --> 1031 words found in 5.8 - 7.3s (previously 8.6 - 12s)

p?e??r??? --> 93 words forund in 1.3 - 2.0 (previously 2.6s)
sc???r??? --> 9 words found in 1.0 - 1.3 (previously 1.2s - 1.4s)

I'm considering making this tool with an English word dictionary, which, due to the absence of accents, should be much faster and more useful for many people on this forum.

Best reagrds.

jte · 09-17-2023, 08:52 PM

(09-13-2023 12:09 PM)komame Wrote: Hi Jeff,

(09-12-2023 10:47 PM)jte Wrote: There’s certainly many algorithms for subset checking, string searching, etc. but one thing this reminded me of was my dad’s program (that he did for a comp. sci. assignment in the ’60s, on punched cards — he told me about it several times…) for handling pizza toppings. His idea was to encode each pizza topping option as a prime so that a selection of pizza toppings could be encoded as a product of those selected.
My respects to your dad. In those times, with such limited possibilities, coming up with and additionally implementing something like that was certainly a great achievement.
⋮

Thanks. One exploit of his that he shared with me, when I was taking a numerical course, was that he implemented a Runge-Kutta method for coursework that was one degree higher than what the assignment required. (A perhaps more intricate bit of coding.) (That conversation started with something along the lines of “are they still doing Runge-Kutta?” I’m not sure if “exploit” is the best word there — it replaced something involving “obscure bragging” Big Grin

.)

One of his development tricks he passed on to me more than once was his technique for doubling his throughput on the computer back then. His fellow students would typically get one program run in per day, but he realized that he could submit a second job late in the day and then pick up the results first thing in the morning. (So that the results could be gone over before submitting a job midday.)

jte · 09-17-2023, 09:13 PM

(09-17-2023 08:09 AM)komame Wrote: ⋮
This is almost the final version now, as I'm running out of ideas for optimizations. The current results are as follows (measurements only for G2):

??e??r??? --> 386 words found in 5.6 - 7.6s (previously 9 - 11s)
??e??r??s --> 130 words found in 5.7 - 7.7s (previously 7.4 - 8.7s)
??e?????s --> 1031 words found in 5.8 - 7.3s (previously 8.6 - 12s)

p?e??r??? --> 93 words forund in 1.3 - 2.0 (previously 2.6s)
sc???r??? --> 9 words found in 1.0 - 1.3 (previously 1.2s - 1.4s)
⋮

One possibility I was going to mention, when reading the following, was that multiple sort orders could be used so that binary search could be employed even when the search pattern started with “?”. (N sort orders for N-letter words would be the obvious one, although letter pairs come next to mind [although only half of those would be needed — N(N-1)/2 instead of N(N-1). If that’s too much precomputation, letter clumps are another approach – this could still help limit the linear traversal.)

(09-13-2023 12:09 PM)komame Wrote: ⋮
Later, I realized that such iteration was also unnecessary because since I have a sorted dictionary, I could implement binary search (reducing the search range). This greatly sped up the program
⋮

But I didn’t write that earlier as it seemed precomputation wasn’t of direct interest, based on the following:

(09-13-2023 12:09 PM)komame Wrote: ⋮
So, this approach would require us to maintain two dictionaries because converting text to binary for a 9-letter dictionary with over 50,000 words would take a very long time with each run (involving a lot of bitwise AND/OR/BITSL operations).
⋮

My first thought with these sorts of things is that there are distinct phases. Phase 1 is setting things up: getting the dictionary, getting it into the program [writing the program…], setting up search data structures, etc. — the performance clock isn’t engaged yet! Phase 2 is benchmarking: running the code against a performance meter. (So, with the example above, the “converting text to binary” isn’t part of “benchmark time”; it would not be done with each run, but be part of “set up”.)

Hmmmm… if the sorting is being done during “benchmark time”, perhaps an alternative sort could still be used, based on the search string. (But perhaps keeping the dictionary sorted [with a conventional sort] is allowed as precomputation / considered to be a fundamental part of what it means to be “a dictionary”.)

It’s nice to see thought-provoking programs! Big Grin

komame · 09-18-2023, 06:48 AM

It's not that simple. Everything you're suggesting would be applicable for the English language dictionary. In that case, even binary encoding (using 5 bits per character, assuming a maximum word length of 12 characters) would be an excellent solution, and there would be no need for any conversions.

However, in this algorithm, I had to consider diacritic characters, and this greatly complicates the whole matter. Notice that one of the main assumptions of this algorithm was to search for similarities while taking into account the Latin equivalents of diacritic characters.
So, when searching for words with the pattern "DE?" in the result, you will find a list like this:
{"DER","DES","DEY","DÈS","DÉC","DÉP","DÉS"}.

As you can see, even though the second character is specified as "E", the algorithm also matched "È" and "É" (and vice versa) in this case. In such a situation, for searching, I need a dictionary without diacritic characters (otherwise, binary search wouldn't work), but when I find a match, I return the result with diacritic characters, so I also need it.

(09-17-2023 09:13 PM)jte Wrote: My first thought with these sorts of things is that there are distinct phases. Phase 1 is setting things up: getting the dictionary, getting it into the program [writing the program…], setting up search data structures, etc. — the performance clock isn’t engaged yet! Phase 2 is benchmarking: running the code against a performance meter. (So, with the example above, the “converting text to binary” isn’t part of “benchmark time”; it would not be done with each run, but be part of “set up”.)

That's why I mentioned that to apply your suggestion with binary encoding, two dictionaries would need to be maintained (even before the program is run). In my current solution, I have only one dictionary, and the second one (without diacritic characters) is generated during program execution using the REPLACE function for strings, which is low-level and very fast (just as many calls as there are different diacritic characters, and the entire dictionary is converted => see 'convAccents'). In the case of binary encoding, we would have to perform this conversion by iterating over individual words and converting letter by letter, additionally performing many bitwise operations, which would significantly impact the algorithm's performance even before reaching binary search.

(09-17-2023 09:13 PM)jte Wrote: One possibility I was going to mention, when reading the following, was that multiple sort orders could be used so that binary search could be employed even when the search pattern started with “?”. (N sort orders for N-letter words would be the obvious one, although letter pairs come next to mind [although only half of those would be needed — N(N-1)/2 instead of N(N-1). If that’s too much precomputation, letter clumps are another approach – this could still help limit the linear traversal.)

How would you like to perform this sorting in PPL? By what method? Regardless of the method you choose (even for letter clumps), this sorting would still take several times longer than the full search algorithm in its current form. We cannot use lists here because they have a limit of 10,000 elements, and we need to sort 50,000. Additionally, after this sorting, diacritic character conversion needs to be performed, which will result in additional iterations.

Currently, my algorithm operates by first converting the dictionary with diacritic characters into a dictionary with their Latin equivalents using REPLACE, which, in most typical queries, takes up to 100ms. In the worst-case scenario for a whole 9-letter dictionary (when the pattern begins with '?'), it involves over 470,000 characters, and such conversion in 13 iterations takes approximately 500ms (at that point, it often accounts for more than half of the total time required to return the result)
Next, binary searches are performed twice because, for the given pattern, it searches for both the initial and final words in the dictionary. This process takes 4-5 ms at worst (basically negligible). Then it starts scanning from the initial word and iterates through subsequent words, checking their compliance with the pattern until it reaches the final word, at which point it finishes its operation. Therefore, the scanning time is directly related to the number of words to scan, but in the worst case, when scanning over 5000 words, it takes about ~850ms (if only first letter is specified). However, it often involves only 100-300 words, which takes less than 50ms. In practice, in most cases, the entire algorithm, from the moment it starts running, performs around 30-50 comparisons during binary search and around 300-800 comparisons during scanning. So, summarizing everything together, these times are not that terrible.

However, if I were to apply any of your suggestions (binary encoding or sorting), and I wanted it to work efficiently, I would have to have this dictionary in the program at least twice (or multiple times) to avoid conversions. If I didn't want to double the dictionary's space usage, I would have to perform the conversion during program execution without using REPLACE, which would require iterating through 50000 9-letter words, letter by letter, and converting them one by one (almost half a million iterations for individual characters in total). Such a solution would probably work tens of times slower compared to the 1000-2000 comparisons I currently perform, taking into account the constant time for dictionary conversion.

Switching to an English dictionary would simplify things a lot Wink

jte · (This post was last modified: 09-18-2023 08:37 AM by jte.)

Piotr,

Thanks for all the back-and-forth dialog. I think we agree on pretty much everything! Big Grin

I just like to cheat a bit and run code before "the real run" begins and the timers come out... (I agree, multiple dictionaries etc. would need to be maintained, and that generating them could easily take more time than doing searches, counting matches etc.) [Although the "additional dictionaries" only need be indexing structures --- indices into the actual dictionary could be used, to save space at the cost of algorithmic complexity; even so, multiple indexing structures would rapidly gobble up space.]

Again, to repeat, I'm thankful for the dialog: one possible result of it would be to change REPLACE so that lists (in the strings case) could be given as the second and third arguments, to enact multiple replacements with a single PPL call. (e.g., allow "replace(str,accents,latin);" to be used in CROSSWORDS' convAccents(-) instead of the looped call to REPLACE that is currently being done.)

Jeff

Tyann · 09-18-2023, 06:51 PM

Bonsoir Komane,
J'ai enfin trouvé un peu de temps pour regarder vôtre code.
Je me suis penché uniquement sur la conversion pour l'instant.
'REPLACE' , je n' y ai pas pensé une seconde Sad

(
J'ai retrouvé quelques similitudes avec le mien, notamment mettre les lettres accentuées dans une liste et leur équivalent dans une autre.
J'ai ensuite rendu ma fonction plus rapide en utilisant une chaîne à la place et en utilisant la notation str(i) qui permet d'extraire un caractère d'une chaîne mais aussi d'en remplacer un.
Je me suis demandé si je pouvais adapter cela à vôtre code, l'idée était de faire :

Code:

accent:="àéèç......";

latin :="aec...."

for i from 1 to dim(str)

 str:=replace(str,accent(i),latin(i);

end;

Malheureusement 'replace' ne semble pas accepter cette notation.

De mon côté mon idée d'optimisation n'a rien donné de flagrant, par contre ma nouvelle organisation m' a permis d'optimiser la recherche quand la première lettre est connue ,la recherche se fait en <2 s.

Bonsoir Jte

Une fonction REPLACE telle que vous la décrivez serait effectivement la bienvenue.

Good evening Komane,
I've finally found some time to look at your code.
I've only looked at the conversion for now.
I didn't think of 'REPLACE' for a second Sad

(
I found a few similarities with mine, notably putting accented letters in one list and their equivalent in another.
I then made my function faster by using a string instead and by using the str(i) notation which allows you to extract a character from a string but also to replace one.
I wondered if I could adapt this to your code, and the idea was to do :

Code:

accent:="àéèç......";

latin :="aec...."

for i from 1 to dim(str)

 str:=replace(str,accent(i),latin(i);

end;

Unfortunately, 'replace' doesn't seem to accept this notation.

On my side, my idea of optimization didn't give anything obvious, but my new organization allowed me to optimize the search when the first letter is known, the search is done in <2 s.

Good evening Jte

A REPLACE function such as you describe would indeed be welcome.

komame · (This post was last modified: 09-19-2023 03:50 AM by komame.)

Hello Tyann,

(09-18-2023 06:51 PM)Tyann Wrote: Good evening Komane,
I've finally found some time to look at your code.
I've only looked at the conversion for now.
I didn't think of 'REPLACE' for a second (
I found a few similarities with mine, notably putting accented letters in one list and their equivalent in another.
I then made my function faster by using a string instead and by using the str(i) notation which allows you to extract a character from a string but also to replace one.
I wondered if I could adapt this to your code, and the idea was to do :

Code:

accent:="àéèç......"; latin :="aec...." for i from 1 to dim(str) str:=replace(str,accent(i),latin(i); end;
Unfortunately, 'replace' doesn't seem to accept this notation.

Try this:

Code:

str:=replace(str,CHAR(accent(d)),CHAR(latin(d)));

But you need to iterate from 1 to the number of accent characters you have to convert, not the length of the text you're converting. So, instead of dim(str) in your REPLACE usage example, it should be dim(accent) or dim(latin) instead.

(09-18-2023 06:51 PM)Tyann Wrote: On my side, my idea of optimization didn't give anything obvious, but my new organization allowed me to optimize the search when the first letter is known, the search is done in <2 s.

Did you get a result in less than 2 seconds for the 9-letter word with the pattern "A????????" or was it a different first letter? How many words did you get in the result?

komame · 09-18-2023, 07:37 PM

Jeff,

(09-18-2023 08:27 AM)jte Wrote: I just like to cheat a bit and run code before "the real run" begins and the timers come out...

I understand you perfectly, I often feel the same way.

Quote:one possible result of it would be to change REPLACE so that lists (in the strings case) could be given as the second and third arguments, to enact multiple replacements with a single PPL call. (e.g., allow "replace(str,accents,latin);" to be used in CROSSWORDS' convAccents(-) instead of the looped call to REPLACE that is currently being done.)

I think using a list as the second and third parameter here won't make much difference. Notice that you have to perform the replacement sequentially; you can't replace all found values at once in a single iteration over the text string because they may overlap. It would only make sense for 1-character elements, but I guess you won't impose a limitation that only 1-character elements can be provided in the list, right? The iteration itself in 'accentConv(-)', excluding REPLACE, only takes 1ms, and that's the only time you can save. To illustrate this: imagine someone uses REPLACE like this: replace("AABBABAB",{"AB","BA"},{"BA","AB"}). How would that work? The only reasonable implementation is to perform it sequentially, but it won't be any faster than the current PPL version with iterations.

Piotr

Tyann · 09-19-2023, 05:19 AM

Bonjour Komane

dim(accent) oui bien sûr : erreur d'inattention.

Pour REPLACE, je verrai bien un truc du genre REPLACE(str1,{"é","è","ê"},"e");

Pour a???????? je trouve 3936 mots <6s (mais j'ai eu un 9s75).

Je suis maintenant aux mots de 14L soit un total de 211 574 mots et les performances semblent s'en ressentir et les temps varient beaucoup pour une même recherche et par exemple pour p?e??r??? je suis maintenant à 2.03 sec.

j'ai également programmé la recherche d'anagrammes et cela semble rapide.
exemple j'entre 'ratessiel' :
1 mots de 9L <7s --> 'lesterais'
9 mots de 8L <5s --> 'relaisse', 'reliasse', 'resalies', 'ateliers' ......
46 mots de 7L<3.5s
etc...

Hello Komane

dim(accent) yes of course: careless error.

For REPLACE, I could see something like REPLACE(str1,{"é", "è", "ê"}, "e");

For a???????? I find 3936 words <6s (but I got a 9s75).

I'm now down to 14L words, i.e. a total of 211,574 words, and performance seems to be suffering as a result and times vary a lot for the same search, for example for p?e??r??? I'm now at 2.03 sec.

I have also programmed the search for anagrams and this seems to be fast.
For example, I enter 'ratessiel' :
1 word of 9L <7s --> 'lesterais
9 words of 8L <5s --> 'relaisse', 'reliasse', 'resalies', 'ateliers' ......
46 words of 7L<3.5s
etc...

komame · 09-19-2023, 09:56 AM

(09-19-2023 05:19 AM)Tyann Wrote: For REPLACE, I could see something like REPLACE(str1,{"é", "è", "ê"}, "e");

Yes, indeed, such a solution makes sense because this approach allows for substitution in a single iteration, assuming that the elements to be replaced follow the order in the provided list.
This would work much faster than the current multiple replace operations.

(09-19-2023 05:19 AM)Tyann Wrote: FFor a???????? I find 3936 words <6s (but I got a 9s75).

It seems to me that it should be 3893 words (the same as for A on the website you provided).

(09-19-2023 05:19 AM)Tyann Wrote: I'm now down to 14L words, i.e. a total of 211,574 words, and performance seems to be suffering as a result and times vary a lot for the same search, for example for p?e??r??? I'm now at 2.03 sec.

I will check it.

Tyann · (This post was last modified: 09-19-2023 10:18 AM by Tyann.)

Quote:It seems to me that it should be 3893 words (the same as for A on the website you provided).

Il m'arrive parfois de rajouter des mots au dictionnaire lorsque il y a en un qui n'apparaît pas :

par exemple ici : entre 'abat-voix' et 'abattages' j'ai dans ma liste 'abattable' qui n'est pas dans la liste du site.

I sometimes add words to the dictionary when there is one that doesn't appear:

for example here: between 'abat-voix' and 'abattages' I have in my list 'abattable' which is not in the list on the site.[quote]

komame · 09-19-2023, 06:45 PM

Hello,

Version 1.00 rc1 released!
https://www.hpmuseum.org/forum/thread-20...#pid177326

Dictionaries have been added up to 25 letters!!!
I managed to make some minor performance optimizations, but the gains are negligible. However, despite adding a significant amount of data, I haven't noticed any slowdown.

Best regards.

Tyann · 09-19-2023, 07:29 PM

Bonsoir
J'ai continuer à regarder vôtre code ce soir, la partie où vous connaissez la première lettre
et j'ai donc compris que si le mot commence par C???.. vous chercher l'index du mot >BZZZ..
et l'index de celui <DAAA..., donc avec mon organisation de 26 sous listes je fais la même chose en prenant le code ascii de la première lettre -96 = n° de sous liste à explorer et elle seule.
J'ai vu qu'ensuite vous faites une itération lettre par lettre comme je fais.
J'ai compris en voyant vôtre code que je faisais une erreur en appelant une fonction extérieure pour la comparaison et donc un appel avec passage de paramètres dans la boucle pour chaque mot de la liste, et j'ai donc incorporé cette boucle de comparaison dans la boucle de parcours des mots.
Pourtant mes recherches prennent plus de temps que les vôtres, j'en conclus donc que l'accès aux chaînes est plus rapide que celui aux listes, comme vous l'avez annoncé.

Good evening
I continued to look at your code this evening, the part where you know the first letter
and I realised that if the word starts with C???... you look for the index of the word >BZZZ...
and the index of the word <DAAA..., so with my organisation of 26 sub-lists I do the same thing, taking the ascii code of the first letter -96 = number of the sub-list to be explored and that alone.
I saw that you then iterate letter by letter as I do.
I realised when I saw your code that I was making a mistake by calling an external function for the comparison and therefore a call with passing parameters in the loop for each word in the list, and I therefore incorporated this comparison loop in the word browsing loop.
However, my searches take longer than yours, so I conclude that accessing strings is faster than accessing lists, as you said.

komame · 09-20-2023, 04:20 AM

(09-19-2023 07:29 PM)Tyann Wrote: I continued to look at your code this evening, the part where you know the first letter
and I realised that if the word starts with C???... you look for the index of the word >BZZZ...
and the index of the word <DAAA..., so with my organisation of 26 sub-lists I do the same thing, taking the ascii code of the first letter -96 = number of the sub-list to be explored and that alone.

This changed in the latest version (rc1), which I released yesterday. Due to a minor optimization, I now simply decrease the last character by 1, leaving the '?' characters unchanged. This is enough to find the first word in the range because '?' is smaller than 'A', so the first letter can remain as entered, and there is no need to go back to an earlier letter in the alphabet. For the last word in the range, I change all '?' characters to 'Z' and additionally increment the last character by 1. This only shortened the search range in a few cases, but it's always some extra optimization. However, in the case of "sc???r???", it was possible to reduce the search time to 0.6 - 0.8s (previously 1.0 - 1.3s).

(09-19-2023 07:29 PM)Tyann Wrote: However, my searches take longer than yours, so I conclude that accessing strings is faster than accessing lists, as you said.

Yes, in the current firmware version (14730), lists in PPL do indeed work slower than text strings, although in the past, in much older firmware versions, it was the opposite, and lists worked faster. You never know if this will change again with the next firmware version.

Tyann · 09-20-2023, 04:55 AM

Bonjour
En regardant vôtre code, il m'est venu une idée d'optimisation, en effet ma boucle de comparaison des lettres commençais toujours à 1 et j'ai vu que vous commencez à 2 lorsque la première lettre est connue.
En fait on peut pousser le raisonnement plus loin :
Pour P???e?? la boucle peut commencer à 5, pour ??e??r??? elle commencera à 3 et pour sc???es
elle devra commencer à 2.
J'ai fait cela ce matin, et par exemple pour la recherche p????eurs le temps est /2.
Autre chose que vous pouvez faire dans la boucle de convaccent mettre un test :

Code:

IF MAX(ASC"str"))<223 THEN BREAK; END

Autrement dit si il n'y a pas ou plus d'accent quitter la boucle des 'replace'.

Hello
Looking at your code, I got an idea for optimisation. My letter comparison loop always started at 1 and I saw that you start at 2 when the first letter is known.
In fact, you can take the reasoning a step further:
For P???e?? the loop can start at 5, for ??e??r??? it will start at 3 and for sc???es
it must start at 2.
I did this this morning, and for example for the search p????eurs the time is /2.
Another thing you can do in the convaccent loop is to put in a test:

Code:

IF MAX(ASC "str"))<223 THEN BREAK; END

In other words, if there are no or no more accents, leave the 'replace' loop.

komame · 09-20-2023, 05:50 AM

Hi Tyann,

(09-20-2023 04:55 AM)Tyann Wrote: Looking at your code, I got an idea for optimisation. My letter comparison loop always started at 1 and I saw that you start at 2 when the first letter is known.
In fact, you can take the reasoning a step further:
For P???e?? the loop can start at 5, for ??e??r??? it will start at 3 and for sc???es
it must start at 2.

I already have such optimization for patterns that start with '?'.
This loop handles it:

Code:

  for n from 2 to length do

   if wts[n]=63 then

    continue;

   else

    break;

   end;

  end;

After its execution, the variable 'n' contains the position from which each word will be scanned.
However, indeed, it might also be applicable to words for which the first letters are known, and then there are '?' characters. I will add this optimization and compare the results. Thanks for this idea.

(09-20-2023 04:55 AM)Tyann Wrote: Another thing you can do in the convaccent loop is to put in a test:

Code:

IF MAX(ASC "str"))<223 THEN BREAK; END
In other words, if there are no or no more accents, leave the 'replace' loop.

I have doubts about this. Of course, I will check it because the idea is interesting, but instead of using MAX, I will use INSTRING, and then I can skip specific iterations for individual characters. Nevertheless, I am concerned that in most cases, it may even introduce additional slowdown because there are very few dictionaries of a specific word length that do not contain some diacritical characters. Even if such situations occur, they are usually only for dictionaries with a small number of words, and in the case of small dictionaries, the conversion currently happens almost instantly. On the other hand, if the dictionary is large, most likely, all diacritical characters are present in it, and applying this approach will only add extra checks, taking additional time without bringing any benefits. However, the modification is easy to make, so I will conduct tests and let you know.

Tyann · 09-20-2023, 11:31 AM

Quote:I have doubts about this. Of course, I will check it because the idea is interesting, but instead of using MAX, I will use INSTRING, and then I can skip specific iterations for individual characters.

Mais oui vous faîtes la conversionaccent sur tous les mots en une seule opération dans une chaîne donc aucun intérêt.
Là est une grosse différence de vitesse avec moi qui doit le faire pour chaque mot.
Je dois réfléchir si je peux convertir ma liste de sous liste en une seule chaîne, faire la conversion puis remettre sous forme de liste ou pas ?

But yes, you do the conversionaccent on all the words in a single operation in a string, so there's no point.
There's a big difference in speed with me, who has to do it for each word.
I need to think about whether I can convert my list of sub-lists into a single string, do the conversionaccent and then convert back into a list or not?

Tyann · 09-20-2023, 12:15 PM

Mais oui ça marche, un STRING et un EXPR suffisent, je convertit ma sous-liste en chaîne, je convertit avec vôtre routine convAccents puis un EXPR et je retrouve ma liste et je teste chaque élément .
??e??r??? prends maintenant <12s.
Et je peux garder mon organisation en sous liste bien pratique pour modifier le dictionnaire.
Encore merci Komane, j'ai beaucoup appris ces derniers jours grâce à vous.
Trop sympathique

But yes, it works, a STRING and an EXPR are enough, I convert my sub-list into a string, I convert with your convAccents routine then an EXPR and I find my list and I test each element.
It now takes <12s.
And I can keep my sub-list organisation, which is very useful for modifying the dictionary.
Thanks again Komane, I've learned a lot in the last few days thanks to you.
Too kind

komame · (This post was last modified: 09-20-2023 07:54 PM by komame.)

Hi Tyann,

(09-20-2023 12:15 PM)Tyann Wrote: But yes, it works, a STRING and an EXPR are enough, I convert my sub-list into a string, I convert with your convAccents routine then an EXPR and I find my list and I test each element.
??e??r??? now takes <12s.
And I can keep my sub-list organisation, which is very useful for modifying the dictionary.

Could you show the code you use to convert sub-lists into a string and vice versa?

(09-20-2023 12:15 PM)Tyann Wrote: Thanks again Komane, I've learned a lot in the last few days thanks to you.

Thank you. I'm very pleased to hear that, and I'm glad that this short collaboration has brought you some benefits

(09-20-2023 11:31 AM)Tyann Wrote: But yes, you do the conversionaccent on all the words in a single operation in a string, so there's no point.

As I predicted, this optimization for dictionary conversion didn't yield any benefits, but the idea of shifting the start of iteration for patterns starting with known letters and then '?' characters yielded very good results:

     a??? => 0.021 - 0.023s (123 words)
    a???? => 0.078 - 0.106s (423 words)
   a????? => 0.255 - 0.291s (1101 words)
  a?????? => 0.670 - 0.841s (2151 words)
a??????? => 1.461 - 1.719s (3167 words)
a???????? => 2.363 - 2.725s (3839 words)
sc???r??? => 0.597 - 0.618s (9 words)
p????eurs => 0.975 - 1.171s (29 words)