
Optimize for "session" scenario#8

Open
pavlo-liapota wants to merge 6 commits into dmcg:master from pavlo-liapota:optimize-for-session

Conversation

pavlo-liapota (Contributor) commented Feb 12, 2023

I have started to think about how we can optimize for the scenario you were talking about: start a session and explore possible anagrams.

As I understand it, we don't need to show all (potentially millions of) possible anagrams, so we don't need to generate them right away.

In my first commit I changed the code so that we don't generate all anagrams. I return only one of them for now, but we need to think about what output we want to show to the user. Maybe all possible words from which anagrams can be built? Or maybe the top 10 most commonly used? (We may need some external resource to sort words by how often they are used.) The important part is that I still have a tree, and all anagrams can be generated from it.
It takes 3100ms to generate all anagrams for REFACTORING TO KOTL, but only 1000ms to generate the resulting tree.
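The tree idea can be sketched roughly like this (the `Node` shape and `firstAnagram` name are my illustration, not the actual types in this PR): each node holds a word plus the subtrees that can follow it, so a single anagram can be pulled out cheaply while the full set stays implicit.

```kotlin
// Illustrative sketch only: names and structure are assumptions,
// not the actual code in this PR.
class Node(val word: String, val children: List<Node>) {
    // Walk one branch to produce a single anagram, without
    // materialising the (potentially millions of) other results.
    fun firstAnagram(): String =
        if (children.isEmpty()) word
        else word + " " + children.first().firstAnagram()
}
```

Enumerating everything would mean walking every branch; returning one result only touches one path.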

In my second commit I removed the code that makes sure duplicated results like A CAT and CAT A are not generated. Basically, I removed the analogue of this:

remainingCandidateWords = remainingCandidateWords.subList(
      1, remainingCandidateWords.size
)

It is not an issue to have such duplicates in the resulting tree, as we don't generate all results anyway. This makes the code simpler and improves the performance of resulting-tree generation from 1000ms to 700ms for the REFACTORING TO KOTL input.

pavlo-liapota (Contributor, Author) commented Feb 12, 2023

Now I can describe how the cache works.

First, let's imagine that we allow duplicated anagrams like A CAT and CAT A, and that we don't care about a maximum anagram depth.
In this case, the process function will always return the same result when called with the same inputLetters. We don't care about depth, and we don't need to filter candidate words based on words that are already used, so nothing else can influence the result. This means that if, during our computation, we need to call the process function with the same input several times, we can compute the result once, cache it, and reuse it for all subsequent calls.
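In code this is ordinary memoisation; a minimal sketch (the stubbed body and the counter are my illustration, not the PR's actual process function):

```kotlin
// Minimal memoisation sketch with assumed names. With no depth limit and
// duplicates allowed, the result depends only on the remaining letters,
// so a single map keyed by those letters is a valid cache.
var computations = 0  // counts how often the real work actually runs

val cache = mutableMapOf<String, List<String>>()

fun process(inputLetters: String): List<String> =
    cache.getOrPut(inputLetters) {
        computations++
        listOf("anagrams of $inputLetters")  // stand-in for the real search
    }
```

Calling `process("ACAT")` repeatedly performs the expensive work only once; every later call is a map lookup.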

And in fact the process function is quite often called several times with the same input.
Imagine the input letters "A HOME CAT". Using just the HOME letters we can build the following anagrams: HOME, HEM O, EH MO, EM HO, HM OE. Each time the remaining letters will be "ACAT". This means that during our computation we will call the process function 5 times with the input "ACAT".
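A small helper makes that concrete (`remaining` is an illustrative function of mine, not code from the PR): subtracting the letters of the words used so far always leaves the same pool, no matter which branch we took.

```kotlin
// Illustrative helper, not from the PR: remove one occurrence of each
// letter of `word` from `letters` (spaces ignored), preserving order.
fun remaining(letters: String, word: String): String {
    val pool = word.toMutableList()
    val sb = StringBuilder()
    for (c in letters.replace(" ", "")) {
        if (!pool.remove(c)) sb.append(c)  // keep letters `word` didn't consume
    }
    return sb.toString()
}
```

Playing HOME leaves "ACAT", and playing HEM followed by O also leaves "ACAT", so process("ACAT") is reached along every one of those 5 branches.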

If we want to have maximum depth as a parameter, then we need to cache per inputLetters-and-depth combination (the resulting anagrams will of course differ for the same input letters if a different maximum depth is allowed).

And if we don't want to generate duplicated anagrams, then we additionally need the word index as a parameter, so that after using a word we only try words with the same or a higher index as continuations. In this case we need a cache per inputLetters, depth, and index combination.
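In that fullest variant the cache key could look like this (a sketch with assumed names): the same letters can now yield different subtrees, so all three values must participate in the key, and a data class gives us the equals/hashCode this needs for free.

```kotlin
// Sketch with assumed names, not the PR's actual key type.
data class CacheKey(val inputLetters: String, val depth: Int, val minWordIndex: Int)

// The value type would be whatever the tree node representation is.
val treeCache = HashMap<CacheKey, Any>()
```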

pavlo-liapota (Contributor, Author) commented Feb 12, 2023

I can suggest the following implementation for your scenario.

The user provides input letters. We compute a tree with all results and show all (or the top N) possible first words.
Then the user selects a first word and we show all possible second words.
We repeat until no letters are left.

So we just need to compute the tree once; after that, everything can be taken from the tree in constant time.
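The session loop then reduces to tree navigation; a rough sketch (the `Node` type and function names are my assumptions, mirroring the description above, not the PR's code):

```kotlin
// Assumed tree shape: the root's children are the possible first words.
class Node(val word: String, val children: List<Node>)

// Words to offer the user at the current position in the tree.
fun nextWords(current: Node): List<String> = current.children.map { it.word }

// Descend into the child matching the user's selection. A List lookup is
// linear in the number of children; a Map<String, Node> would make it O(1).
fun select(current: Node, word: String): Node =
    current.children.first { it.word == word }
```

Each user choice is just a lookup on the current node, so no anagram generation happens during the session itself.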

In this case we don't need a depth parameter, so I removed it in my third commit to make the code a bit faster and even simpler. But of course we can keep it if needed.

pavlo-liapota (Contributor, Author) commented Feb 12, 2023

Now we can compute a tree for the REFACTORING TO KOTL input in 450ms,
and for REFACTORING TO KOTLIN in 1600ms.

pavlo-liapota (Contributor, Author) commented:

In the suggested solution we don't reuse the cache during a session; we just use it to compute the tree faster, and from then on we use only the tree.
We don't need to reuse the cache between sessions either, but if we do (for example, to speed up sequential sessions with similar inputs), we may need to limit its size to avoid an out-of-memory exception.
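If the cache were ever kept across sessions, one standard way to bound it (an assumption for illustration, not something this PR does) is an access-ordered LinkedHashMap that evicts its eldest entry once a capacity is exceeded:

```kotlin
// Sketch of a size-bounded cache; not part of this PR.
// accessOrder = true makes iteration order least-recently-used first,
// so the evicted "eldest" entry is the one touched longest ago.
class BoundedCache<K, V>(private val maxEntries: Int) :
    LinkedHashMap<K, V>(16, 0.75f, true) {
    override fun removeEldestEntry(eldest: MutableMap.MutableEntry<K, V>): Boolean =
        size > maxEntries
}
```

This reuses java.util.LinkedHashMap's built-in eviction hook rather than implementing LRU bookkeeping by hand.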

pavlo-liapota (Contributor, Author) commented:

In the "session" scenario we don't generate all anagrams, and some code like permuteInto is no longer needed. But we may want to keep that code and its related tests.
Just extending the existing code with new code may make it hard to optimize for both scenarios.
So should we just move the new code to a new package and write new tests for it?

pavlo-liapota (Contributor, Author) commented:

I have reverted my changes and copied the new implementation for the session scenario into another package.
I have also created speed tests for it.

pavlo-liapota (Contributor, Author) commented:

I have implemented a simple console application that lets you explore possible anagrams.
For example, I was able to find the anagram RETROFITTING ON CLOAK for the input REFACTORING TO KOTLIN :)
