Merged
Replace the dedup map with a generation counter (reusing faState.closureSetGen), and add reusable scratch buffers for the sorted-uniques slice and key bytes. On cache hits the compiler's string(bytes) map-lookup optimization avoids the key allocation entirely. Only cache misses allocate (key string + stored slice). Reduces nfa2Dfa allocations by ~35-40% and wall time by ~12%.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
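The generation-counter pattern described here can be sketched roughly as follows. This is a minimal illustration, not quamina's actual code: the `dedup` type, `seenGen` field, and `cached` helper are hypothetical stand-ins for the real fields hung off faState.

```go
package main

import "fmt"

// dedup replaces a clear-between-calls map with per-element generation
// stamps: bumping gen is an O(1) "clear" because old stamps go stale.
type dedup struct {
	gen     uint64
	seenGen []uint64 // seenGen[i] == gen means state i was seen this round
}

func (d *dedup) reset(n int) {
	if len(d.seenGen) < n {
		d.seenGen = make([]uint64, n)
	}
	d.gen++ // O(1) reset: every old stamp is now stale
}

// add reports whether state i is new this round.
func (d *dedup) add(i int) bool {
	if d.seenGen[i] == d.gen {
		return false // duplicate within this generation
	}
	d.seenGen[i] = d.gen
	return true
}

// The cache-hit path can index the map with string(keyBytes) directly:
// the Go compiler recognizes this pattern and elides the string
// allocation for the lookup, so only misses (which must store the key)
// allocate.
var cache = map[string]int{}

func cached(key []byte) (int, bool) {
	v, ok := cache[string(key)] // no allocation on the hit path
	return v, ok
}

func main() {
	var d dedup
	d.reset(8)
	fmt.Println(d.add(3), d.add(3)) // true false
	d.reset(8)
	fmt.Println(d.add(3)) // true again after the O(1) reset
}
```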
Instead of calling addByteStep (unpack, set one byte, repack) for each of up to 256 byte values, unpack the DFA table once, set all transitions into the unpacked table, then pack once at the end. Also adds BenchmarkNfa2Dfa to measure the nfa2Dfa conversion cost across patterns with varying wildcard counts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
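A hedged sketch of the unpack-once/pack-once idea. quamina's real table layout differs; here "packed" is a sparse (byte, target) list and "unpacked" is a dense 256-slot array, purely for illustration.

```go
package main

import "fmt"

// entry is a stand-in for one packed transition: byte value -> target state.
type entry struct {
	b      byte
	target int
}

// unpack expands the sparse packed form into a dense 256-slot table.
func unpack(packed []entry) *[256]int {
	var u [256]int
	for i := range u {
		u[i] = -1 // -1 = no transition
	}
	for _, e := range packed {
		u[e.b] = e.target
	}
	return &u
}

// pack re-compresses the dense table back into sorted sparse form.
func pack(u *[256]int) []entry {
	var p []entry
	for b, t := range u {
		if t != -1 {
			p = append(p, entry{byte(b), t})
		}
	}
	return p
}

func main() {
	packed := []entry{{'a', 1}}
	u := unpack(packed) // unpack once...
	for b := byte('0'); b <= '9'; b++ {
		u[b] = 2 // ...write all transitions into the dense form...
	}
	packed = pack(u) // ...pack once at the end
	fmt.Println(len(packed)) // 11 entries: '0'-'9' plus 'a'
}
```

Per-byte unpack/repack is quadratic-ish in the number of transitions touched; doing it once amortizes the cost across all 256 byte values.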
Replace the byte-by-byte append loop (8 appends per state, each with bounds checks) with a pre-sized buffer and a single binary.LittleEndian.PutUint64 per state. ~20% faster in nfa2Dfa.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
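The shape of that change looks roughly like this; `makeKey` and the uint64 state IDs are assumptions for illustration, not quamina's identifiers.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// makeKey serializes sorted state IDs into a reusable byte key. It sizes
// the buffer up front and issues one 8-byte PutUint64 per state instead
// of eight bounds-checked single-byte appends.
func makeKey(buf []byte, ids []uint64) []byte {
	need := 8 * len(ids)
	if cap(buf) < need {
		buf = make([]byte, need)
	}
	buf = buf[:need]
	for i, id := range ids {
		binary.LittleEndian.PutUint64(buf[i*8:], id)
	}
	return buf
}

func main() {
	key := makeKey(nil, []uint64{1, 2})
	fmt.Println(len(key), key[0], key[8]) // 16 1 2
	// Passing the previous key back in reuses its backing array.
	key = makeKey(key, []uint64{7})
	fmt.Println(len(key)) // 8
}
```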
Replace separate lists and dfaStates maps with a single map of internEntry structs, eliminating the second map lookup on cache hits. ~9-18% faster in nfa2Dfa.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
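A minimal sketch of the single-map interning, assuming the general shape of the change; `dfaState` and the field names are illustrative stand-ins for whatever the commit's internEntry actually bundles.

```go
package main

import "fmt"

// dfaState stands in for the interned DFA state type.
type dfaState struct{ id int }

// internEntry bundles what were previously two parallel maps, so a hit
// needs one lookup instead of two.
type internEntry struct {
	nfaList []int     // the canonical NFA-state list for this key
	dfa     *dfaState // the DFA state built from it
}

var (
	table  = map[string]internEntry{}
	builds int
)

// intern returns the cached DFA state for key, building it on a miss.
func intern(key string, list []int) *dfaState {
	if e, ok := table[key]; ok {
		return e.dfa // hot hit path: single lookup, both values in hand
	}
	builds++
	d := &dfaState{id: builds}
	table[key] = internEntry{nfaList: list, dfa: d}
	return d
}

func main() {
	a := intern("k1", []int{1, 2})
	b := intern("k1", []int{1, 2}) // hit: no second map consulted
	fmt.Println(a == b, builds)    // true 1
}
```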
Hoist the rawStates slice above the 256-iteration byte loop and reset with [:0] each iteration instead of allocating a new slice. Eliminates ~95% of nfa2Dfa allocations, ~35-48% faster.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
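The hoist-and-truncate pattern can be sketched like this; the loop body here is a toy stand-in for the real per-byte work.

```go
package main

import "fmt"

// sumLens mimics the hot loop's shape: the scratch slice is declared once
// above the 256-iteration byte loop and reset with [:0], so its backing
// array is reused instead of reallocated every iteration.
func sumLens() int {
	var rawStates []int // hoisted: grows to max size once, then reused
	total := 0
	for b := 0; b < 256; b++ {
		rawStates = rawStates[:0] // length 0, capacity retained
		for i := 0; i <= b%4; i++ {
			rawStates = append(rawStates, i)
		}
		total += len(rawStates)
	}
	return total
}

func main() {
	fmt.Println(sumLens()) // 640: lengths cycle 1,2,3,4 across 256 iterations
}
```

Since `rawStates[:0]` keeps the backing array, append only allocates until the slice has grown to the largest size any iteration needs, which is why nearly all per-call allocations disappear.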
Cover the same patterns used in BenchmarkNfa2Dfa to verify correctness of the optimized intern() and n2dNode paths with larger epsilon closures and heavier dedup. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Owner: Oh well, since my fingers are hovering over the keyboard on memory_cost and it involves changes to nfa2dfa, I better do this first.
Contributor (Author): There's another tougher one lurking. I wondered why the
These are the numbers for BenchmarkNfa2Dfa. This took a little bit of iteration, but it was fun. About 30-40 minutes, but the benchmarks were running at -count=6, so lots of waiting. All of the optimization ideas were good, which is frequently not the case.
I have to go watch the Lakers play the Warriors, so I might not respond to comments until tomorrow or Monday.