
Add a benchmark targeting NFA to DFA tradeoffs.#492

Open
sayrer wants to merge 3 commits into timbray:main from sayrer:nfa_dfa_benchmarks

Conversation


@sayrer (Contributor) commented Feb 10, 2026

Here's the tradeoff this file attempts to measure (see #481):

Small state space, eager fits in budget → eager wins: it's faster, with no cache overhead.

Large state space, predictable input → lazy wins by a mile. Eager can't even attempt it.

Large state space, adversarial input → lazy falls back to NFA-with-overhead, and eager also falls back to NFA because it blew the budget. You end up in the same place, maybe ~2x slower judging from the varied-input benchmarks, but that is 2x slower than a path that was already the fallback.

This file contains 5 shellstyle wildcard benchmarks designed to characterize NFA vs DFA tradeoffs:

  1. BenchmarkShellstyleSimpleWildcard (line 16) — Simple prefix*suffix patterns like "a*b" where an eager DFA would be ~3 states. Tests whether simple wildcards deserve DFA treatment.
  2. BenchmarkShellstyleNarrowInput (line 90) — Wide Unicode patterns (anchors from ASCII, CJK, mixed scripts) but narrow input alphabets (digits, lowercase, etc.). Shows a demand-driven DFA only needs states for bytes actually seen.
  3. BenchmarkShellstyleWidePatternsScaling (line 219) — Scales from 8 to 512 patterns with multi-script anchors but ASCII-digit-only input. Isolates how a lazy DFA cache stays small regardless of Unicode coverage.
  4. BenchmarkShellstyleSimpleWildcardScaling (line 300) — Scales from 1 to 26 independent simple patterns to show even modest collections benefit from DFA conversion.
  5. BenchmarkShellstyleZWJEmoji (line 373) — Worst case: ZWJ emoji sequences (15-25+ bytes per glyph) mixed with Japanese text. Shared leading bytes (0xE2, 0xE3, 0xE4) force massive NFA branching.
  SimpleWildcard — 325-422 ns/op, 1 alloc. Very fast for simple prefix*suffix patterns.

  SimpleWildcardScaling — ~370 ns/op regardless of pattern count (1-26). Scaling is flat, which is good.

  NarrowInput — Scales roughly linearly with pattern count. Notable: multi-byte input (narrow CJK) is ~3-4x slower than ASCII digits, showing the cost of UTF-8 byte-level branching in the NFA.

  WidePatternsScaling — Shows superlinear scaling: 987 ns (8 patterns) → 148 μs (512 patterns). The jump from 256→512 patterns (31→149 μs, ~4.8x) suggests the NFA traversal cost is growing faster than linearly.

  ZWJEmoji — 5.7-40 μs with 14-15 allocs/op. The high allocation count (vs 1 alloc for simpler benchmarks) and the per-op cost confirm that ZWJ sequences with shared leading bytes are expensive for NFA traversal.


@timbray (Owner) commented Feb 12, 2026

Cool benchmarks, thanks, will probably adopt. But I'm missing something, the benchmarks don't call nfa2Dfa so how do you arrive at the conclusions up at the top of this thread?


@sayrer (Author) commented Feb 12, 2026

> Cool benchmarks, thanks, will probably adopt. But I'm missing something, the benchmarks don't call nfa2Dfa so how do you arrive at the conclusions up at the top of this thread?

You can only do damage to these with a lazy or eager (nfa2dfa) DFA implementation. These set the baseline with always-NFA in the presence of wildcards. So, if you look at the patches here (picking and choosing), you'll see it: main...sayrer:quamina:lazy_dfa


@timbray (Owner) commented Feb 12, 2026

Got it. Need to finish first-cut nfa2dfa.


@sayrer (Author) commented Feb 12, 2026

> Got it. Need to finish first-cut nfa2dfa.

Shouldn't we just check in the benchmark now? Then hammer on it and declare victory? I've shown that's possible, but maybe not in a way you're cool with.


@timbray (Owner) commented Feb 12, 2026

> > Got it. Need to finish first-cut nfa2dfa.
>
> Shouldn't we just check in the benchmark now? Then hammer on it and declare victory? I've shown that's possible, but maybe not in a way you're cool with.

Probably. Will take a closer look in the near future.
