
Add a benchmark targeting NFA to DFA tradeoffs.#492

Open
sayrer wants to merge 3 commits into timbray:main from sayrer:nfa_dfa_benchmarks

Conversation


@sayrer (Contributor) commented Feb 10, 2026

Here's the tradeoff this file attempts to measure (see #481):

Small state space, eager fits in budget → eager wins: it's faster, with no cache overhead.

Large state space, predictable input → lazy wins by a mile. Eager can't even attempt it.

Large state space, adversarial input → lazy falls back to NFA-with-overhead, and eager also falls back to NFA because it blew the budget. You end up in the same place, maybe ~2x slower judging from the varied-input benchmarks, but that is 2x slower than a path that was already the fallback.

This file contains 5 shellstyle wildcard benchmarks designed to characterize NFA vs DFA tradeoffs:

  1. BenchmarkShellstyleSimpleWildcard (line 16) — Simple prefix*suffix patterns like "a*b" where an eager DFA would be ~3 states. Tests whether simple wildcards deserve DFA treatment.
  2. BenchmarkShellstyleNarrowInput (line 90) — Wide Unicode patterns (anchors from ASCII, CJK, mixed scripts) but narrow input alphabets (digits, lowercase, etc.). Shows a demand-driven DFA only needs states for bytes actually seen.
  3. BenchmarkShellstyleWidePatternsScaling (line 219) — Scales from 8 to 512 patterns with multi-script anchors but ASCII-digit-only input. Isolates how a lazy DFA cache stays small regardless of Unicode coverage.
  4. BenchmarkShellstyleSimpleWildcardScaling (line 300) — Scales from 1 to 26 independent simple patterns to show even modest collections benefit from DFA conversion.
  5. BenchmarkShellstyleZWJEmoji (line 373) — Worst case: ZWJ emoji sequences (15-25+ bytes per glyph) mixed with Japanese text. Shared leading bytes (0xE2, 0xE3, 0xE4) force massive NFA branching.
  SimpleWildcard — 325-422 ns/op, 1 alloc. Very fast for simple prefix*suffix patterns.

  SimpleWildcardScaling — ~370 ns/op regardless of pattern count (1-26). Scaling is flat, which is good.

  NarrowInput — Scales roughly linearly with pattern count. Notable: multi-byte input (narrow CJK) is ~3-4x slower than ASCII digits, showing the cost of UTF-8 byte-level branching in the NFA.

  WidePatternsScaling — Shows superlinear scaling: 987 ns (8 patterns) → 148 μs (512 patterns). The jump from 256→512 patterns (31→149 μs, ~4.8x) suggests the NFA traversal cost is growing faster than linearly.

  ZWJEmoji — 5.7-40 μs with 14-15 allocs/op. The high allocation count (vs 1 alloc for simpler benchmarks) and the per-op cost confirm that ZWJ sequences with shared leading bytes are expensive for NFA traversal.


@timbray (Owner) commented Feb 12, 2026

Cool benchmarks, thanks, will probably adopt. But I'm missing something, the benchmarks don't call nfa2Dfa so how do you arrive at the conclusions up at the top of this thread?


@sayrer (Author) commented Feb 12, 2026

> Cool benchmarks, thanks, will probably adopt. But I'm missing something, the benchmarks don't call nfa2Dfa so how do you arrive at the conclusions up at the top of this thread?

You can only do damage to these with a lazy or eager (nfa2dfa) DFA implementation. These set the baseline with always-NFA in the presence of wildcards. So, if you look at the patches here (picking and choosing), you'll see it: main...sayrer:quamina:lazy_dfa


@timbray (Owner) commented Feb 12, 2026

Got it. Need to finish first-cut nfa2dfa.


@sayrer (Author) commented Feb 12, 2026

> Got it. Need to finish first-cut nfa2dfa.

Shouldn't we just check in the benchmark now? Then hammer on it and declare victory? I've shown that's possible, but maybe not in a way you're cool with.


@timbray (Owner) commented Feb 12, 2026

> > Got it. Need to finish first-cut nfa2dfa.
>
> Shouldn't we just check in the benchmark now? Then hammer on it and declare victory? I've shown that's possible, but maybe not in a way you're cool with.

Probably. Will take a closer look in the near future.
