fix(formatron): handle duplicate BPE token IDs and kbnf mask/accept inconsistency by lesj0610 · Pull Request #170 · turboderp-org/exllamav3

lesj0610 · 2026-03-16T05:52:41Z

Problem

Two related bugs affect constrained generation (json_schema, regex_pattern, grammar_string) with GPT-2 byte-level BPE tokenizers (e.g. EXAONE 4.x, GPT-J, GPT-NeoX, BLOOM):

Bug 1 — Duplicate token IDs dropped from vocabulary

get_vocab_dict() builds a {str: int} dict keyed by piece string, so when multiple token IDs share the same piece string, only the last ID survives. Earlier IDs are silently absent from new_vocab. When the model samples one of these dropped IDs, kbnf raises:

ValueError: The input token id is rejected and the EngineLike's internal states are not updated.

This crashes the generation loop.
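
To make the collision concrete, here is a minimal sketch of how keying a dict by piece string loses IDs. The three-entry `id_to_piece` list is a made-up toy vocabulary, not the real tokenizer data, and the comprehension only approximates what get_vocab_dict() does.

```python
# Hypothetical toy vocabulary: IDs 0 and 2 share the same piece string.
id_to_piece = ["[PAD]", "hello", "[PAD]"]

# Keying by piece string means a later ID overwrites an earlier one.
vocab = {piece: token_id for token_id, piece in enumerate(id_to_piece)}

# IDs that exist in the tokenizer but no longer appear in the dict.
dropped = sorted(set(range(len(id_to_piece))) - set(vocab.values()))

print(vocab)    # {'[PAD]': 2, 'hello': 1}
print(dropped)  # [0]
```

ID 0 is still a token the model can sample, but nothing built from `vocab` knows about it.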

Bug 2 — kbnf mask/accept inconsistency (kbnf ≤ 0.4.2)

compute_allowed_tokens() can report a token as valid while try_accept_new_token() subsequently rejects the same token with ValueError. In kbnf ≤ 0.4.2 this is triggered by GPT-2 special tokens (e.g. [PAD], id=0, bytes=b'[PAD]') whose literal ASCII bytes incidentally match valid grammar positions (e.g. inside a JSON string value).

The rejection propagates uncaught, crashing the calling process.
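
As a small illustration of why such a token can look valid to a byte-level grammar, the sketch below checks that the literal bytes `[PAD]` are legal content for a JSON string value. It uses only the standard `json` module and does not reproduce the kbnf behavior itself; the surrounding object is an invented example.

```python
import json

# Literal bytes of the special token, as given in the description above.
pad_bytes = b"[PAD]"

# Placed inside a JSON string value, the bytes form perfectly valid JSON.
candidate = b'{"name": "' + pad_bytes + b'"}'
parsed = json.loads(candidate)

print(parsed)  # {'name': '[PAD]'}
```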

Fixes

1. `create_engine_vocabulary()` — back-fill duplicate token IDs

Build a str → bytes lookup from new_vocab, then iterate the full vocabulary and assign bytes for any ID that was dropped due to string deduplication.
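
The back-fill can be sketched roughly as follows. This is not the code from the PR: `backfill_duplicates` is a hypothetical helper, and the data shapes (a list of piece strings indexed by token ID, and new_vocab as `{token_id: bytes}`) are assumptions made for illustration.

```python
def backfill_duplicates(id_to_piece, new_vocab):
    """Return a copy of new_vocab that covers every ID in id_to_piece.

    id_to_piece: piece strings indexed by token ID (assumed layout).
    new_vocab:   {token_id: bytes} after string deduplication (assumed layout).
    """
    # str -> bytes lookup built from the IDs that survived deduplication.
    piece_to_bytes = {id_to_piece[tid]: data for tid, data in new_vocab.items()}

    filled = dict(new_vocab)
    for tid, piece in enumerate(id_to_piece):
        # Reuse the bytes of the surviving duplicate for any dropped ID.
        if tid not in filled and piece in piece_to_bytes:
            filled[tid] = piece_to_bytes[piece]
    return filled


id_to_piece = ["[PAD]", "hello", "[PAD]"]
new_vocab = {1: b"hello", 2: b"[PAD]"}  # ID 0 was dropped
filled = backfill_duplicates(id_to_piece, new_vocab)
print(filled[0])  # b'[PAD]'
```

Returning a copy rather than mutating the input is a choice made for the sketch only.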

2. `accept_token()` — defensive `try/except` around kbnf call

Wrap self._formatter.accept_token(token) in try/except ValueError and re-raise. This turns a process crash into a clean per-request abort that callers can handle gracefully.

This guard remains safe and beneficial even after a kbnf-side fix: it converts any future unforeseen mask/accept inconsistency into a recoverable error rather than a crash.
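
A minimal sketch of the guard and of a caller that treats the error as a per-request abort. `FakeFormatter`, `GuardedFilter`, and `run_request` are hypothetical stand-ins, not exllamav3 or formatron classes, and attaching an extra message on re-raise is an assumption about how the error might be surfaced.

```python
class FakeFormatter:
    """Stand-in for the kbnf-backed formatter; rejects one made-up ID."""

    def accept_token(self, token):
        if token == 0:
            raise ValueError("The input token id is rejected")


class GuardedFilter:
    def __init__(self, formatter):
        self._formatter = formatter

    def accept_token(self, token):
        try:
            self._formatter.accept_token(token)
        except ValueError as exc:
            # Re-raise with context so the caller can abort just this request.
            raise ValueError(f"constrained decoding rejected token {token}") from exc


def run_request(filter_, tokens):
    """Hypothetical caller: stop this request without crashing the process."""
    accepted = []
    try:
        for tok in tokens:
            filter_.accept_token(tok)
            accepted.append(tok)
    except ValueError:
        return accepted, "aborted"
    return accepted, "ok"


result = run_request(GuardedFilter(FakeFormatter()), [5, 7, 0, 9])
print(result)  # ([5, 7], 'aborted')
```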

kbnf version note

kbnf ≥ 0.5.7 resolves the underlying mask/accept inconsistency in the Rust engine. The PyPI package is currently at 0.4.2; a pre-built wheel for Linux x86_64 is available at:

https://github.com/lesj0610/kbnf/releases/tag/v0.5.7

The setup.py comment points to this wheel until the upstream author publishes 0.5.7 to PyPI.
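
For anyone who wants to check at runtime whether the installed kbnf already includes the engine-side fix, a small sketch follows. `parse_version` and `has_engine_fix` are hypothetical helpers; the comparison reads only the leading numeric components, so pre-release suffixes are ignored.

```python
def parse_version(text):
    """Turn '0.5.7' into (0, 5, 7), reading only leading digits per component."""
    parts = []
    for chunk in text.split("."):
        digits = ""
        for ch in chunk:
            if ch not in "0123456789":
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def has_engine_fix(kbnf_version, fixed_in="0.5.7"):
    """True if the given kbnf version is at or above the fixed release."""
    return parse_version(kbnf_version) >= parse_version(fixed_in)


print(has_engine_fix("0.4.2"))  # False
print(has_engine_fix("0.5.7"))  # True
```

The installed version string can be obtained with `importlib.metadata.version("kbnf")` from the standard library.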

Affected models

Any model using a GPT-2 byte-level BPE tokenizer where special tokens are stored as literal ASCII strings — approximately 10–20% of open-source LLMs (EXAONE, GPT-J, GPT-NeoX, BLOOM, some Falcon variants).

Testing

Verified with EXAONE-4.0.1-32B + json_schema constrained generation: 10/10 requests succeed with kbnf 0.5.7. With kbnf 0.4.2, roughly 1 in 5 requests crashed.

…nconsistency

Two related fixes for GPT-2 byte-level BPE tokenizers (e.g. EXAONE 4.x):

1. create_engine_vocabulary(): back-fill duplicate token IDs

get_vocab_dict() builds a {str: int} dict where the last ID wins for duplicate token strings, silently dropping earlier IDs from new_vocab. When the model samples one of the dropped IDs, kbnf raises "The input token id is rejected", crashing generation.

Fix: rebuild a str->bytes lookup from new_vocab and back-fill any token ID absent from new_vocab by reusing the bytes of its duplicate string.

2. accept_token(): guard against kbnf mask/accept inconsistency

kbnf <=0.4.2 can report a token as allowed via compute_allowed_tokens() but then reject the same token in try_accept_new_token(), raising ValueError and crashing the process. Wrapping the call in try/except and re-raising turns the crash into a clean per-request abort. This defensive pattern remains safe after a kbnf-side fix.

Note: kbnf >=0.5.7 resolves the underlying inconsistency. A pre-built wheel for Linux x86_64 is available at:
https://github.com/lesj0610/kbnf/releases/tag/v0.5.7

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

lesj0610 wants to merge 1 commit into turboderp-org:master from lesj0610:fix/formatron-kbnf-compat
