
Record: 11L + Multi-Order N-gram Backoff + Entropy-Adaptive Alpha (val_bpb=0.6672)#770

Open
minh-stakc wants to merge 2 commits into openai:main from minh-stakc:submission/ngram-cache-0.6672

Conversation

@minh-stakc

Summary

val_bpb: 0.6672 (seed 42) | 15.0 MB artifact | 1xB200 (HiPerGator)

Technique

Base 11L SOTA architecture with eval-time multi-order n-gram cache interpolation.

Key innovations

  1. Multi-order backoff (orders 2-7): Query the highest order first and cascade down on a miss. This captures repeated document patterns that fall outside the transformer's context window.

  2. Entropy-adaptive alpha: alpha = 0.05 + 0.55 * sigmoid(2 * (H - 4.0)). When the model is uncertain (high entropy), trust n-gram statistics more; when confident, trust the LM.
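The two mechanisms above can be sketched together. This is a minimal illustration, not the PR's actual code: the class and function names are invented here, and the nested-dict count structure is a stand-in for whatever the real cache uses.

```python
import math
from collections import defaultdict

def entropy_adaptive_alpha(H, lo=0.05, span=0.55, k=2.0, h0=4.0):
    """alpha = 0.05 + 0.55 * sigmoid(2 * (H - 4.0)), per the PR description."""
    return lo + span / (1.0 + math.exp(-k * (H - h0)))

class NgramBackoffCache:
    """Multi-order n-gram counts with highest-order-first backoff (orders 2-7)."""

    def __init__(self, min_order=2, max_order=7):
        self.min_order = min_order
        self.max_order = max_order
        # counts[n][context_tuple][next_token] -> count
        self.counts = {n: defaultdict(lambda: defaultdict(int))
                       for n in range(min_order, max_order + 1)}

    def update(self, history):
        """Record the newest token under every order's context (call AFTER scoring it)."""
        for n in range(self.min_order, self.max_order + 1):
            if len(history) >= n:
                ctx = tuple(history[-n:-1])          # n-1 preceding tokens
                self.counts[n][ctx][history[-1]] += 1

    def predict(self, history):
        """Cascade from the highest order down; return a prob dict, or None on a total miss."""
        for n in range(self.max_order, self.min_order - 1, -1):
            if len(history) < n - 1:
                continue
            ctx = tuple(history[-(n - 1):])
            nexts = self.counts[n].get(ctx)
            if nexts:
                total = sum(nexts.values())
                return {tok: c / total for tok, c in nexts.items()}
        return None
```

Highest-order-first backoff means longer, more specific contexts win whenever they have been observed, and the cache only falls back to shorter contexts when the long one has never occurred.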

Compliance

  • Score-first, backward-looking: n-gram counts built from previously scored tokens only
  • No oracle selection: alpha depends on model entropy, never on ground-truth labels
  • Single blended prediction per token, no min(NLL)
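These constraints amount to a score-first evaluation loop: each token is scored under one blended distribution before its counts enter the cache, so the cache stays strictly backward-looking. A minimal sketch follows; `model_step` and the cache's `predict`/`update` interface are assumptions for illustration, entropy is computed in bits, and the blend is taken over the model's support only for simplicity.

```python
import math

def score_then_update(tokens, model_step, cache, alpha_fn):
    """Score-first eval loop: score each token under the blended distribution,
    THEN add it to the n-gram cache. `model_step(history)` is a hypothetical
    callable returning a {token: prob} dict for the next position."""
    history, total_nll = [], 0.0
    for tok in tokens:
        p_lm = model_step(history)                       # model distribution
        H = -sum(p * math.log2(p) for p in p_lm.values() if p > 0)
        alpha = alpha_fn(H)                              # depends only on model entropy
        p_ng = cache.predict(history)                    # None on a total backoff miss
        if p_ng is None:
            blended = p_lm
        else:
            blended = {t: (1 - alpha) * p + alpha * p_ng.get(t, 0.0)
                       for t, p in p_lm.items()}
        # single blended prediction per token -- no min(NLL) over candidates
        total_nll += -math.log2(max(blended.get(tok, 0.0), 1e-12))
        history.append(tok)
        cache.update(history)                            # counts added only after scoring
    return total_nll / len(tokens)                       # bits per token
```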

Results

| Metric | Value |
| --- | --- |
| Pre-quant val_bpb | 1.1927 |
| Post-quant roundtrip | 1.1577 |
| Post n-gram sliding (stride 64) | 0.6672 |
| Artifact size | 15,025,238 bytes |

Architecture

11L, 512d, 8H/4KV GQA, MLP 3x, XSA4, Partial RoPE, LN Scale, VE128, SmearGate, BigramHash(2048), EMA(0.997), Late QAT, OrthoInit. Int6+GPTQ-lite+3% pruning+zstd-22.
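The "3% pruning" step in the compression pipeline can be illustrated as simple magnitude pruning. This is a sketch under assumptions: per-tensor thresholding is a guess, and the actual pipeline also applies int6 quantization, GPTQ-lite, and zstd-22, which are omitted here.

```python
def magnitude_prune(weights, pct=0.03):
    """Zero out the smallest-magnitude `pct` fraction of a weight list.
    Minimal stand-in for the PR's '3% pruning' step (PRUNE_PCT=0.03)."""
    k = int(len(weights) * pct)
    if k == 0:
        return list(weights)
    thresh = sorted(abs(w) for w in weights)[k - 1]   # k-th smallest magnitude
    return [0.0 if abs(w) <= thresh else w for w in weights]
```

Zeroed weights compress well under the subsequent entropy coding, which is why a small pruning fraction helps fit the 16 MB artifact budget.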

Reproduction

SEED=42 NGRAM_CACHE=1 NGRAM_ORDER=7 NGRAM_MIN_ORDER=2 \
NGRAM_ENTROPY=1 EVAL_STRIDE=64 PRUNE_PCT=0.03 \
torchrun --standalone --nproc_per_node=1 train_gpt.py

Credits

PR #414 (signalrush), PR #315 (jfprincz), PR #702/#727 (lukacf)

Test plan

  • Artifact under 16MB (15.0 MB)
  • Score-first n-gram cache (backward-looking)
  • No min(NLL), no target-aware gating
  • Single seed included (additional seeds pending)

