
Podracing III: Cubric Lite — 0.9362 BPB #782

Open
newjordan wants to merge 1 commit into openai:main from newjordan:submission/podracing-iii

Conversation

newjordan commented Mar 25, 2026

Summary

  • 3-seed mean val_bpb = 0.9362 (seeds 2045=0.9357, 43=0.9362, 300=0.9365)
  • 11L/512d U-Net with legal score-first 7-gram backoff (orders 2-7) + entropy-adaptive alpha + per-order adaptive alpha scaling (Cubric Lite)
  • 0.026 BPB improvement over Podracing II (#753, 0.9625 mean)
  • Artifact: 15.59 MB (int6+zstd), under 16 MB budget
  • Original contribution: per-order adaptive alpha scaling
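For readers unfamiliar with the score-first constraint named in the summary, here is a minimal sketch of a backward-looking n-gram backoff over orders 2-7, where each token enters the cache only *after* it has been scored. All identifiers are illustrative assumptions, not this submission's code.

```python
from collections import defaultdict

MAX_ORDER = 7  # backoff orders 2..7, per the PR

class ScoreFirstNgram:
    """Score-first n-gram cache: predict() may be called for a token
    only before update() inserts it, so the table never leaks labels."""

    def __init__(self):
        # counts[context_tuple][next_token] -> occurrence count
        self.counts = defaultdict(lambda: defaultdict(int))
        self.totals = defaultdict(int)

    def predict(self, history, token):
        # Back off from the highest order whose context has been seen.
        for order in range(MAX_ORDER, 1, -1):
            ctx = tuple(history[-(order - 1):])
            if len(ctx) == order - 1 and self.totals[ctx] > 0:
                return order, self.counts[ctx][token] / self.totals[ctx]
        return None, 0.0

    def update(self, history, token):
        # Called only AFTER `token` has been scored (the legality property).
        for order in range(2, MAX_ORDER + 1):
            ctx = tuple(history[-(order - 1):])
            if len(ctx) == order - 1:
                self.counts[ctx][token] += 1
                self.totals[ctx] += 1
```

In use, the eval loop scores each token against the blended distribution first, then calls `update()`, so predictions at position t depend only on tokens before t.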

What Changed vs Podracing II (#753)

One eval-time addition, no training changes:

Per-order adaptive alpha scaling ("Cubric Lite"): During n-gram eval, track how often each order's n-gram probability beats the model's probability on already-scored tokens. Every 32 batches, adjust each order's alpha multiplier according to its beat rate. Converged multipliers:

o2:0.300  o3:0.300  o4:0.970  o5:2.000  o6:2.000  o7:2.000

Key finding: bigrams and trigrams (orders 2-3) were actively harming BPB by injecting noisy predictions at the same alpha as high-order matches. Suppressing them to 30% of base alpha and boosting orders 5-7 to 200% yields the 0.026 BPB gain.
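A minimal sketch of the adjustment loop described above. The beat-rate tracking, the every-32-batches cadence, and the 0.3x-2.0x range come from this PR; the class name, step size, and the exact nudge rule are illustrative assumptions.

```python
ORDERS = range(2, 8)           # n-gram orders 2..7
ADJUST_EVERY = 32              # batches between multiplier updates (per the PR)
MULT_MIN, MULT_MAX = 0.3, 2.0  # clamp range matching the converged 0.3x..2.0x

class CubricLite:
    def __init__(self):
        self.mult = {o: 1.0 for o in ORDERS}   # per-order alpha multipliers
        self.beats = {o: 0 for o in ORDERS}    # n-gram prob beat model prob
        self.seen = {o: 0 for o in ORDERS}     # order fired at all
        self.batches = 0

    def observe(self, order, p_ngram, p_model):
        # Called per already-scored token where an order-`order` match fired,
        # so adaptation uses no information about unscored tokens.
        self.seen[order] += 1
        if p_ngram > p_model:
            self.beats[order] += 1

    def end_batch(self, step=0.1):
        self.batches += 1
        if self.batches % ADJUST_EVERY:
            return
        for o in ORDERS:
            if self.seen[o] == 0:
                continue
            beat_rate = self.beats[o] / self.seen[o]
            # Nudge reliable orders toward MULT_MAX, noisy ones toward MULT_MIN.
            target = MULT_MAX if beat_rate > 0.5 else MULT_MIN
            self.mult[o] += step * (target - self.mult[o])
            self.mult[o] = min(MULT_MAX, max(MULT_MIN, self.mult[o]))
```

Under this rule, orders that consistently beat the model drift toward the 2.0x cap while consistently noisy orders drift toward the 0.3x floor, consistent with the converged multipliers shown above.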

Compliance

  • Score-first, backward-looking: n-gram cache built from already-scored tokens only
  • Alpha depends solely on model's own softmax entropy — no target/label access
  • Per-order multipliers use beat-rate statistics from already-scored tokens — same legality as the score-first table update
  • No oracle selection, no min-NLL comparison
  • GPTQ calibration runs inside training phase (before wallclock stop) using training data only
  • Cubric adaptation runs during eval using only already-scored token statistics
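To make the second compliance bullet concrete, here is a minimal sketch of an entropy-adaptive alpha: alpha is a function of the model's own softmax entropy only, so no target or label information is touched. The function names, the linear scaling, and the base value are assumptions, not this submission's code.

```python
import math

def softmax_entropy(probs):
    # Shannon entropy of the model's own output distribution.
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def adaptive_alpha(probs, alpha_base=0.3, vocab_size=256):
    # Normalize entropy by its maximum, log |V|, to get h in [0, 1].
    # Inputs are the model's softmax only -- no label access, hence legal.
    h = softmax_entropy(probs) / math.log(vocab_size)
    return alpha_base * h

def blend(p_model, p_ngram, probs):
    # More n-gram weight exactly where the model is uncertain.
    a = adaptive_alpha(probs)
    return (1.0 - a) * p_model + a * p_ngram
```

A confident (low-entropy) model gets alpha near zero and keeps its own prediction; a maximally uncertain model gets the full base alpha.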

Credits

Test plan

  • 3-seed verification (2045, 43, 300)
  • All seeds under 16 MB
  • GPTQ uses training data only
  • N-gram eval is score-first
  • Cubric uses only already-scored data
  • Training logs included for all seeds

🤖 Generated with Claude Code

Per-order adaptive alpha scaling on legal score-first 7-gram backoff.
Tracks per-order beat rate on already-scored tokens, suppresses noisy
low orders (2-3 → 0.3x alpha), boosts accurate high orders (5-7 → 2.0x).

Results (seeds 2045/43/300):
  Sliding BPB (no n-gram): 1.1198 mean
  Cubric n-gram BPB: 0.9362 mean (0.9357/0.9362/0.9365)
  Artifact: 15.59 MB (int6+zstd)

0.026 BPB improvement over Podracing II (openai#753, 0.9625).
Original contribution: per-order adaptive alpha scaling.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
