
Podracing: 1.0461 BPB (3-seed mean)#674

Closed
newjordan wants to merge 1 commit into openai:main from newjordan:submission/podracing

Conversation


@newjordan newjordan commented Mar 25, 2026

podracing

Results

Seed   Sliding BPB   5-gram BPB   Artifact
1337   1.1190        1.0451       15.63 MB
42     1.1217        1.0471       15.59 MB
2045   1.1200        1.0460       15.64 MB
Mean   1.1202        1.0461

Progression

PR           Mean BPB         Notes
#190                          The Stinky Frost Recipe
#390, #401   1.1295, 1.1243   Sponge Bath TTT + EMA/SWA/QAT
#445         1.1236           Late Training Replay + EMA + GPTQ-lite
#498, #499   1.1478           The Frugendorff (recursive weight sharing)
#508, #578   1.1215           GPTQ + Early QAT + Legal TTT
#533, #577   1.1207           GPTQ + Short TTT
#587         1.1208           XSA + quantization tuning
#656         1.1195           Three Breadsticks (activation + eval)
This PR      1.0461           Podracing (5-gram eval interpolation)

Key Addition

Legal, score-first, hashed 5-gram interpolation during the sliding-window eval. Fixed-weight linear mixing (alpha=0.20), with no target-aware gating. The cache is built from already-scored tokens only, so it is strictly backward-looking.
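The mechanism described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the class name `NgramEvalCache` and method names are hypothetical, and the real run uses the values from the reproduce command (order 5, alpha 0.20, min_count 2, 4194304 buckets).

```python
from collections import defaultdict

class NgramEvalCache:
    """Hypothetical sketch of a backward-looking hashed n-gram eval cache.

    score() blends the model's probability for the target token with a
    hashed (n-1)-gram count estimate; update() is called only AFTER a
    position has been scored, so the cache never sees future tokens.
    """

    def __init__(self, order=5, alpha=0.20, min_count=2, buckets=4_194_304):
        self.order = order
        self.alpha = alpha          # fixed mixing weight, no target-aware gating
        self.min_count = min_count  # require this many context observations
        self.buckets = buckets
        self.context_counts = defaultdict(int)                      # hashed ctx -> count
        self.joint_counts = defaultdict(lambda: defaultdict(int))   # hashed ctx -> token -> count

    def _bucket(self, context):
        # Hash the last (order-1) tokens into a fixed number of buckets.
        return hash(tuple(context[-(self.order - 1):])) % self.buckets

    def score(self, context, target, p_model):
        """Return the blended probability for `target` given the model's p_model."""
        key = self._bucket(context)
        ctx_n = self.context_counts[key]
        if ctx_n >= self.min_count:
            p_ngram = self.joint_counts[key][target] / ctx_n
            return (1 - self.alpha) * p_model + self.alpha * p_ngram
        return p_model  # unseen context: fall back to the model alone

    def update(self, context, target):
        """Add an already-scored (context, target) pair to the cache."""
        key = self._bucket(context)
        self.context_counts[key] += 1
        self.joint_counts[key][target] += 1
```

In a score-then-update loop over the eval stream, repeated contexts get sharper blended probabilities while the first occurrence of any context is scored by the model alone.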

Inspired by and credited to @deanbrr (PR #659) for the n-gram eval cache concept.

Architecture

11L/512d U-Net, 26.93M params. LeakyReLU² (slope 0.5), XSA last 4, BigramHash 1536. GPTQ int6+zstd, late QAT.

Reproduce

SEED=2045 MLP_ACT=leaky_relu_sq MLP_LEAKY_SLOPE=0.5 XSA_LAST_N=4 BIGRAM_VOCAB_SIZE=1536 ROPE_DIMS=24 NGRAM_EVAL_ORDER=5 NGRAM_EVAL_ALPHA=0.20 NGRAM_EVAL_MIN_COUNT=2 NGRAM_EVAL_BUCKETS=4194304 torchrun --nproc_per_node=8 train_gpt.py

8xH100 SXM, 600s training + ~190s eval.

11L/512d U-Net + legal score-first 5-gram eval interpolation.
Inspired by @deanbrr's n-gram cache technique (PR openai#659).

3-seed results:
  seed 1337: 1.0451  (15.63MB)
  seed 42:   1.0471  (15.59MB)
  seed 2045: 1.0460  (15.64MB)
  mean:      1.0461

Run: SEED=2045 MLP_ACT=leaky_relu_sq MLP_LEAKY_SLOPE=0.5 \
     XSA_LAST_N=4 BIGRAM_VOCAB_SIZE=1536 ROPE_DIMS=24 \
     NGRAM_EVAL_ORDER=5 NGRAM_EVAL_ALPHA=0.20 \
     torchrun --nproc_per_node=8 train_gpt.py

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Contributor

valerio-oai commented Mar 25, 2026

I quite like the 5-gram EvalCache idea and encourage you to resubmit a valid run with it. However, this submission has no submission.json or training logs, so I can't validate it, and it appears to run GPTQ at eval time on training data, which is disallowed.

That said, if you removed the GPTQ and submitted a proper .json with training logs, the 5-gram EvalCache idea looks at least directionally legal; I'd have to see the final code on a valid submission to be sure. Closing for now, but please do resubmit! One concern with the current caching setup: it assumes we know what the next token is and only calculates p_blended for that token. In a real generation setting we don't know the correct token, so this wouldn't hold. That said, afaict this only buys an efficiency improvement (it spares you from computing the blend over the whole vocab), so as long as an updated version still fits within the eval time budget, the score should be quite similar. Will think about this more if you resubmit a similar run.
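The generation-setting version of the concern above can be sketched as a blend over the full vocabulary rather than a single target token. Since both components are probability distributions, a fixed-weight linear mixture is still a valid distribution. This is an illustrative sketch (the function name and pure-Python representation are assumptions, not the PR's code):

```python
def blend_full_vocab(p_model, counts, alpha=0.20, min_count=2):
    """Blend a model distribution with n-gram counts over the WHOLE vocab.

    p_model: list of model probabilities, one per vocab token (sums to 1).
    counts:  n-gram counts for the current hashed context, one per token.
    Needed when the next token is unknown (generation), unlike eval,
    where only the target token's blended probability is required.
    """
    total = sum(counts)
    if total < min_count:
        return list(p_model)  # context too rare: use the model alone
    # Convex combination of two distributions; the result still sums to 1.
    return [(1 - alpha) * pm + alpha * (c / total)
            for pm, c in zip(p_model, counts)]
```

The cost difference is exactly the reviewer's point: per position, eval needs one blended scalar, while generation needs a vector of vocab size.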

Author

newjordan commented Mar 25, 2026

I can't reopen this and just upload my timings, so now I have an older PR? Why did you close this, dude? It should have been just a comment. You could have asked to see the logs before closing this PR and losing my time submissions.
