Conversation
11L/512d U-Net + legal score-first 5-gram eval interpolation. Inspired by @deanbrr's n-gram cache technique (PR openai#659).

3-seed results:
- seed 1337: 1.0451 (15.63MB)
- seed 42: 1.0471 (15.59MB)
- seed 2045: 1.0460 (15.64MB)
- mean: 1.0461

Run:

```
SEED=2045 MLP_ACT=leaky_relu_sq MLP_LEAKY_SLOPE=0.5 \
XSA_LAST_N=4 BIGRAM_VOCAB_SIZE=1536 ROPE_DIMS=24 \
NGRAM_EVAL_ORDER=5 NGRAM_EVAL_ALPHA=0.20 \
torchrun --nproc_per_node=8 train_gpt.py
```

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
I quite like the 5-gram EvalCache idea and encourage you to resubmit a valid run with it, but this submission has no submission.json or train logs, so I can't validate it, and it looks like it does GPTQ at eval time with training data, which is disallowed. However, if you removed the GPTQ and submitted a proper .json and training logs, I think the 5-gram EvalCache idea is at least directionally legal; I'd have to see the final code on a valid submission to make sure, though. Closing for now, but please do resubmit!

A concern I have with the current caching setup is that it sort of assumes we know what the next token is and only calculates the p_blended for that token -- in a real generation setting, we don't know what the correct token should be, so this wouldn't hold. That being said, afaict this would only give an efficiency improvement (I think it spares you from having to calculate it over the whole vocab?), so as long as an updated version still fits within the eval time, I think the score should be quite similar. Will think about this more if you resubmit a similar run.
I can't reopen this and just upload my times, so now I have an older PR? Why did you close this, dude? It should have been just a comment. You could have asked to see the logs before you closed this PR and lost my time submissions.
Results
Progression
Key Addition
Legal, score-first hashed 5-gram interpolation during sliding-window eval. Fixed-weight linear mixing (alpha = 0.20), no target-aware gating. The cache is built from already-scored tokens only, making it strictly backward-looking.
Inspired by and credited to @deanbrr (PR #659) for the n-gram eval cache concept.
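A minimal sketch of the score-first blending described above. The cache structure and function names here are illustrative assumptions, not the PR's actual code; the real implementation (hashing, sliding-window handling) lives in train_gpt.py.

```python
from collections import defaultdict

ORDER = 5      # 5-gram: a 4-token context predicts the next token
ALPHA = 0.20   # fixed interpolation weight (NGRAM_EVAL_ALPHA)

# counts[context][token] -> occurrences; populated only from tokens
# that have already been scored, so the cache is strictly backward-looking
counts = defaultdict(lambda: defaultdict(int))

def update_cache(tokens):
    """Insert already-scored tokens into the 5-gram cache."""
    for i in range(ORDER - 1, len(tokens)):
        ctx = tuple(tokens[i - ORDER + 1 : i])
        counts[ctx][tokens[i]] += 1

def blended_prob(p_model, ctx, token):
    """Fixed-weight linear mix of model and empirical 5-gram probability."""
    bucket = counts.get(tuple(ctx))
    if not bucket:
        return p_model  # no cache evidence: fall back to the model alone
    p_ngram = bucket[token] / sum(bucket.values())
    return (1 - ALPHA) * p_model + ALPHA * p_ngram
```

Because the cache only ever sees tokens that were already scored, blending cannot leak target information; this is the property the reviewer flags as "directionally legal".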
Architecture
11L/512d U-Net, 26.93M params. LeakyReLU² (slope 0.5), XSA last 4, BigramHash 1536. GPTQ int6+zstd, late QAT.
Reproduce
8xH100 SXM, 600s training + ~190s eval.