
Record: XSA-all + Depth Recurrence + Hedge Mixer TTT (val_bpb=1.0222, 3-seed mean)#745

Closed
stukenov wants to merge 1 commit into openai:main from stukenov:submission/v4-final-1epoch

Conversation

@stukenov

Record: XSA-all + VRL + CROWN-Q + Depth Recurrence + Hedge Mixer TTT

val_bpb = 1.0222 (3-seed mean, std 0.0067) | <16 MB | 8xH100 SXM | 600s train, 507s eval

3-Seed Results

| Seed | Pre-TTT bpb | Post-TTT bpb | TTT time | Artifact (bytes) |
|------|-------------|--------------|----------|------------------|
| 1337 | 1.1336 | 1.0201 | 507s | 15,857,972 |
| 42   | 1.1339 | 1.0165 | 508s | 15,846,228 |
| 2025 | 1.1369 | 1.0299 | 507s | 15,669,888 |
| Mean | 1.1348 | 1.0222 (std 0.0067) | 507s | |

Compliance

  • Training: 600s on 8xH100 SXM
  • Eval (TTT + sliding): 507s on 8xH100 SXM (under 600s limit)
  • All artifacts under 16,000,000 bytes
  • Score-first TTT: every token scored under torch.inference_mode() before any weight update
  • N-gram tables built from already-scored tokens only
  • No training data access during evaluation
  • GPTQ-lite: no calibration data needed
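The score-first constraint above can be illustrated with a minimal, framework-free sketch (function and parameter names here are illustrative, not the actual train_gpt.py code): each chunk is scored with the frozen current state before that same chunk is used to adapt the state, so no token's score ever depends on weights that have already seen it.

```python
def score_first_ttt(chunks, score_fn, update_fn, state):
    """Score-first test-time training loop (illustrative sketch).

    For each chunk: score it with the current state first, and only
    afterwards update the state on that chunk. In the real submission
    the scoring step would run under torch.inference_mode().
    """
    losses = []
    for chunk in chunks:
        losses.append(score_fn(state, chunk))  # score with weights that have NOT seen this chunk
        state = update_fn(state, chunk)        # only then adapt on the already-scored tokens
    return losses, state
```

With a toy "model" whose state is just a count of tokens seen, the ordering is visible: the first chunk is scored against a state of 0, the second against a state that reflects only the first chunk.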

6 Additions Over PR #549

  1. XSA on all layers (PR #634: "11L XSA-all + Full GPTQ (Budget-Legal) + Parallel Muon + Selective Pruning, val_bpb 1.1178, 3-seed mean") — −0.006 BPB
  2. Value Residual Learning (PR #657: "11L LeakyReLU² + VRL + lzma, val_bpb 1.1229, 3-seed mean") — layer 0 V blended via sigmoid gates
  3. Gated Attention (PR #638: "11L XSA-all + LeakyReLU(0.5)² + VR + GA, val_bpb 1.1164, pending 3-seed") — per-head sigmoid gates
  4. CROWN-Q (PR #693: "CROWN-Q + Full GPTQ + SWA/EMA Blend, val_bpb 1.1186, 3-seed mean") — curvature-weighted quantization penalty during warmdown
  5. Depth Recurrence (PR #686: "Depth Recurrence (layers 4 and 5 repeated), val_bpb 1.1182") — layers 4 and 5 repeated, giving 13 virtual layers from 11 physical
  6. 5-Expert Hedge Mixer (PR #688: "5-expert Hedge Mixer + TTT, 3-seed mean val_bpb 1.0745") — online mixing of neural + unigram + bigram + trigram + entropy experts via the Hedge algorithm
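The Hedge update in item 6 can be sketched as a standard multiplicative-weights mixer (a minimal illustration, not the submitted implementation; `eta` and the probability floor are assumed values): the mixture for each token is formed from the current weights before that token's outcome is observed, then every expert's weight is multiplied by exp(−eta · loss), where loss is the expert's log loss on the token.

```python
import math

def hedge_mix(step_probs, eta=0.5):
    """Online Hedge mixing of expert predictions (illustrative sketch).

    step_probs[t][e] is the probability expert e assigned to the token
    realized at step t. Returns the per-step mixture probabilities and
    the final normalized expert weights.
    """
    n = len(step_probs[0])
    weights = [1.0 / n] * n
    mixed = []
    for probs in step_probs:
        # mix with current weights BEFORE observing this step's outcome
        mixed.append(sum(w * q for w, q in zip(weights, probs)))
        # Hedge update: w *= exp(-eta * loss) with loss = -log q, i.e. w *= q**eta
        weights = [w * max(q, 1e-12) ** eta for w, q in zip(weights, probs)]
        z = sum(weights)
        weights = [w / z for w in weights]
    return mixed, weights
```

Under log loss this update concentrates weight on whichever expert has the lowest cumulative loss, which is what lets the n-gram experts take over once they fit the eval stream, and is exactly the behavior the reviewer objects to below.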

Reproduction

SEED=1337 torchrun --standalone --nproc_per_node=8 train_gpt.py

All defaults in the script match the submitted results. The only environment variable is SEED (1337, 42, and 2025 for the three runs).

Credits

PR #549 (@abaybektursun), #634 (@raahilshah), #657 (@anthony-maio), #638 (@Asukabot0), #693 (@EthanYangTW), #686 (@msisovic), #688 (@RoyiRa), #493 (@parinzee), #414 (@signalrush)

@valerio-oai
Contributor

Thanks for your submission! Unfortunately, it's disallowed due to the use of hashed n-gram caches (your "Hedge Mixer"), which do not correctly renormalize or reweight the LM's token distribution, and which look ahead to the target token when mixing probabilities and therefore leak eval tokens. Please refer to the long discussion about this under the Issues tab for more details, and please submit more runs in the future!

