Record: 11L Muon TTT + Entropy-Adaptive Epochs (8×H100) — val_bpb 1.1179 (3-seed mean) by aamodbhatt · Pull Request #999 · openai/parameter-golf

aamodbhatt · 2026-03-28T02:27:49Z

Summary

Two novel test-time training (TTT) techniques on the SOTA base stack (PR #399 + PR #414 + PR #461): Muon-style Newton-Schulz orthogonalized updates replace SGD in the TTT loop, and entropy-adaptive epoch selection concentrates the adaptation budget on harder content. The 3-seed mean val_bpb of 1.1179 beats the current SOTA (1.1194).
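To make the first change concrete, here is a minimal pure-Python sketch of a Newton-Schulz orthogonalization step of the kind Muon applies to a gradient matrix before the weight update. It is an illustration under stated assumptions, not the code in this PR's `train_gpt.py`: the quintic coefficients are taken from the public Muon reference implementation (this PR does not state which ones it uses), and the helper names `matmul`, `transpose`, and `newton_schulz` are hypothetical. Only the step count (3) comes from the PR.

```python
import math

# Assumption: quintic coefficients from the public Muon reference
# implementation; this PR does not state the coefficients it uses.
A_COEF, B_COEF, C_COEF = 3.4445, -4.7750, 2.0315


def matmul(p, q):
    """Multiply two matrices stored as lists of rows."""
    q_cols = list(zip(*q))
    return [[sum(x * y for x, y in zip(row, col)) for col in q_cols] for row in p]


def transpose(m):
    """Return the transpose of a list-of-rows matrix."""
    return [list(col) for col in zip(*m)]


def newton_schulz(grad, steps=3, eps=1e-7):
    """Approximately orthogonalize `grad` with `steps` quintic iterations.

    The result has singular values pushed toward 1 (not exactly 1), so the
    update direction keeps the gradient's singular vectors but roughly
    equalizes their scale.
    """
    # Work on the wide orientation so X @ X^T is the smaller Gram matrix.
    tall = len(grad) > len(grad[0])
    x = transpose(grad) if tall else [list(row) for row in grad]

    # Normalize by the Frobenius norm so all singular values are <= 1.
    norm = math.sqrt(sum(v * v for row in x for v in row)) + eps
    x = [[v / norm for v in row] for row in x]

    for _ in range(steps):
        a = matmul(x, transpose(x))          # A = X X^T
        aa = matmul(a, a)                    # A^2
        n = len(a)
        b = [[B_COEF * a[i][j] + C_COEF * aa[i][j] for j in range(n)]
             for i in range(n)]              # B = b*A + c*A^2
        bx = matmul(b, x)
        x = [[A_COEF * x[i][j] + bx[i][j] for j in range(len(x[0]))]
             for i in range(len(x))]         # X = a*X + B X

    return transpose(x) if tall else x
```

In a TTT loop, the orthogonalized matrix would take the place of the raw gradient in the update, for example `W -= lr * newton_schulz(grad_W, steps=3)`, applied to 2-D weight matrices only.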

Run Results (3 seeds)

| Seed | `legal_ttt_exact val_bpb` | `legal_ttt_exact val_loss` | pre-quant `val_bpb` | train time | eval time (TTT) | artifact size |
|---|---|---|---|---|---|---|
| 1337 | `1.11765030` | `1.88710072` | `1.1366` | `599.1s` | `477.9s` | `15,944,410 bytes` |
| 42 | `1.11812929` | `1.88790947` | `1.1371` | `599.1s` | `485.3s` | `15,873,826 bytes` |
| 2025 | `1.11789934` | `1.88752121` | `1.1367` | `599.1s` | `479.2s` | `15,879,042 bytes` |
| mean | `1.11789` | | | | | |
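The mean row can be reproduced from the three per-seed `legal_ttt_exact val_bpb` values. The short check below also computes a standard deviation; the commit message for this PR quotes a std of 0.0002, but does not say whether it is the sample or population form, so the sample form used here is an assumption (both round to the same four decimals for these values).

```python
import statistics

# Per-seed legal_ttt_exact val_bpb values copied from the results table.
val_bpb = {1337: 1.11765030, 42: 1.11812929, 2025: 1.11789934}

values = list(val_bpb.values())
mean_bpb = statistics.mean(values)
# Assumption: sample standard deviation (n - 1 in the denominator).
std_bpb = statistics.stdev(values)

print(f"mean={mean_bpb:.5f} std={std_bpb:.4f}")
```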

Method Notes

NUM_LAYERS=11, BIGRAM_VOCAB_SIZE=1536, XSA_LAST_N=4
TTT_ENABLED=1, score-first path
TTT_MUON=1 — Newton-Schulz orthogonalized updates in TTT loop (NS steps=3)
TTT_ENTROPY_ADAPT=1 — entropy-adaptive 2/3/4 epochs per chunk (H_HIGH=2.1, H_LOW=1.75)
TTT_LR=0.002, TTT_EPOCHS=3, TTT_CHUNK_TOKENS=32768
NGRAM_EVAL_ENABLED=0
NGRAM_TWO_PASS_ENABLED=0
NGRAM_FULL_RESCORE=0
EMA_ENABLED=1, SWA_ENABLED=1, LATE_QAT=1, VE_ENABLED=1
WARMDOWN_ITERS=3500, MAX_WALLCLOCK_SECONDS=599
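The entropy-adaptive setting above can be read as a three-way threshold rule. The sketch below is one plausible reading, not the PR's actual code: it assumes the per-chunk statistic is the mean NLL in nats per token (already averaged across GPUs), that chunks at or above `H_HIGH` get one epoch more than `TTT_EPOCHS` and chunks at or below `H_LOW` get one fewer, and that the comparisons are inclusive. The function name `epochs_for_chunk` and its parameter names are hypothetical.

```python
# Thresholds and base epoch count taken from the method notes.
H_HIGH = 2.1
H_LOW = 1.75
TTT_EPOCHS = 3


def epochs_for_chunk(chunk_nll, h_low=H_LOW, h_high=H_HIGH, base_epochs=TTT_EPOCHS):
    """Pick the number of TTT epochs for one chunk from its scored NLL.

    Assumption: `chunk_nll` is the chunk's mean negative log-likelihood,
    measured during the score pass and synchronized across ranks.
    """
    if chunk_nll >= h_high:
        return base_epochs + 1   # harder chunk: spend more adaptation budget
    if chunk_nll <= h_low:
        return base_epochs - 1   # easier chunk: spend less
    return base_epochs


for nll in (1.60, 1.89, 2.25):
    print(nll, epochs_for_chunk(nll))
```

Because the rule depends only on the NLL obtained while scoring the chunk, it needs no extra forward pass; synchronizing the value across ranks keeps every GPU running the same number of epochs.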

Submission Checklist

One folder under records/track_10min_16mb/
Included README.md, submission.json, train_gpt.py, and train logs (3 seeds)
Training <= 600s
Eval <= 600s
Artifact <= 16,000,000 bytes
No tokenizer/dataset modifications
Score-first TTT (SCORE under inference_mode before TRAIN on same chunk)
No n-gram, no two-pass, no external data lookup
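The score-first rule in the checklist is an ordering constraint: each chunk's loss is recorded before the model adapts on that chunk. The toy sketch below shows the ordering with a Laplace-smoothed unigram counter standing in for the neural model. Everything in it (the counter model, `score_first_ttt`, the toy chunks) is a hypothetical stand-in; the PR itself scores under `inference_mode` and then trains the network.

```python
import math
from collections import Counter


def score_first_ttt(chunks, vocab_size):
    """Return mean NLL (nats/token) where each chunk is scored before training on it."""
    counts = Counter()   # toy "model state": unigram counts seen so far
    total = 0
    nll_sum = 0.0
    n_tokens = 0

    for chunk in chunks:
        # SCORE: uses only state learned from earlier chunks.
        for tok in chunk:
            prob = (counts[tok] + 1) / (total + vocab_size)  # add-one smoothing
            nll_sum -= math.log(prob)
        n_tokens += len(chunk)

        # TRAIN: adapt on this chunk only after its score is recorded.
        counts.update(chunk)
        total += len(chunk)

    return nll_sum / n_tokens


toy_chunks = [[0, 0], [0, 0]]
print(score_first_ttt(toy_chunks, vocab_size=2))
```

Swapping the two phases inside the loop would let the model see each chunk before being graded on it, which is exactly what the checklist item rules out.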

…179 (3-seed mean) Two novel TTT innovations: (1) Muon-style Newton-Schulz orthogonalized updates replace SGD in the TTT loop; (2) entropy-adaptive 2/3/4 epochs per chunk based on globally-synced chunk NLL. 3-seed mean 1.1179, std 0.0002. All under 16MB/600s.

slope 0.75 + LR 0.027 + warmdown 3700 (PR openai#977)
No SWA with QAT (PR openai#989)
QAT from 50% + range fix [-31,31] mHC 22-param residual mixing (PR openai#928)
VE128 + no gated_attn + no value_residual (PR openai#549)
LZMA preset 7 compression (PR openai#999)
Muon TTT with NS3 (PR openai#999)
Entropy-adaptive TTT epochs 2/3/4 (PR openai#999)
Per-layer TTT LR (PR openai#995)
TTT momentum 0.95 (PR openai#995)

CLAUDE.md: Complete project state for cross-session continuity
- Leaderboard intel (verified SOTA + unverified PRs openai#1006, openai#999, openai#831)
- 8192 vocab analysis (doesn't fit: only 9,994 bytes headroom)
- Three planned improvements with code status
- Environment setup instructions (Mac MLX + RunPod H100)
- Codebase layout and git remotes

experiments.md: 4 planned experiments with commands + success criteria

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

aamodbhatt force-pushed the record-2026-03-28-muon-ttt-entropy-adaptive branch from bffd426 to f219235 on March 28, 2026 03:33

aamodbhatt wants to merge 1 commit into openai:main from aamodbhatt:record-2026-03-28-muon-ttt-entropy-adaptive
