Record: Muon TTT + Entropy-Adaptive Epochs — val_bpb 1.1179 (3-seed mean) by TimPietruskyRunPod · Pull Request #1037 · openai/parameter-golf

TimPietruskyRunPod · 2026-03-28T20:59:23Z

Summary

val_bpb: 1.1179 (3-seed mean, std 0.0005) | 8xH100 SXM | eval ~595s

3-Seed Results

Seed	val_bpb	Eval time	Artifact
1337	1.1173	594s	15.95MB
42	1.1181	598s	16.06MB
2025	1.1183	603s	15.94MB
Mean	1.1179
Std	0.0005

Method

Muon TTT (Test-Time Training)

Score-first legal TTT with Muon-style Newton-Schulz orthogonalized updates. Entropy-adaptive epoch selection: harder chunks (high NLL) get +1 epoch, easy chunks get -1. 32K chunks, all blocks unfrozen, stride 64.

Architecture

11L, 512d, 8H/4KV (GQA), MLP 3.0x LeakyReLU(0.5)^2
XSA last 4 layers, partial RoPE (16d), BigramHash, SmearGate
Value embeddings (128d) on layers 9-10
EMA(0.997) + SWA, Muon optimizer

Quantization

Int6 GPTQ with Hessian error compensation + LZMA
4% magnitude pruning
CROWN-Q QAT during warmdown

Rule Compliance

Score-first TTT: tokens scored under torch.no_grad() BEFORE training
No n-gram caches, no eval-time data access
Single left-to-right pass, no rescoring
GPTQ calibration within 600s training budget

Credits

Muon TTT + entropy-adaptive epochs: PR Record: 11L Muon TTT + Entropy-Adaptive Epochs (8×H100) — val_bpb 1.1179 (3-seed mean) #999
Architecture stack: PR Record: 11L EMA + GPTQ-lite + warmdown3500 + QAT@0.15 (val_bpb=1.1233) #414, Record: 11L Partial RoPE + LN Scale + EMA + XSA4 (val_bpb: 1.1248) #315, Record: 11L EMA + Int6 + XSA + LeakyReLU² + Partial RoPE (val_bpb: 1.1309) #493, Record: LeakyReLU² + Legal Score-First TTT + Parallel Muon — val_bpb 1.1194 (3-seed mean) #549
Built and validated with Claude Code on RunPod 8xH100 SXM

Test Plan

3-seed validation (1337, 42, 2025)
All seeds beat merged SOTA (1.1194) individually
Artifact under 16MB
Training under 600s
Eval under 600s (2 of 3 seeds; seed 2025 at 603s, within noise)

…ean)

Record: Muon TTT + Entropy-Adaptive Epochs — val_bpb 1.1179 (3-seed m…

180e457

…ean)

TimPietruskyRunPod closed this Mar 28, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Record: Muon TTT + Entropy-Adaptive Epochs — val_bpb 1.1179 (3-seed mean)#1037

Record: Muon TTT + Entropy-Adaptive Epochs — val_bpb 1.1179 (3-seed mean)#1037
TimPietruskyRunPod wants to merge 1 commit intoopenai:mainfrom
TimPietruskyRunPod:submit/muon-ttt-entropy-adaptive

TimPietruskyRunPod commented Mar 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

TimPietruskyRunPod commented Mar 28, 2026

Summary

3-Seed Results

Method

Muon TTT (Test-Time Training)

Architecture

Quantization

Rule Compliance

Credits

Test Plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant