Skip to content

Record: Muon TTT + Entropy-Adaptive Epochs — val_bpb 1.1179 (3-seed mean)#1037

Closed
TimPietruskyRunPod wants to merge 1 commit intoopenai:mainfrom
TimPietruskyRunPod:submit/muon-ttt-entropy-adaptive
Closed

Record: Muon TTT + Entropy-Adaptive Epochs — val_bpb 1.1179 (3-seed mean)#1037
TimPietruskyRunPod wants to merge 1 commit intoopenai:mainfrom
TimPietruskyRunPod:submit/muon-ttt-entropy-adaptive

Conversation

@TimPietruskyRunPod
Copy link
Copy Markdown

Summary

val_bpb: 1.1179 (3-seed mean, std 0.0005) | 8xH100 SXM | eval ~595s

3-Seed Results

Seed val_bpb Eval time Artifact
1337 1.1173 594s 15.95MB
42 1.1181 598s 16.06MB
2025 1.1183 603s 15.94MB
Mean 1.1179
Std 0.0005

Method

Muon TTT (Test-Time Training)

Score-first legal TTT with Muon-style Newton-Schulz orthogonalized updates. Entropy-adaptive epoch selection: harder chunks (high NLL) get +1 epoch, easy chunks get -1. 32K chunks, all blocks unfrozen, stride 64.

Architecture

  • 11L, 512d, 8H/4KV (GQA), MLP 3.0x LeakyReLU(0.5)^2
  • XSA last 4 layers, partial RoPE (16d), BigramHash, SmearGate
  • Value embeddings (128d) on layers 9-10
  • EMA(0.997) + SWA, Muon optimizer

Quantization

  • Int6 GPTQ with Hessian error compensation + LZMA
  • 4% magnitude pruning
  • CROWN-Q QAT during warmdown

Rule Compliance

  • Score-first TTT: tokens scored under torch.no_grad() BEFORE training
  • No n-gram caches, no eval-time data access
  • Single left-to-right pass, no rescoring
  • GPTQ calibration within 600s training budget

Credits

Test Plan

  • 3-seed validation (1337, 42, 2025)
  • All seeds beat merged SOTA (1.1194) individually
  • Artifact under 16MB
  • Training under 600s
  • Eval under 600s (2 of 3 seeds; seed 2025 at 603s, within noise)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant