Turbo-Muon + EngramLite + ParamBanking + GPTQ Reserve Opt — val_bpb 1.1126 (3-seed mean)#1169

Open
Bortlesboat wants to merge 1 commit into openai:main from Bortlesboat:submission/v18-turbomuon-fused-1.1126

Conversation

@Bortlesboat

Summary

  • val_bpb: 1.1126 (3-seed mean, std 0.0003)
  • Artifact: ~15.98 MB (all seeds under 16,000,000 bytes)
  • Eval time: ~120s (no TTT, sliding window stride=64; evaluation sketch below)
  • Built on PR #1089 by @mikeapedia
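
For context, a minimal sketch of the causal sliding-window scoring behind the val_bpb numbers (stride=64, every token scored exactly once, standard F.cross_entropy). The model call signature, context length, and total_bytes bookkeeping are illustrative assumptions, not the submission's actual eval code:

```python
import math
import torch
import torch.nn.functional as F

@torch.no_grad()
def sliding_window_bpb(model, tokens, total_bytes, ctx_len=1024, stride=64):
    """Score every token exactly once: each window predicts only its last
    `stride` targets, with up to ctx_len-1 tokens of causal left context.
    (ctx_len, total_bytes, and the model's call signature are assumptions.)"""
    n = tokens.numel()
    nll_sum = 0.0
    prev_end = 1                        # index of the first unscored target
    end = min(ctx_len, n)
    while True:
        begin = max(0, end - ctx_len)
        window = tokens[begin:end].unsqueeze(0)    # (1, L)
        logits = model(window[:, :-1])             # (1, L-1, vocab)
        targets = window[:, 1:]                    # (1, L-1)
        n_new = end - prev_end                     # targets not yet scored
        nll_sum += F.cross_entropy(
            logits[0, -n_new:], targets[0, -n_new:], reduction="sum"
        ).item()
        if end == n:
            break
        prev_end = end
        end = min(end + stride, n)
    # convert summed nats over tokens to bits per byte of the raw validation text
    return nll_sum / math.log(2) / total_bytes
```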

3-Seed Results

Seed   Sliding BPB   val_loss (nats)   Artifact (bytes)
1337   1.1126        1.87857           15,981,856
42     1.1123        1.87803           15,984,349
999    1.1129        1.87900           15,985,912
Mean   1.1126        1.87853           —

vs merged SOTA (PR #549, 1.89002 nats): -0.01149 nats. Note: open PRs #1089 (1.1091) and #1105 (1.1138) achieve better scores.

What's New vs PR #1089

  1. GPTQ Reserve Optimization: Reduced the calibration reserve from 14s to 9s (measured calibration ~8.4s), recovering ~55 extra training steps. A budget sketch follows the Compliance list below.
  2. Experimental fused Triton MLP kernel: Forward-only fusion via torch.library.triton_op with a standard PyTorch backward (registration pattern sketched after this list). Hard-disabled in this submission: it produces NaN on PyTorch 2.9 due to a TTIR analysis bug, so it is included only as experimental code for future work.
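
The fused MLP kernel itself stays disabled, so as illustration only, here is a minimal sketch of the registration pattern item 2 describes: a Triton forward registered through torch.library.triton_op, paired with a plain eager-PyTorch backward via torch.library.register_autograd. The op namespace, the quick-GELU activation, and the block size are placeholder assumptions, and it presumes a PyTorch version where triton_op/wrap_triton are available (>= 2.6):

```python
import torch
import triton
import triton.language as tl
from torch.library import triton_op, wrap_triton

@triton.jit
def _quick_gelu_kernel(x_ptr, out_ptr, n_elements, BLOCK: tl.constexpr):
    pid = tl.program_id(axis=0)
    offs = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n_elements
    x = tl.load(x_ptr + offs, mask=mask)
    # quick-GELU: x * sigmoid(1.702 * x), used here only as a small stand-in
    tl.store(out_ptr + offs, x * tl.sigmoid(1.702 * x), mask=mask)

# Forward-only Triton op; the "nanogpt_ext" namespace is a placeholder.
@triton_op("nanogpt_ext::quick_gelu", mutates_args={})
def quick_gelu(x: torch.Tensor) -> torch.Tensor:
    x = x.contiguous()
    out = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)
    wrap_triton(_quick_gelu_kernel)[grid](x, out, n, BLOCK=1024)
    return out

# Standard PyTorch backward (no Triton), matching the "forward-only fusion
# with standard PyTorch backward" approach described above.
def _setup_context(ctx, inputs, output):
    (x,) = inputs
    ctx.save_for_backward(x)

def _backward(ctx, grad_out):
    (x,) = ctx.saved_tensors
    s = torch.sigmoid(1.702 * x)
    return grad_out * (s + 1.702 * x * s * (1.0 - s))

torch.library.register_autograd(
    "nanogpt_ext::quick_gelu", _backward, setup_context=_setup_context
)

# usage (assumes a CUDA device with Triton available):
# y = quick_gelu(torch.randn(8, 4096, device="cuda", requires_grad=True))
```

Keeping the backward in eager PyTorch is what makes it easy to hard-disable the Triton path and fall back to the reference MLP when the TTIR bug triggers.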

Compliance

  • Standard F.cross_entropy scoring
  • No TTT, no eval-time training data access
  • Artifact < 16,000,000 bytes (all 3 seeds)
  • Training < 600s, eval < 600s
  • Causal sliding-window evaluation (stride=64)
  • 3-seed verification: -0.01149 nats vs merged SOTA (> 0.005 threshold)
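
To make the "Training < 600s" budget and the 9s GPTQ reserve from item 1 concrete, a minimal timing sketch; train_step and run_gptq_calibration are hypothetical placeholders, not the submission's functions:

```python
import time

TRAIN_BUDGET_S = 600.0   # wall-clock training budget from the rules
GPTQ_RESERVE_S = 9.0     # calibration reserve (was 14.0 in PR #1089)

def train_with_gptq_reserve(model, train_step, run_gptq_calibration):
    """Stop training early enough that GPTQ calibration (~8.4s measured)
    fits inside the reserved 9s, keeping the total under 600s."""
    start = time.perf_counter()
    cutoff = TRAIN_BUDGET_S - GPTQ_RESERVE_S   # 591s of pure training
    steps = 0
    while time.perf_counter() - start < cutoff:
        train_step(model)
        steps += 1
    run_gptq_calibration(model)
    return steps
```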

Credits

  • Built on PR #1089 by @mikeapedia.

Commit message:

3-seed results: 1.1126/1.1123/1.1129 (mean 1.1126, std 0.0003)
Built on PR openai#1089 with GPTQ reserve optimization (14s to 9s).
Includes experimental fused Triton MLP kernel (hard-disabled).