
Non-record: Full GPTQ + XSA-4 + Score-First TTT (3-seed mean 1.1198) #734

Open
Robby955 wants to merge 1 commit into openai:main from Robby955:submission/2026-03-25_GPTQ_XSA4_TTT_1.1198

Conversation

@Robby955

Results (8xH100 SXM, PyTorch 2.9.1+cu128)

| Seed | Steps | ms/step | Post-TTT BPB | Artifact (bytes) |
|------|-------|---------|--------------|------------------|
| 1337 | 6,461 | 86.67   | 1.1193       | 15,899,061       |
| 42   | 6,457 | 86.73   | 1.1196       | 15,954,941       |
| 2025 | 6,457 | 86.74   | 1.1205       | 15,907,769       |

Mean: 1.1198 | Std: 0.0006
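The reported mean and std can be reproduced from the per-seed BPB values above; the std matches a sample (n−1 denominator) standard deviation:

```python
import statistics

# Post-TTT BPB for the three seeds (1337, 42, 2025), from the table above
bpb = [1.1193, 1.1196, 1.1205]

mean = statistics.fmean(bpb)   # 1.1198
std = statistics.stdev(bpb)    # sample std (ddof=1), ~0.0006

print(f"Mean: {mean:.4f} | Std: {std:.4f}")
```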

Compliance

  • Training: 560s training + 40s GPTQ calibration = 600s total (within 10-min budget)
  • GPTQ calibration: Uses training data, runs within training time budget (NOT during eval)
  • Eval: ~82s sliding window + ~236s score-first TTT = ~318s (within 10-min eval limit)
  • No training data accessed during evaluation
  • TTT: Score-first protocol — each chunk is scored under inference_mode() before the model adapts on it; no chunk is ever re-scored
  • Artifact: All seeds under 16,000,000 bytes
  • Script: 1,499 lines
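A minimal sketch of the score-first ordering claimed above (function names are illustrative stand-ins, not the submission's actual API): each chunk is scored with weights frozen, and only then does the model take adaptation steps on it, so no chunk is ever evaluated after the model has seen it.

```python
# Score-first test-time-training loop: evaluate each chunk BEFORE
# adapting on it, and never re-score a chunk afterwards.
def score_first_ttt(chunks, score, adapt):
    """score(chunk) -> loss under current (frozen) weights;
    adapt(chunk) -> in-place weight update (e.g. a few AdamW steps)."""
    losses = []
    for chunk in chunks:
        losses.append(score(chunk))  # evaluate first, gradients disabled
        adapt(chunk)                 # only then update on this chunk
    return sum(losses) / len(losses)

# Demo with stand-in callables that just record the call order:
calls = []
avg = score_first_ttt(
    [1, 2, 3],
    score=lambda c: (calls.append(("score", c)), float(c))[1],
    adapt=lambda c: calls.append(("adapt", c)),
)
```

In the real protocol `score` would run under `torch.inference_mode()` and `adapt` would perform the AdamW steps on the unfrozen blocks; the structural guarantee is simply that `score(chunk)` always precedes `adapt(chunk)`.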

Key Techniques

  1. Full Hessian GPTQ — 256-batch calibration, Cholesky error compensation, act-order, block_size=128
  2. XSA on last 4 layers — cross-sequence attention for extended context
  3. SWA/EMA 50/50 blend — EMA(0.997) + tight SWA (every 50 steps during warmdown)
  4. Score-first TTT — AdamW(lr=1e-4), 3 epochs, freeze first 9/11 blocks, 128K-token chunks
  5. LZMA compression — better ratio than zstd for int6 weights
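Point 5 can be illustrated with the standard-library `lzma` module. This is an illustrative packing sketch, not the submission's actual artifact format (the int6 codes here are fake, and a real pipeline would bit-pack them rather than store one per byte):

```python
import lzma

# Fake int6 weight codes in 0..63, stored one per byte for simplicity.
codes = [i % 64 for i in range(10_000)]
raw = bytes(codes)

# LZMA at max preset; the PR reports a better ratio than zstd on int6 weights.
packed = lzma.compress(raw, preset=9 | lzma.PRESET_EXTREME)
restored = lzma.decompress(packed)

assert restored == raw
print(len(raw), "->", len(packed), "bytes")
```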

Architecture

11L, 512d, 8H/4KV GQA, LeakyReLU(0.5)² MLP 3x, BigramHash(3072×128), VE128, Partial RoPE 16/64, LN Scale, U-Net skips, SmearGate.
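The summary line above can be unpacked into a config sketch (field names are hypothetical; only the numbers come from the PR):

```python
from dataclasses import dataclass

@dataclass
class ModelConfig:
    n_layers: int = 11        # 11L
    d_model: int = 512        # 512d
    n_heads: int = 8          # 8H query heads
    n_kv_heads: int = 4       # 4KV heads -> GQA, 2 query heads per KV head
    mlp_ratio: int = 3        # MLP hidden = 3x d_model
    rope_dims: int = 16       # partial RoPE on 16 of the 64 dims per head
    bigram_hash_size: int = 3072
    bigram_hash_dim: int = 128  # BigramHash(3072x128)

    @property
    def head_dim(self) -> int:
        return self.d_model // self.n_heads  # 512 / 8 = 64

cfg = ModelConfig()
assert cfg.head_dim == 64
assert cfg.n_heads % cfg.n_kv_heads == 0
```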

Credits

🤖 Generated with Claude Code


Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
