Non-record: 11L Int5 QAT + Score-First TTT — val_bpb 1.1326 (15.51 MB)#861

JoeProAI wants to merge 3 commits into openai:main from JoeProAI:submission/joeproai-11l-int5-ttt-1.1326

Conversation

@JoeProAI

11L U-Net + Int5 QAT + Score-First Legal TTT

val_bpb: 1.13256182 | 15.51 MB (16,265,723 bytes) | 8×H100 (~37 min)


What's different

Built on the PR #549 stack. Key additions:

  • Int5 QAT — weights quantized to [-15, 15] per row, stored as int8 with a float16 per-row scale. Tighter than int6, and the narrower code range compresses better under zstd.
  • Score-first TTT — AdamW on MLP-only params (up_proj, down_proj, gate_proj, scale). lr=0.0004, 1 epoch. Order: score each chunk first, then adapt on it. Legal per the PR #461 recipe (Non-record: 11L Depth Recurrence + High-Yield Legal TTT, 1.14458 BPB).
  • MLP_HIDDEN=1536 — reduced from 1792 to fit artifact under 16 MB with int5.
  • 15% weight pruning — zero smallest weights pre-quantization for better zstd compression.
  • Bigram hash embedding — 4096 buckets, 128-dim, added to token embeddings.
  • XSA on all 11 layers — full U-Net cross-layer shared attention.
  • Warmdown 6000 steps — longer QAT phase for better weight clustering near int5 boundaries.
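As a sketch of the int5 step above (the function names and NumPy framing are mine, not the training script's):

```python
import numpy as np

def quantize_int5_per_row(w: np.ndarray):
    """Quantize each row of a 2-D weight matrix to integers in [-15, 15].

    Returns int8 codes plus a float16 per-row scale, as the PR describes.
    """
    # Per-row max magnitude sets the scale so values map into [-15, 15].
    scale = np.abs(w).max(axis=1, keepdims=True) / 15.0
    scale = np.where(scale == 0, 1.0, scale)  # guard all-zero rows
    q = np.clip(np.round(w / scale), -15, 15).astype(np.int8)
    return q, scale.astype(np.float16)

def dequantize_int5_per_row(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    # Reconstruct approximate float weights from codes and per-row scales.
    return q.astype(np.float32) * scale.astype(np.float32)
```

Only 31 distinct code values per row (vs 63 for int6) is what buys the zstd ratio here.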
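The score-first ordering can be sketched roughly like this — a toy PyTorch loop under assumed module and method names (`model.loss`, the param-name filter); the real script's loss and chunking will differ:

```python
import torch

def score_first_ttt(model, chunks, lr=4e-4):
    """Score-first test-time training: score each chunk with the current
    weights, *then* take one adaptation pass on it (1 epoch, MLP-only).
    """
    # Restrict the optimizer to MLP-only params, per the PR's recipe.
    mlp_params = [p for n, p in model.named_parameters()
                  if any(k in n for k in ("up_proj", "down_proj", "gate_proj", "scale"))]
    opt = torch.optim.AdamW(mlp_params, lr=lr)
    total = 0.0
    for chunk in chunks:
        with torch.no_grad():            # score BEFORE adapting (the legal order)
            total += model.loss(chunk).item()
        opt.zero_grad()
        model.loss(chunk).backward()     # then one adaptation pass on that chunk
        opt.step()
    return total / len(chunks)
```

The point of the ordering: each chunk's score is produced by weights that have never seen that chunk, so adaptation can't leak into the measurement.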
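And a toy version of the bigram hash embedding — the hash mix, vocab size, and padding token here are illustrative assumptions, only the 4096 buckets and 128 dims come from the PR:

```python
import numpy as np

VOCAB, BUCKETS, DIM = 256, 4096, 128  # 4096 buckets / 128-dim per the PR; vocab is a placeholder
rng = np.random.default_rng(0)
tok_emb = rng.standard_normal((VOCAB, DIM)).astype(np.float32)
bigram_emb = rng.standard_normal((BUCKETS, DIM)).astype(np.float32)

def embed(tokens: np.ndarray) -> np.ndarray:
    """Token embedding plus a hashed-bigram embedding.

    Each (prev, cur) token pair is hashed into one of the buckets and that
    bucket's vector is added to the current token's embedding.
    """
    out = tok_emb[tokens].copy()
    prev = np.concatenate(([0], tokens[:-1]))       # pad the first position
    buckets = (prev * 1000003 + tokens) % BUCKETS   # simple multiplicative hash (illustrative)
    out += bigram_emb[buckets]
    return out
```

This gives the model a cheap order-2 context signal without an O(vocab²) table.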

Architecture

Results

Train log, submission.json, and training script included.

…g to fit int6 under 16MB

- INT6_CLIP_PERCENTILE now reads from env (default 99.99984, wave46 uses 99.0)
- PRUNE_PCT added to 1.0677 script (was missing, wave46 uses 0.25)
- Modal harness wave46_clip_prune.py for detached runs
- Both levers push zeros into weight tensors for better zstd compression
- Base architecture: SwiGLU + U-Net + XSA4 + BigramHash(8192) = 1.0677 BPB pre-compression
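A rough sketch of the two zero-pushing levers above, under illustrative function and argument names (the script reads the values from the INT6_CLIP_PERCENTILE and PRUNE_PCT environment variables):

```python
import numpy as np

def clip_and_prune(w: np.ndarray, clip_percentile: float = 99.0,
                   prune_pct: float = 0.25) -> np.ndarray:
    """Clip outlier magnitudes at a percentile, then zero the smallest
    `prune_pct` fraction of weights.

    Both steps concentrate the value distribution (many exact zeros,
    tighter dynamic range), which zstd compresses well.
    """
    out = w.copy()
    # Lever 1: clip magnitudes above the chosen percentile (INT6_CLIP_PERCENTILE).
    cap = np.percentile(np.abs(out), clip_percentile)
    out = np.clip(out, -cap, cap)
    # Lever 2: zero the smallest prune_pct fraction of weights (PRUNE_PCT).
    thresh = np.quantile(np.abs(out), prune_pct)
    out[np.abs(out) < thresh] = 0.0
    return out
```

Clipping before pruning matters: the prune threshold is then computed on the already-tightened distribution.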
