Non-record: 11L Int5 QAT + Score-First TTT — val_bpb 1.1326 (15.51 MB)#861

JoeProAI wants to merge 3 commits into openai:main from JoeProAI:submission/joeproai-11l-int5-ttt-1.1326

Conversation

@JoeProAI

11L U-Net + Int5 QAT + Score-First Legal TTT

val_bpb: 1.13256182 | 15.51 MB (16,265,723 bytes) | 8×H100 (~37 min)


What's different

Built on the PR #549 stack. Key additions:

  • Int5 QAT — weights quantized to [-15, 15] per row, stored as int8 with a float16 per-row scale. Tighter than int6, and the narrower code range compresses better under zstd.
  • Score-first TTT — AdamW on MLP-only params (up_proj, down_proj, gate_proj, scale). lr=0.0004, 1 epoch. Order: score each chunk first, then adapt on it. Legal per the PR #461 recipe (Non-record: 11L Depth Recurrence + High-Yield Legal TTT, 1.14458 BPB).
  • MLP_HIDDEN=1536 — reduced from 1792 to fit artifact under 16 MB with int5.
  • 15% weight pruning — zero smallest weights pre-quantization for better zstd compression.
  • Bigram hash embedding — 4096 buckets, 128-dim, added to token embeddings.
  • XSA on all 11 layers — full U-Net cross-layer shared attention.
  • Warmdown 6000 steps — longer QAT phase for better weight clustering near int5 boundaries.
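As a sketch of the int5 step above (the function names and NumPy framing are mine, not the training script's):

```python
import numpy as np

def quantize_int5_per_row(w: np.ndarray):
    """Quantize each row of a 2-D weight matrix to integers in [-15, 15].

    Returns int8 codes plus a float16 per-row scale, as the PR describes.
    """
    # Per-row max magnitude sets the scale so values map into [-15, 15].
    scale = np.abs(w).max(axis=1, keepdims=True) / 15.0
    scale = np.where(scale == 0, 1.0, scale)  # guard all-zero rows
    q = np.clip(np.round(w / scale), -15, 15).astype(np.int8)
    return q, scale.astype(np.float16)

def dequantize_int5_per_row(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    # Reconstruct approximate float weights from codes and per-row scales.
    return q.astype(np.float32) * scale.astype(np.float32)
```

Only 31 distinct code values per row (vs 63 for int6) is what buys the zstd ratio here.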
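The score-first ordering can be sketched roughly like this — a toy PyTorch loop under assumed module and method names (`model.loss`, the param-name filter); the real script's loss and chunking will differ:

```python
import torch

def score_first_ttt(model, chunks, lr=4e-4):
    """Score-first test-time training: score each chunk with the current
    weights, *then* take one adaptation pass on it (1 epoch, MLP-only).
    """
    # Restrict the optimizer to MLP-only params, per the PR's recipe.
    mlp_params = [p for n, p in model.named_parameters()
                  if any(k in n for k in ("up_proj", "down_proj", "gate_proj", "scale"))]
    opt = torch.optim.AdamW(mlp_params, lr=lr)
    total = 0.0
    for chunk in chunks:
        with torch.no_grad():            # score BEFORE adapting (the legal order)
            total += model.loss(chunk).item()
        opt.zero_grad()
        model.loss(chunk).backward()     # then one adaptation pass on that chunk
        opt.step()
    return total / len(chunks)
```

The point of the ordering: each chunk's score is produced by weights that have never seen that chunk, so adaptation can't leak into the measurement.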
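And a toy version of the bigram hash embedding — the hash mix, vocab size, and padding token here are illustrative assumptions, only the 4096 buckets and 128 dims come from the PR:

```python
import numpy as np

VOCAB, BUCKETS, DIM = 256, 4096, 128  # 4096 buckets / 128-dim per the PR; vocab is a placeholder
rng = np.random.default_rng(0)
tok_emb = rng.standard_normal((VOCAB, DIM)).astype(np.float32)
bigram_emb = rng.standard_normal((BUCKETS, DIM)).astype(np.float32)

def embed(tokens: np.ndarray) -> np.ndarray:
    """Token embedding plus a hashed-bigram embedding.

    Each (prev, cur) token pair is hashed into one of the buckets and that
    bucket's vector is added to the current token's embedding.
    """
    out = tok_emb[tokens].copy()
    prev = np.concatenate(([0], tokens[:-1]))       # pad the first position
    buckets = (prev * 1000003 + tokens) % BUCKETS   # simple multiplicative hash (illustrative)
    out += bigram_emb[buckets]
    return out
```

This gives the model a cheap order-2 context signal without an O(vocab²) table.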

Architecture

Results

Train log, submission.json, and training script included.

…g to fit int6 under 16MB

- INT6_CLIP_PERCENTILE now reads from env (default 99.99984, wave46 uses 99.0)
- PRUNE_PCT added to 1.0677 script (was missing, wave46 uses 0.25)
- Modal harness wave46_clip_prune.py for detached runs
- Both levers push zeros into weight tensors for better zstd compression
- Base architecture: SwiGLU + U-Net + XSA4 + BigramHash(8192) = 1.0677 BPB pre-compression
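A rough sketch of the two zero-pushing levers above, under illustrative function and argument names (the script reads the values from the INT6_CLIP_PERCENTILE and PRUNE_PCT environment variables):

```python
import numpy as np

def clip_and_prune(w: np.ndarray, clip_percentile: float = 99.0,
                   prune_pct: float = 0.25) -> np.ndarray:
    """Clip outlier magnitudes at a percentile, then zero the smallest
    `prune_pct` fraction of weights.

    Both steps concentrate the value distribution (many exact zeros,
    tighter dynamic range), which zstd compresses well.
    """
    out = w.copy()
    # Lever 1: clip magnitudes above the chosen percentile (INT6_CLIP_PERCENTILE).
    cap = np.percentile(np.abs(out), clip_percentile)
    out = np.clip(out, -cap, cap)
    # Lever 2: zero the smallest prune_pct fraction of weights (PRUNE_PCT).
    thresh = np.quantile(np.abs(out), prune_pct)
    out[np.abs(out) < thresh] = 0.0
    return out
```

Clipping before pruning matters: the prune threshold is then computed on the already-tightened distribution.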
