Non-record: 10L + Batched LoRA TTT (val_bpb=1.1160) by hypery11 · Pull Request #525 · openai/parameter-golf

hypery11 · 2026-03-23T12:25:36Z

Summary

10-layer transformer with batched per-document LoRA test-time training.

Base val_bpb: 1.1476 (10 min train)
TTT val_bpb: 1.1160 (8.3 min eval)
Artifact size: 15.75 MB

Architecture

10L, 512 dim, 8/4 GQA, 3x MLP
Mixed int5/int6 quantization + zstd-22
Muon + AdamW, EMA averaging

LoRA TTT

Rank-8 on Q/V/LM-head, all layers
64 docs batched in parallel
Per-document reset, Adam lr=0.01
256-token chunks, 3 epochs, score on final epoch

Single seed result. 3-seed validation coming.

10-layer transformer with mixed int5/int6 quantization, improved activations, enhanced embeddings, and EMA averaging. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Per-document rank-8 LoRA adaptation on Q/V/LM-head. Base 1.148 -> TTT 1.104. Eval time ~29min (needs speedup for 10min cap).

Batch-64 per-document LoRA adaptation. Rank-8 on Q/V/LM-head. Base 1.148 -> TTT 1.116 in 8.3 min eval. 256-token chunks, 3 epochs.

johndoe and others added 2 commits March 23, 2026 20:25

Non-record: 10L Optimized (val_bpb=1.1477)

da7f4fb

10-layer transformer with mixed int5/int6 quantization, improved activations, enhanced embeddings, and EMA averaging. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Update: add LoRA TTT, val_bpb 1.1039

c16b51f

Per-document rank-8 LoRA adaptation on Q/V/LM-head. Base 1.148 -> TTT 1.104. Eval time ~29min (needs speedup for 10min cap).

hypery11 changed the title ~~Non-record: 10L Optimized (val_bpb=1.1477)~~ Non-record: 10L + LoRA TTT (val_bpb=1.1039, base=1.1485) Mar 23, 2026

Update: batched LoRA TTT, val_bpb 1.1160, eval 495s

1fc94e5

Batch-64 per-document LoRA adaptation. Rank-8 on Q/V/LM-head. Base 1.148 -> TTT 1.116 in 8.3 min eval. 256-token chunks, 3 epochs.

hypery11 changed the title ~~Non-record: 10L + LoRA TTT (val_bpb=1.1039, base=1.1485)~~ Non-record: 10L + Batched LoRA TTT (val_bpb=1.1160) Mar 23, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Non-record: 10L + Batched LoRA TTT (val_bpb=1.1160)#525

Non-record: 10L + Batched LoRA TTT (val_bpb=1.1160)#525
hypery11 wants to merge 3 commits intoopenai:mainfrom
hypery11:submission/2026-03-23_10L_Optimized

hypery11 commented Mar 23, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

hypery11 commented Mar 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Architecture

LoRA TTT

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

hypery11 commented Mar 23, 2026 •

edited

Loading