
Non-record: 11L XSA + SwiGLU + LoRA TTT (val_bpb=1.1573, 1xH100) #2

Open

swapp1990 wants to merge 1 commit into main from submission/nonrecord-11l-xsa-lora-ttt

Conversation

@swapp1990
Owner

Summary

  • val_bpb: 1.1573 (LoRA TTT) | 15.02 MB artifact | 1xH100 PCIe, ~80 min
  • 11-layer transformer: XSA (last 4 layers), SwiGLU 3x MLP, SmearGate, U-Net skips, OrthoInit, Muon WD=0.04, SWA (SwiGLU sketch after this list)
  • Mixed quantization: int5-MLP + int6-attn + int8-embed + zstd-22 (quantization sketch below)
  • Score-then-train LoRA TTT (rank-8, 256-token chunks) brings val_bpb from 1.191 → 1.157 (LoRA sketch below)
  • 18 experiments over 5 days, from val_bpb=3.10 to 1.1573 (~$50 total compute)
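
A minimal sketch of the SwiGLU 3x MLP mentioned above, assuming PyTorch; the class name, argument names, and exact gating arrangement are illustrative, not taken from the submission.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLU(nn.Module):
    """SwiGLU MLP with a 3x hidden expansion (sketch)."""
    def __init__(self, d_model: int, expansion: int = 3):
        super().__init__()
        hidden = expansion * d_model
        self.w_gate = nn.Linear(d_model, hidden, bias=False)  # gate branch
        self.w_up = nn.Linear(d_model, hidden, bias=False)    # value branch
        self.w_down = nn.Linear(hidden, d_model, bias=False)  # projection back to d_model

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # SwiGLU: silu(gate(x)) * up(x), then project back down.
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))

# Usage: mlp = SwiGLU(d_model=512); y = mlp(torch.randn(2, 128, 512))
```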
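The mixed-quantization bullet can be illustrated with a symmetric n-bit quantizer plus zstd-22 compression. This is a sketch assuming NumPy and the `zstandard` package; the submission's actual bit-packing, per-tensor grouping, and scale storage are not specified here, and sub-byte values are left in int8 containers for clarity.

```python
import numpy as np
import zstandard

def quantize(w: np.ndarray, bits: int):
    """Symmetric per-tensor quantization to a signed `bits`-wide integer grid."""
    qmax = 2 ** (bits - 1) - 1                       # e.g. 15 for int5, 31 for int6, 127 for int8
    scale = max(float(np.abs(w).max()), 1e-8) / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def compress(q: np.ndarray) -> bytes:
    # zstd level 22 trades compression time for the smallest artifact.
    return zstandard.ZstdCompressor(level=22).compress(q.tobytes())

# Example: mlp_q, s = quantize(np.random.randn(768, 2304).astype(np.float32), bits=5)
#          blob = compress(mlp_q)
```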
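A sketch of the rank-8 LoRA adapter used for test-time training, assuming PyTorch. The wrapper shows only the low-rank update structure; the score-then-train schedule over 256-token chunks is summarized in comments as an assumption about the general shape, not the submission's code.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable rank-8 low-rank update."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                   # pretrained weight stays frozen
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero-init: no change at step 0
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

# Assumed shape of "score-then-train" TTT over 256-token chunks:
#   1. score each chunk with the base model,
#   2. take a few gradient steps on the LoRA parameters using already-seen chunks,
#   3. re-score subsequent chunks with the adapted model.
# Only A and B receive gradients, so each test-time update is cheap.
```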

Why Non-Record

Trained on a single H100 PCIe with gradient accumulation (~80 min), not 8xH100 in 10 min. The architecture is identical to what would run on 8xH100; a gradient-accumulation sketch follows below.
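
A sketch of the gradient-accumulation loop implied above, assuming PyTorch; `accum_steps`, the data loader, and a model that returns its loss directly are placeholders, not the submission's training code.

```python
import torch

def train_epoch(model, optimizer, loader, accum_steps: int = 8):
    """Emulate a larger effective batch on one GPU by accumulating gradients."""
    optimizer.zero_grad(set_to_none=True)
    for i, (x, y) in enumerate(loader):
        loss = model(x, y) / accum_steps          # scale so summed grads match the big batch
        loss.backward()                           # grads accumulate across micro-batches
        if (i + 1) % accum_steps == 0:
            optimizer.step()                      # one optimizer step per effective batch
            optimizer.zero_grad(set_to_none=True)
```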

Test plan

🤖 Generated with Claude Code

…3, 1xH100)

Non-record submission for the parameter-golf challenge.
11-layer transformer with XSA, SwiGLU, SmearGate, U-Net skips,
mixed quantization (15 MB), and score-then-train LoRA TTT.
Trained on 1xH100 PCIe with grad accumulation (~80 min).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>