Record: Residual Input Mixing + mixed int6 GPTQ + grouped TTT + MLP 3.5x by danialht · Pull Request #615 · openai/parameter-golf

danialht · 2026-03-24T13:21:47Z

Record: Residual Input Mixing + mixed int6 GPTQ + grouped TTT + MLP 3.5x

val_bpb: 1.1169 (mean over 3 seeds with TTT evaluation, stride=64)

artifact: 15.6 MB (mean over 3 seeds)

TLDR Changes

Changed TTT from a flat optimizer to grouped AdamW with stronger matrix/head adaptation, while restoring standard clipping and removing the per-chunk warmup.
Changed architecture (denser residual connections): each transformer block's input is now a learned mix of the current stream, earlier block outputs, and the original x0, instead of only the simpler local x/x0 residual mix. This gives the model a denser residual path and lets each block reuse longer-range intermediate features directly.
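The block input formation described above can be sketched roughly as follows. This is a minimal pure-Python illustration, not the PR's code: the names `mix_inputs` and `forward`, the weight layout `(w_stream, w_x0, w_prev)`, and the additive stream update are all assumptions made for the example, and a real implementation would use tensors with learned parameters.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
# Hypothetical per-block mixing weights: (w_stream, w_x0, [w_prev1, w_prev2, ...])
MixWeights = Tuple[float, float, Sequence[float]]


def mix_inputs(sources: Sequence[Vector], weights: Sequence[float]) -> Vector:
    """Return the weighted sum of equally sized vectors."""
    if len(sources) != len(weights):
        raise ValueError("need exactly one weight per source")
    dim = len(sources[0])
    return [sum(w * s[j] for w, s in zip(weights, sources)) for j in range(dim)]


def forward(
    x0: Vector,
    blocks: Sequence[Callable[[Vector], Vector]],
    mix: Sequence[MixWeights],
    history: int = 2,
) -> Vector:
    """Run blocks whose inputs mix the stream, x0, and recent block outputs."""
    x = list(x0)
    outputs: List[Vector] = []
    for block, (w_stream, w_x0, w_prev) in zip(blocks, mix):
        # Up to `history` earlier block outputs, most recent first.
        recent = outputs[::-1][:history]
        sources = [x, x0, *recent]
        weights = [w_stream, w_x0, *w_prev[: len(recent)]]
        block_in = mix_inputs(sources, weights)
        out = block(block_in)
        outputs.append(out)
        # Assumed residual update: add the block output to the stream.
        x = [a + b for a, b in zip(x, out)]
    return x
```

With `history=2` this mirrors the "2 previous layers" setting listed under More Details; the first blocks simply have fewer earlier outputs to mix in.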

Results

Seed	Steps	final val_loss	final val_bpb	Artifact
1337	6106	1.8859	1.1169	15.88 MB
42	6092	1.8855	1.1167	15.33 MB
2024	6091	1.8864	1.1172	15.73 MB

val_bpb mean: 1.1169

val_bpb std: 0.0003

val_loss mean: 1.8859
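The summary figures can be checked against the per-seed table with a few lines of Python. The sketch below uses only the values listed above; treating the reported std as the sample (n-1) standard deviation is an assumption, chosen because it is the variant that rounds to 0.0003.

```python
from statistics import mean, stdev

# Per-seed results copied from the table above.
val_bpb = {1337: 1.1169, 42: 1.1167, 2024: 1.1172}
val_loss = {1337: 1.8859, 42: 1.8855, 2024: 1.8864}
artifact_mb = {1337: 15.88, 42: 15.33, 2024: 15.73}

bpb_mean = mean(list(val_bpb.values()))
bpb_std = stdev(list(val_bpb.values()))  # sample standard deviation (n-1)
loss_mean = mean(list(val_loss.values()))
artifact_mean = mean(list(artifact_mb.values()))

print(f"val_bpb mean:  {bpb_mean:.4f}")
print(f"val_bpb std:   {bpb_std:.4f}")
print(f"val_loss mean: {loss_mean:.4f}")
print(f"artifact mean: {artifact_mean:.1f} MB")
```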

More Details

Architecture: 11L, 512d, each layer mixes residuals from the 2 previous layers, MHA 8/8, MLP 3.5x (1792), BigramHash 8192, XSA all layers
Quantization: mixed int6 per-row GPTQ (clip_range=15) + Early QAT (threshold 0.5) + EMA 0.997
TTT: Legal score-first AdamW, chunk=131072, last 2 blocks plus control params unfrozen
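To make the per-row quantization setting more concrete, here is a small sketch of symmetric per-row quantization with a clip range. It uses plain round-to-nearest, not GPTQ (which additionally compensates rounding error using calibration data), and it assumes `clip_range` is the largest integer magnitude a quantized weight may take. The function names are hypothetical and not taken from the PR.

```python
from typing import List, Tuple


def quantize_row(row: List[float], clip_range: int) -> Tuple[List[int], float]:
    """Quantize one weight row to integers in [-clip_range, clip_range]."""
    max_abs = max((abs(v) for v in row), default=0.0)
    # One scale per row; fall back to 1.0 for an all-zero row.
    scale = max_abs / clip_range if max_abs > 0 else 1.0
    q = [max(-clip_range, min(clip_range, round(v / scale))) for v in row]
    return q, scale


def dequantize_row(q: List[int], scale: float) -> List[float]:
    """Map quantized integers back to approximate float weights."""
    return [scale * v for v in q]


def quantize_matrix(
    weights: List[List[float]], clip_range: int
) -> List[Tuple[List[int], float]]:
    """Quantize each row independently, keeping its own scale."""
    return [quantize_row(row, clip_range) for row in weights]


# Example usage with the clip range quoted above.
q, scale = quantize_row([7.5, -2.0, 0.3], clip_range=15)
approx = dequantize_row(q, scale)
```

Because each row carries its own scale, a row with small weights is not forced onto the coarse grid of a row with large ones, which is the usual motivation for per-row over per-tensor scaling.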

valerio-oai · 2026-03-25T01:51:15Z

Same as #569 and #609: the use of training data at eval time is disallowed, and this code runs the full 600s of training before running GPTQ calibration, so the calibration counts as eval-time compute and is therefore not allowed.

record added

1d2995e

notapplica mentioned this pull request Mar 24, 2026

⛳ Parameter Golf Live AI Commentary ⛳ + Analysis / Ideas | every 10 minutes #140

Open

valerio-oai closed this Mar 25, 2026

valerio-oai mentioned this pull request Mar 25, 2026

Illegal submissions megathread #677

Open

danialht mentioned this pull request Mar 26, 2026

Record: Residual Input Mixing + mixed int6 GPTQ + grouped TTT + MLP 3.5x #790

Open

pentxayc mentioned this pull request Mar 26, 2026

Record: 0.4416 BPB -- Complementary Training + Backoff N-gram Mixer #803

Open

danialht wants to merge 1 commit into openai:main from danialht:main
