submission: XSA-all + Soft-Round QAT + Full GPTQ + MLP 3.5x + AdamW TTT#631

Open
senstar-hsoleimani wants to merge 1 commit into openai:main from senstar-hsoleimani:submission/2026-03-24_XSAall_SoftRound_FullGPTQ_MLP35x

Conversation

@senstar-hsoleimani

Track: 10min_16mb
Based on: PR #549 (LeakyReLU+ParallelMuon), PR #606 (Soft-Round+AdamW TTT), PR #609 (XSA-all+Full GPTQ)

Changes from SOTA (#549):

  • XSA on all 11 layers (was 4)
  • Soft-Round QAT with tanh-based differentiable rounding (alpha 1->16)
  • Full GPTQ with Hessian-aware column-reordered Cholesky error compensation
  • MHA 8/8 (was GQA 8/4)
  • MLP 3.5x expansion (1792 hidden, was 3.0x/1536)
  • BigramHash vocabulary 8192 (was 2048)
  • AdamW TTT with grouped LR and cosine schedule (was SGD)
  • Early QAT threshold 0.5 (was late 0.15)
  • Selective ±1 magnitude pruning to hit size target
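For reference, the "tanh-based differentiable rounding" in the Soft-Round QAT bullet can be sketched as below. This is a minimal stand-alone version of the standard soft-rounding formulation (the PR's exact code is not shown here): at small `alpha` the function is close to the identity, and as `alpha` grows it converges to hard `round()`, which is why the schedule anneals `alpha` from 1 to 16 during training.

```python
import math

def soft_round(x: float, alpha: float) -> float:
    """Differentiable surrogate for round().

    Sketch of tanh-based soft rounding (names and formulation assumed,
    not taken from the PR): m is the midpoint of the unit cell containing
    x, r is the residual in (-0.5, 0.5), and the tanh(alpha/2) divisor
    normalises the output so it reaches the adjacent integers exactly
    at r = +/-0.5. As alpha -> infinity this approaches hard rounding;
    at small alpha it is nearly the identity, so gradients flow.
    """
    m = math.floor(x) + 0.5
    r = x - m
    return m + 0.5 * math.tanh(alpha * r) / math.tanh(alpha / 2.0)
```

In QAT this replaces the hard quantizer in the forward pass, and the `alpha` 1→16 ramp gradually sharpens it toward true rounding.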
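The "AdamW TTT with grouped LR and cosine schedule" bullet combines per-group peak learning rates with a shared cosine decay. A minimal sketch, assuming a standard cosine anneal from a per-group peak to zero (group names and rates below are illustrative only; the PR does not state the actual split):

```python
import math

def cosine_lr(step: int, total_steps: int, lr_max: float, lr_min: float = 0.0) -> float:
    """Cosine-annealed LR: lr_max at step 0, decaying to lr_min at total_steps."""
    t = min(step, total_steps) / total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * t))

# Hypothetical parameter groups with distinct peak LRs, annealed together.
peak_lrs = {"attn": 3e-4, "mlp": 3e-4, "embeddings": 1e-3}
lrs_mid_run = {name: cosine_lr(50, 100, peak) for name, peak in peak_lrs.items()}
```

With PyTorch this maps onto `torch.optim.AdamW` parameter groups plus a `LambdaLR`/`CosineAnnealingLR` scheduler; the point of grouping is that embeddings and matrix weights tolerate different peak rates under a short TTT budget.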
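The "selective ±1 magnitude pruning to hit size target" bullet can be read as: after quantization, zero out the ternary weights whose pre-quantization magnitude was smallest until the compressed checkpoint fits the budget. A hypothetical sketch of that selection rule (function name and keep-fraction interface are assumptions, not from the PR):

```python
def prune_to_budget(weights: list[float], keep_fraction: float) -> list[float]:
    """Zero the smallest-magnitude entries, keeping roughly keep_fraction.

    Illustrative magnitude pruning: rank weights by |w|, pick the
    threshold that retains the top keep_fraction, and zero the rest.
    Zeros compress better than +/-1 values, which is how this trades
    a little accuracy for the 16 MB size target.
    """
    n = len(weights)
    k = int(n * keep_fraction)  # number of weights to keep
    if k >= n:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[n - k]
    return [w if abs(w) >= threshold else 0.0 for w in weights]
```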
@valerio-oai
Contributor

This has no submission.json / val_bpb value, so I can't score it. Also, since this is based on #609, a word of warning: make sure no training data is accessed at eval time, so any calibration mechanism has to fit inside the 600 s of training time.
