submission: XSA-all + Soft-Round QAT + Full GPTQ + MLP 3.5x + AdamW TTT#631

Open
senstar-hsoleimani wants to merge 1 commit into openai:main from senstar-hsoleimani:submission/2026-03-24_XSAall_SoftRound_FullGPTQ_MLP35x

Conversation

@senstar-hsoleimani

Track: 10min_16mb
Based on: PR #549 (LeakyReLU+ParallelMuon), PR #606 (Soft-Round+AdamW TTT), PR #609 (XSA-all+Full GPTQ)

Changes from SOTA (#549):

  • XSA on all 11 layers (was 4)
  • Soft-Round QAT with tanh-based differentiable rounding (alpha 1->16)
  • Full GPTQ with Hessian-aware column-reordered Cholesky error compensation
  • MHA 8/8 (was GQA 8/4)
  • MLP 3.5x expansion (1792 hidden, was 3.0x/1536)
  • BigramHash vocabulary 8192 (was 2048)
  • AdamW TTT with grouped LR and cosine schedule (was SGD)
  • Early QAT threshold 0.5 (was late 0.15)
  • Selective ±1 magnitude pruning to hit size target
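For reference, the "tanh-based differentiable rounding" in the Soft-Round QAT bullet can be sketched as below. This is a minimal stand-alone version of the standard soft-rounding formulation (the PR's exact code is not shown here): at small `alpha` the function is close to the identity, and as `alpha` grows it converges to hard `round()`, which is why the schedule anneals `alpha` from 1 to 16 during training.

```python
import math

def soft_round(x: float, alpha: float) -> float:
    """Differentiable surrogate for round().

    Sketch of tanh-based soft rounding (names and formulation assumed,
    not taken from the PR): m is the midpoint of the unit cell containing
    x, r is the residual in (-0.5, 0.5), and the tanh(alpha/2) divisor
    normalises the output so it reaches the adjacent integers exactly
    at r = +/-0.5. As alpha -> infinity this approaches hard rounding;
    at small alpha it is nearly the identity, so gradients flow.
    """
    m = math.floor(x) + 0.5
    r = x - m
    return m + 0.5 * math.tanh(alpha * r) / math.tanh(alpha / 2.0)
```

In QAT this replaces the hard quantizer in the forward pass, and the `alpha` 1→16 ramp gradually sharpens it toward true rounding.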
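The "AdamW TTT with grouped LR and cosine schedule" bullet combines per-group peak learning rates with a shared cosine decay. A minimal sketch, assuming a standard cosine anneal from a per-group peak to zero (group names and rates below are illustrative only; the PR does not state the actual split):

```python
import math

def cosine_lr(step: int, total_steps: int, lr_max: float, lr_min: float = 0.0) -> float:
    """Cosine-annealed LR: lr_max at step 0, decaying to lr_min at total_steps."""
    t = min(step, total_steps) / total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * t))

# Hypothetical parameter groups with distinct peak LRs, annealed together.
peak_lrs = {"attn": 3e-4, "mlp": 3e-4, "embeddings": 1e-3}
lrs_mid_run = {name: cosine_lr(50, 100, peak) for name, peak in peak_lrs.items()}
```

With PyTorch this maps onto `torch.optim.AdamW` parameter groups plus a `LambdaLR`/`CosineAnnealingLR` scheduler; the point of grouping is that embeddings and matrix weights tolerate different peak rates under a short TTT budget.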
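The "selective ±1 magnitude pruning to hit size target" bullet can be read as: after quantization, zero out the ternary weights whose pre-quantization magnitude was smallest until the compressed checkpoint fits the budget. A hypothetical sketch of that selection rule (function name and keep-fraction interface are assumptions, not from the PR):

```python
def prune_to_budget(weights: list[float], keep_fraction: float) -> list[float]:
    """Zero the smallest-magnitude entries, keeping roughly keep_fraction.

    Illustrative magnitude pruning: rank weights by |w|, pick the
    threshold that retains the top keep_fraction, and zero the rest.
    Zeros compress better than +/-1 values, which is how this trades
    a little accuracy for the 16 MB size target.
    """
    n = len(weights)
    k = int(n * keep_fraction)  # number of weights to keep
    if k >= n:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[n - k]
    return [w if abs(w) >= threshold else 0.0 for w in weights]
```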
@valerio-oai
Contributor

This has no submission.json / val_bpb value, so I can't score it. Also, since this is based on #609, a word of warning: make sure no training data is accessed at eval time, so any calibration mechanism has to fit inside the 600 s of training time.
