
Non-record: VR + GA + Late QAT + Full GPTQ — 1.1418 BPB, 15.7 MB#601

Open
anantdgoel wants to merge 1 commit into openai:main from anantdgoel:submission-lateqat-vr-ga-gptq

Conversation

@anantdgoel

val_bpb: 1.1418 | 15.7 MB | 1x NVIDIA RTX A6000, ~14 hours

Summary

An 11-layer GPT combining the community meta-stack with two novel techniques, Value Residual (VR) and Gated Attention (GA), plus Late QAT during training and a Full GPTQ + Int5 MLP post-training quantization pipeline. It achieves 1.1418 BPB (stride=128) with a 15.7 MB artifact, under the 16 MB limit.

Update pending: a BH10240 (bigram hash, 10240 buckets) variant is currently evaluating; improved results expected soon.

Novel Contributions

  • Value Residual (VR): Layer-0 V vector shortcut for deep attention signal flow (−0.015 BPB). Inspired by arXiv:2410.17897.
  • Gated Attention (GA): Per-head learned sigmoid gate after SDPA (−0.003 BPB). Inspired by arXiv:2505.06708.
  • Late QAT: LR-threshold-based fake-quantize during final ~5% of training.
  • Full GPTQ + Int5 MLP post-training: Hessian-aware quantization + int5 MLP re-quantization (−0.028 BPB, −3.6 MB).
  • Finding: TTT hurts on GPTQ-quantized models (+0.030 BPB). The quantized weight space appears incompatible with gradient-based test-time adaptation.
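The VR and GA bullets above can be sketched as a single attention module. This is a minimal PyTorch illustration, not the PR's actual `train_gpt.py` code: the module and parameter names, the sigmoid-mixed VR weight, and the gate-from-input design are all assumptions.

```python
# Hypothetical sketch of Value Residual (VR) + Gated Attention (GA).
# VR: each layer mixes its V with the V computed at layer 0.
# GA: a learned per-head sigmoid gate scales the SDPA output.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VRGatedAttention(nn.Module):
    def __init__(self, dim: int, n_heads: int):
        super().__init__()
        self.n_heads = n_heads
        self.head_dim = dim // n_heads
        self.qkv = nn.Linear(dim, 3 * dim, bias=False)
        self.proj = nn.Linear(dim, dim, bias=False)
        # VR: learned mixing weight between this layer's V and layer-0 V
        # (assumed scalar-per-layer; the paper/PR may use another shape).
        self.vr_lambda = nn.Parameter(torch.tensor(0.5))
        # GA: per-head gate computed from the layer input.
        self.gate = nn.Linear(dim, n_heads, bias=True)

    def forward(self, x, v0=None):
        B, T, C = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        shape = (B, T, self.n_heads, self.head_dim)
        q, k, v = (t.view(shape).transpose(1, 2) for t in (q, k, v))
        if v0 is None:
            v0 = v  # layer 0: keep its V as the residual source
        else:
            lam = torch.sigmoid(self.vr_lambda)
            v = lam * v + (1 - lam) * v0  # Value Residual mix
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        # Gated Attention: per-head gate in (0, 1), applied after SDPA.
        g = torch.sigmoid(self.gate(x))            # (B, T, n_heads)
        out = out * g.transpose(1, 2).unsqueeze(-1)
        out = out.transpose(1, 2).reshape(B, T, C)
        return self.proj(out), v0
```

Layer 0 returns its own V as `v0`; deeper layers receive it and mix it into their values, giving the "deep attention signal flow" shortcut the VR bullet describes.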

Ablation Results (stride=128)

| Configuration | BPB | Delta |
| --- | --- | --- |
| Base int6+zstd (no post-training) | 1.1696 | — |
| + Full GPTQ + Int5 + GPTQ-lite | 1.1418 | −0.028 |
| + VR_V0_FP16 (asymmetric V0 quant) | 1.1418 | +0.000 |
| + SGD TTT (legal, cosine, per-layer) | 1.1721 | +0.030 |
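The Late QAT step from the contributions list can be illustrated with a straight-through fake-quantize helper. This is a hedged sketch: the bit width, the symmetric per-tensor scheme, and the function name are assumptions, not the PR's actual implementation.

```python
# Hypothetical "fake quantize" for Late QAT: the forward pass sees
# quantized weights, while gradients flow through unchanged via the
# straight-through estimator (STE).
import torch

def fake_quantize(w: torch.Tensor, bits: int = 6) -> torch.Tensor:
    qmax = 2 ** (bits - 1) - 1                     # e.g. 31 for int6
    scale = w.detach().abs().max().clamp(min=1e-8) / qmax
    w_q = (w / scale).round().clamp(-qmax - 1, qmax) * scale
    # STE: forward uses w_q; backward sees the identity w.
    return w + (w_q - w).detach()
```

Enabling this only for the final ~5% of steps (e.g. once the LR drops below a threshold, as the Late QAT bullet describes) lets the network adapt to quantization noise without paying the accuracy cost for the whole run.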

Credits

Built on top of the excellent community meta-stack. Key techniques originated from:

Files

  • train_gpt.py — Full training + eval script with all techniques
  • submission.json — Metadata
  • README.md — Detailed writeup with ablations and reproducibility commands

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
