Record: Coprime-Stride Loader + Full GPTQ + XSA-all — val_bpb 1.1133 (3-seed mean)#1099

Open
Bortlesboat wants to merge 2 commits into openai:main from Bortlesboat:submission/v16-coprime-gptq-1.1136

Conversation


@Bortlesboat Bortlesboat commented Mar 29, 2026

Summary

  • val_bpb: 1.1133 (3-seed mean, std 0.0001)
  • Artifact: ~15.89 MB (all seeds under 16,000,000 bytes)
  • Eval time: ~85s (no TTT, sliding window stride=64)
  • Built on PR #549 by @abaybektursun and PR #1060 by @dexhunter

3-Seed Results

| Seed | Sliding BPB | Artifact (bytes) |
|------|-------------|------------------|
| 1337 | 1.1133 | 15,899,687 |
| 42 | 1.1132 | 15,881,359 |
| 999 | 1.1133 | 15,892,371 |
| Mean +/- Std | 1.1133 +/- 0.0001 | |

What's New

  1. GPTQ Reserve Optimization: Reduced the time budget reserved for quantization calibration from 14s to 9s (calibration actually takes ~8.4s), recovering ~55 extra training steps within the 600s budget
  2. FA3/FA2 Graceful Fallback: try/except import of flash_attn_interface (FA3), falling back to flash_attn (FA2) when FA3 is unavailable

Compliance

  • Standard F.cross_entropy scoring (softmax, sum=1)
  • No TTT, no mixer, no eval-built adaptation
  • Artifact < 16,000,000 bytes (all 3 seeds)
  • Training < 600s, eval < 600s
  • Causal sliding-window evaluation (stride=64)
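The stride=64 sliding-window evaluation mentioned above can be sketched as index math alone: each window scores only its trailing `stride` tokens (the first window scores everything), so every token is scored exactly once with up to `window` tokens of causal context. The window size and helper name here are illustrative assumptions; only stride=64 comes from the PR.

```python
def sliding_eval_spans(n_tokens, window, stride=64):
    """Return (ctx_start, ctx_end, score_from) spans for causal
    sliding-window scoring: tokens [score_from, ctx_end) are scored
    using context [ctx_start, ctx_end), with ctx_end - ctx_start <= window."""
    spans = []
    score_from = 0
    while score_from < n_tokens:
        # First window scores all its tokens; later windows score `stride` tokens
        end = min(window, n_tokens) if score_from == 0 else min(score_from + stride, n_tokens)
        start = max(0, end - window)
        spans.append((start, end, score_from))
        score_from = end
    return spans

spans = sliding_eval_spans(300, window=128, stride=64)
```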

See README.md for full details.

…(3-seed mean)

3-seed results: 1.1136/1.1133/1.1139 (mean 1.1136, std 0.0003)
Built on PR openai#549 + PR openai#1060 with optimized GPTQ reserve (10s vs 14s).
Improved from 1.1136 to 1.1133 by reducing GPTQ reserve from 10s to 9s.
Seeds: 1.1133/1.1132/1.1133 (mean 1.1133, std 0.0001)
All artifacts under 16MB.
@Bortlesboat Bortlesboat changed the title Record: Coprime-Stride Loader + Full GPTQ + XSA-all — val_bpb 1.1136 (3-seed mean) Record: Coprime-Stride Loader + Full GPTQ + XSA-all — val_bpb 1.1133 (3-seed mean) Mar 29, 2026
theLightArchitect added a commit to theLightArchitect/parameter-golf that referenced this pull request Mar 30, 2026
Single innovation: coprime-stride shard traversal. Instead of
reading shards 0,1,2,...,79, reads 0,7,14,...,77,4,11,... where
stride=7 is coprime to 80 shards. Prevents repeated token sequences
across epochs. PR openai#1099 gets 1.1136 with this (vs 1.1217 baseline).

12 lines added. Zero HP changes. Zero architecture changes.
Same quantization path. Artifact unchanged.

Co-Authored-By: Kevin Tan <kft@lightarchitects.io>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
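The coprime-stride traversal described in that commit can be sketched as follows; the function name is illustrative, but the order 0, 7, 14, ..., 77, 4, 11, ... and the 80-shard/stride-7 parameters come from the commit message. Because gcd(7, 80) = 1, the walk is a permutation: every shard is read exactly once per epoch, just in a scrambled order.

```python
import math

def coprime_order(num_shards, stride=7):
    """Visit shards at 0, stride, 2*stride, ... (mod num_shards).
    When gcd(stride, num_shards) == 1 this is a full permutation,
    so each shard is read once per epoch but in a different order
    than plain 0, 1, 2, ... traversal."""
    assert math.gcd(stride, num_shards) == 1, "stride must be coprime to shard count"
    return [(i * stride) % num_shards for i in range(num_shards)]

order = coprime_order(80, stride=7)  # 0, 7, 14, ..., 77, 4, 11, ...
```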