
Record: 1.1122 BPB — Coprime-Stride Loader + Full GPTQ + XSA-all (3-seed mean)#1060

Open
dexhunter wants to merge 5 commits into openai:main from dexhunter:submission/2026-03-29-loader-fullgptq-xsa11

Conversation


@dexhunter dexhunter commented Mar 29, 2026

Summary

What's New

  1. Coprime-stride multi-shard data pipeline (PR #726 style) — diverse batches from coprime-stride block sampling across shards
  2. Full Hessian GPTQ (PR #634 / PR #1019 style) — Cholesky error compensation replaces GPTQ-lite
  3. XSA on all 11 layers — extended from the last 4
  4. No TTT — sliding-only outperforms TTT on this stack (confirmed independently by PR #1019)
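The coprime-stride idea in item 1 can be illustrated with a small sketch. Stepping through a shard's blocks with a stride coprime to the block count yields a full permutation (every block visited exactly once) while decorrelating consecutive batches. The function name and stride-selection rule below are hypothetical, not the PR's actual code:

```python
import math
import random

def coprime_stride_order(n_blocks, seed=0):
    """Visit all blocks of a shard by stepping with a stride coprime to
    the block count. Because gcd(stride, n_blocks) == 1, the walk is a
    permutation: every block index appears exactly once."""
    rng = random.Random(seed)
    # pick a random stride coprime to n_blocks (illustrative selection rule)
    stride = rng.randrange(1, n_blocks)
    while math.gcd(stride, n_blocks) != 1:
        stride = rng.randrange(1, n_blocks)
    start = rng.randrange(n_blocks)
    return [(start + i * stride) % n_blocks for i in range(n_blocks)]
```

In a multi-shard setting, drawing each shard's blocks in its own coprime-stride order gives diverse batches without the memory cost of a global shuffle.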

3-Seed Results

| Seed | Sliding BPB | Artifact (bytes) |
|------|-------------|------------------|
| 1337 | 1.1118 | 15,973,962 |
| 42 | 1.1127 | 15,980,438 |
| 2025 | 1.1121 | 15,983,626 |
| **Mean** | **1.1122** | |

Compliance

  • 3-seed verification, all under budget
  • Standard F.cross_entropy scoring (no mixer, no cache)
  • Artifact < 16,000,000 bytes (all seeds)
  • Training < 600s, eval < 600s
  • No TTT — pure sliding window evaluation

See README.md for full details.
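The full-Hessian GPTQ in item 2 quantizes weight columns one at a time and pushes each column's quantization error onto the not-yet-quantized columns, scaled by the Cholesky factor of the inverse Hessian. A minimal numpy sketch of that idea (not this record's actual `train_gpt.py` code; `scale` and `damp` are illustrative):

```python
import numpy as np

def quantize(x, scale):
    """Round to the nearest point on a uniform grid of step `scale`."""
    return np.round(x / scale) * scale

def gptq_full_hessian(W, X, scale=0.05, damp=0.01):
    """Minimal GPTQ-style column-wise quantization with Cholesky-based
    error compensation. W: (rows, cols) weights; X: (cols, samples)
    calibration inputs."""
    W = W.copy()
    cols = W.shape[1]
    H = X @ X.T
    H += damp * np.mean(np.diag(H)) * np.eye(cols)  # damping for stability
    Hinv = np.linalg.inv(H)
    # upper-triangular factor U with Hinv = U.T @ U
    U = np.linalg.cholesky(Hinv).T
    Q = np.zeros_like(W)
    for j in range(cols):
        q = quantize(W[:, j], scale)
        Q[:, j] = q
        err = (W[:, j] - q) / U[j, j]
        # spread this column's error onto the remaining columns
        W[:, j + 1:] -= np.outer(err, U[j, j + 1:])
    return Q
```

The "lite" variant this replaces would skip the compensation step and simply round each column, which ignores cross-column correlations in the Hessian.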

3-seed mean val_bpb: 1.1123 (std 0.0005)
All artifacts under 16MB, all eval under 600s.

Key changes from PR openai#549:
- Coprime-stride multi-shard data pipeline (PR openai#726 style)
- Full Hessian GPTQ with Cholesky error compensation
- XSA on all 11 layers
- BigramHash(2816×112)
- No TTT (sliding-only outperforms on this stack)

Built on PR openai#549 by @abaybektursun.
Seed logs now generated with the same 96,398-byte train_gpt.py that ships
in this record. Previous logs were from the pre-strip 111,130-byte version.

Updated results:
  Seed 1337: 1.1118 BPB, 15,973,962 bytes
  Seed 42:   1.1127 BPB, 15,980,438 bytes
  Seed 2025: 1.1121 BPB, 15,983,626 bytes
  Mean: 1.1122 ± 0.0004
@dexhunter changed the title from "Record: 1.1123 BPB — Coprime-Stride Loader + Full GPTQ + XSA-all (3-seed mean)" to "Record: 1.1122 BPB — Coprime-Stride Loader + Full GPTQ + XSA-all (3-seed mean)" on Mar 29, 2026
@dexhunter
Author

Updated: re-verified all 3 seeds with the stripped train_gpt.py (96,398 bytes) that ships in this record. Previous logs were generated with a pre-strip version (111,130 bytes) that included unused code paths. Scores are unchanged — 3-seed mean 1.1122 ± 0.0004, all artifacts under 16MB. Code size and logs are now fully consistent.

resouer pushed a commit to resouer/parameter-golf that referenced this pull request Mar 29, 2026
@dexhunter
Author

Follow-up cleanup for the stripped submission artifacts only.

What changed:

  • replaced the bundled train_seed1337.log short extract with the clean extract from the actual stripped-code run log
  • clarified in the record README that all 3 bundled seed results and the included train_gpt.py are from the stripped submission script (Code size: 96,398 bytes)
  • clarified reproduction from within the records folder and tightened the eval/rule-compliance wording

Why:

  • the previous train_seed1337.log extract accidentally included a launcher traceback / truncated preamble from an earlier invocation, which made the record bundle look inconsistent even though the underlying stripped run was valid
  • there is no model/code/score change here; all 3 seeds already match the stripped script, and the recorded metrics are unchanged

I re-ran the local rule checker on all 3 bundled logs after the cleanup and they pass cleanly.

icryo added a commit to icryo/parameter-golf that referenced this pull request Mar 29, 2026
Competition moved while we were experimenting locally:
  PR openai#634: 1.1178 BPB (Full GPTQ + XSA-all + selective pruning)
  PR openai#1060: 1.1122 BPB (+ coprime loader + BigramHash 2816)

Our contribution: TTT periodic reset on the PR openai#1060 base.
PR openai#1060 found TTT unnecessary with Full GPTQ, but they
didn't test TTT with anti-drift reset. If TTT drift was the
reason it stopped helping, reset could unlock further gains.

Files:
  train_gpt_ours.py  — PR openai#1060 + TTT reset mechanism
  train_gpt_pr634.py — Full GPTQ reference (for study)
  train_gpt_pr1060.py — Original PR openai#1060 (for comparison)
  run_h100.sh — Train once, sweep 4 TTT configs

TTT configs tested:
  A: SOTA (lr=0.002, 3ep) — baseline TTT
  B: PR openai#1039 (lr=0.0025, 4ep) — tuned TTT
  C: B + reset/100 — anti-drift, moderate
  D: B + reset/50 — anti-drift, aggressive

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
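The periodic-reset mechanism behind configs C and D above can be sketched as a TTT loop that snaps the weights back to the trained checkpoint every `reset_every` updates, bounding how far test-time training can drift. This is a hypothetical illustration of the idea, not icryo's actual code:

```python
import numpy as np

def ttt_with_reset(w0, grads, lr=0.0025, reset_every=100):
    """Test-time training with anti-drift reset: apply SGD updates from a
    stream of gradients, restoring the original checkpoint w0 every
    `reset_every` steps so accumulated drift stays bounded."""
    w = w0.copy()
    for step, g in enumerate(grads, start=1):
        w -= lr * g
        if step % reset_every == 0:
            w = w0.copy()  # anti-drift reset to the trained checkpoint
    return w
```

If TTT stopped helping on the Full-GPTQ stack because of drift rather than because adaptation itself is useless, a small `reset_every` (config D) should recover some of the gain.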
sunnypatneedi pushed a commit to sunnypatneedi/parameter-golf that referenced this pull request Mar 29, 2026
…-gram invalidation

- PR openai#771 closed (rule violation: multi-epoch TTT re-scored same eval tokens)
- N-gram eval cache banned: 33+ PRs closed by @valerio-oai on 2026-03-27 due to
  normalization bug; correct n-gram achieves ~1.51 BPB (worse than baseline)
- Update merged SOTA to 1.1194 (PR openai#549, was 1.1228)
- New target: PR openai#1060 (1.1122) — Full Hessian GPTQ + XSA-all + Coprime-stride
- Add Lessons 17-20 and v8.0 strategy to CLAUDE.md
- Add 2026-03-29 daily research report to logs/daily_research.md

https://claude.ai/code/session_01GabEptdqRohHFtkKNZNL17
Bortlesboat added a commit to Bortlesboat/parameter-golf that referenced this pull request Mar 29, 2026
…(3-seed mean)

3-seed results: 1.1136/1.1133/1.1139 (mean 1.1136, std 0.0003)
Built on PR openai#549 + PR openai#1060 with optimized GPTQ reserve (10s vs 14s).
icryo added a commit to icryo/parameter-golf that referenced this pull request Mar 29, 2026
… reset

Combines the best of three approaches:
  PR openai#1060 (1.1122): coprime loader + Full GPTQ + XSA-all
  PR openai#1072 (1.117):  fused Triton MLP (matmul+activation, 70ms/step)
  Ours:              TTT periodic reset (anti-drift)

Expected: ~7900 steps (vs 6700) with PR openai#1060 quality innovations
= best training throughput + best quantization + best eval.

Fused MLP kernel from PR openai#1072 uses TMA TensorDescriptors (H100 only).
Falls back to standard path on non-Hopper GPUs.

TTT sweep tests 4 configs on the same trained checkpoint:
  sota_ttt, pr1039, reset/100, reset/50

Total H100 time: ~10min train + 4×7min TTT ≈ 40 min

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Gusanidas added a commit to Gusanidas/parameter-golf that referenced this pull request Mar 30, 2026
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
AnirudhRahul pushed a commit to AnirudhRahul/parameter-golf that referenced this pull request Mar 30, 2026
…Agreement

Package the validated three-seed rerun of the PR openai#1060-derived Loader FullGPTQ XSA11 stack with the online causal ngram agreement evaluator. Include the runnable record folder, benchmark log, and submission metadata for the under-10-minute eval path.

Made-with: Cursor
icryo added a commit to icryo/parameter-golf that referenced this pull request Mar 31, 2026
Critical realization: our ported innovations (EngramLite, gated skips,
LeakyReLU(0.3)², Turbo-Muon) HURT by 0.003 BPB vs the PR openai#1060 baseline.
PR openai#1060 gets 1.1122, our merged version gets 1.1151. The partial port
of PR openai#1089 innovations doesn't capture their interactions.

Clean path to record: run PR openai#1060's exact code with FA3 + GPTQ_RESERVE=9s.
Expected: 1.1115-1.1122 BPB (well below 1.1144 threshold).
icryo added a commit to icryo/parameter-golf that referenced this pull request Mar 31, 2026
Complete pipeline to beat openai#1 (1.0806 BPB):
- train_gpt_scylla_stack.py: PR openai#1060 + metadata-based tokenizer loading
- retokenize.py: TokenMonster retokenization of FineWeb
- deploy_scylla.sh: two-phase deploy (retokenize once, train many)

Strategy: PR openai#1143 used old stack. We use PR openai#1060's modern stack
(GPTQ, XSA-all, coprime loader) on the same Scylla tokenizer.
Expected: ~1.070-1.080 BPB (beating both openai#1143 and openai#1089).
