Record: 1.1122 BPB — Coprime-Stride Loader + Full GPTQ + XSA-all (3-seed mean)#1060
Open

dexhunter wants to merge 5 commits into openai:main from
Conversation
3-seed mean val_bpb: 1.1123 (std 0.0005). All artifacts under 16 MB, all eval under 600 s.

Key changes from PR openai#549:
- Coprime-stride multi-shard data pipeline (PR openai#726 style)
- Full Hessian GPTQ with Cholesky error compensation
- XSA on all 11 layers
- BigramHash (2816×112)
- No TTT (sliding-only outperforms on this stack)

Built on PR openai#549 by @abaybektursun.
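The coprime-stride idea behind the loader can be sketched as follows. This is a minimal illustration, not the PR's actual pipeline; the shard count and stride here are made-up values.

```python
import math

def coprime_stride_order(n_shards: int, stride: int) -> list[int]:
    """Visit all shards in a permuted order using a stride coprime to n_shards.

    Because gcd(stride, n_shards) == 1, the walk i -> (i + stride) % n_shards
    cycles through every shard exactly once before repeating, so consecutive
    batches come from well-separated shards instead of draining one shard
    at a time.
    """
    assert math.gcd(stride, n_shards) == 1, "stride must be coprime to n_shards"
    return [(i * stride) % n_shards for i in range(n_shards)]

# Example: 8 shards walked with stride 3 (coprime to 8)
order = coprime_stride_order(8, 3)  # → [0, 3, 6, 1, 4, 7, 2, 5]
```

Any stride coprime to the shard count gives a full cycle, so the stride can double as a cheap per-epoch shuffle knob.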
Seed logs now generated with the same 96,398-byte train_gpt.py that ships in this record. Previous logs were from the pre-strip 111,130-byte version.

Updated results:
- Seed 1337: 1.1118 BPB, 15,973,962 bytes
- Seed 42: 1.1127 BPB, 15,980,438 bytes
- Seed 2025: 1.1121 BPB, 15,983,626 bytes
- Mean: 1.1122 ± 0.0004
Author
Updated: re-verified all 3 seeds with the stripped train_gpt.py (96,398 bytes) that ships in this record. Previous logs were generated with a pre-strip version (111,130 bytes) that included unused code paths. Scores are unchanged: 3-seed mean 1.1122 ± 0.0004, all artifacts under 16 MB. Code size and logs are now fully consistent.
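As a sanity check, the quoted mean and spread follow from the three per-seed scores above (the ±0.0004 matches the population standard deviation over the 3 runs):

```python
import statistics

# Per-seed val BPB from the updated logs in this thread
seed_bpb = {1337: 1.1118, 42: 1.1127, 2025: 1.1121}

mean = statistics.fmean(seed_bpb.values())
std = statistics.pstdev(seed_bpb.values())  # population std over the 3 runs

print(f"{mean:.4f} ± {std:.4f}")  # → 1.1122 ± 0.0004
```

Note that the sample standard deviation (`statistics.stdev`) over the same three values rounds to 0.0005, which is presumably where the "std 0.0005" in the original description came from.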
resouer pushed a commit to resouer/parameter-golf that referenced this pull request on Mar 29, 2026
Author
Follow-up cleanup for the stripped submission artifacts only. What changed:
Why:
I re-ran the local rule checker on all 3 bundled logs after the cleanup and they pass cleanly.
icryo added a commit to icryo/parameter-golf that referenced this pull request on Mar 29, 2026
Competition moved while we were experimenting locally:
- PR openai#634: 1.1178 BPB (Full GPTQ + XSA-all + selective pruning)
- PR openai#1060: 1.1122 BPB (+ coprime loader + BigramHash 2816)

Our contribution: TTT periodic reset on the PR openai#1060 base. PR openai#1060 found TTT unnecessary with Full GPTQ, but they didn't test TTT with an anti-drift reset. If TTT drift was the reason it stopped helping, a reset could unlock further gains.

Files:
- train_gpt_ours.py: PR openai#1060 + TTT reset mechanism
- train_gpt_pr634.py: Full GPTQ reference (for study)
- train_gpt_pr1060.py: original PR openai#1060 (for comparison)
- run_h100.sh: train once, sweep 4 TTT configs

TTT configs tested:
- A: SOTA (lr=0.002, 3 epochs), baseline TTT
- B: PR openai#1039 (lr=0.0025, 4 epochs), tuned TTT
- C: B + reset/100, anti-drift, moderate
- D: B + reset/50, anti-drift, aggressive

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
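The anti-drift reset in configs C and D can be sketched as below. `adapt_step`, the state object, and the chunk granularity are placeholders for the real TTT stack; `reset_every` plays the role of the /100 and /50 settings.

```python
import copy

def ttt_with_reset(model_state, eval_chunks, adapt_step, reset_every):
    """Test-time training with periodic reset to the base checkpoint.

    The hypothesis in this thread: plain TTT stops helping because the model
    drifts away from the trained optimum. Restoring the frozen checkpoint
    every `reset_every` chunks bounds that drift while keeping TTT's local
    adaptation. reset_every=0 disables resets (plain TTT).
    """
    base = copy.deepcopy(model_state)  # frozen reference checkpoint
    for i, chunk in enumerate(eval_chunks):
        if reset_every and i > 0 and i % reset_every == 0:
            model_state = copy.deepcopy(base)  # anti-drift reset
        model_state = adapt_step(model_state, chunk)
    return model_state
```

With a toy `adapt_step` that just accumulates, plain TTT drifts linearly while the reset variant stays bounded, which is the behavior the sweep is designed to compare.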
sunnypatneedi pushed a commit to sunnypatneedi/parameter-golf that referenced this pull request on Mar 29, 2026
…-gram invalidation

- PR openai#771 closed (rule violation: multi-epoch TTT re-scored the same eval tokens)
- N-gram eval cache banned: 33+ PRs closed by @valerio-oai on 2026-03-27 due to a normalization bug; a correct n-gram approach achieves ~1.51 BPB (worse than baseline)
- Update merged SOTA to 1.1194 (PR openai#549, was 1.1228)
- New target: PR openai#1060 (1.1122): Full Hessian GPTQ + XSA-all + coprime-stride loader
- Add Lessons 17-20 and the v8.0 strategy to CLAUDE.md
- Add the 2026-03-29 daily research report to logs/daily_research.md

https://claude.ai/code/session_01GabEptdqRohHFtkKNZNL17
Bortlesboat added a commit to Bortlesboat/parameter-golf that referenced this pull request on Mar 29, 2026
…(3-seed mean)

3-seed results: 1.1136 / 1.1133 / 1.1139 (mean 1.1136, std 0.0003). Built on PR openai#549 + PR openai#1060 with an optimized GPTQ reserve (10 s vs 14 s).
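The "Full Hessian GPTQ with Cholesky error compensation" these records share can be sketched as a greedy per-column quantizer. This is a simplified single-row version under an assumed SPD Hessian proxy, not the batched production implementation in the PR; `grid` is a stand-in for the real quantizer.

```python
import numpy as np

def gptq_quantize_row(w, H, grid):
    """Greedy GPTQ-style quantization of one weight row.

    w: (n,) weights; H: (n, n) SPD Hessian proxy (e.g. 2 X X^T + damping);
    grid: maps a float to its nearest representable quantized value.
    Each column is rounded in turn, and the rounding error, scaled by the
    upper Cholesky factor of H^-1, is folded into the not-yet-quantized
    columns so they can compensate for it.
    """
    w = np.asarray(w, dtype=np.float64).copy()
    Hinv = np.linalg.inv(H)
    U = np.linalg.cholesky(Hinv).T        # upper triangular: Hinv = U.T @ U
    q = np.empty_like(w)
    for i in range(len(w)):
        q[i] = grid(w[i])
        err = (w[i] - q[i]) / U[i, i]
        w[i + 1:] -= err * U[i, i + 1:]   # error compensation on later columns
    return q
```

With an identity Hessian the off-diagonal factor vanishes and this degenerates to plain round-to-nearest; the compensation only kicks in when input features are correlated.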
icryo added a commit to icryo/parameter-golf that referenced this pull request on Mar 29, 2026
… reset

Combines the best of three approaches:
- PR openai#1060 (1.1122): coprime loader + Full GPTQ + XSA-all
- PR openai#1072 (1.117): fused Triton MLP (matmul + activation, 70 ms/step)
- Ours: TTT periodic reset (anti-drift)

Expected: ~7900 steps (vs 6700) with PR openai#1060's quality innovations, i.e. best training throughput + best quantization + best eval.

The fused MLP kernel from PR openai#1072 uses TMA TensorDescriptors (H100 only) and falls back to the standard path on non-Hopper GPUs.

TTT sweep tests 4 configs on the same trained checkpoint: sota_ttt, pr1039, reset/100, reset/50.
Total H100 time: ~10 min train + 4×7 min TTT ≈ 40 min.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
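The Hopper-only fallback described above amounts to a compute-capability check. `pick_mlp_kernel` is a hypothetical helper for illustration; in the real stack the capability tuple would come from `torch.cuda.get_device_capability()`.

```python
def pick_mlp_kernel(device_capability):
    """Select the fused Triton MLP only on Hopper (sm_90+), where TMA
    TensorDescriptors are available; otherwise take the standard path.

    device_capability: (major, minor) compute capability tuple.
    """
    return "fused_triton_mlp" if device_capability >= (9, 0) else "standard_mlp"
```

Tuple comparison makes this forward-compatible: anything at or above sm_90 (e.g. a hypothetical (9, 1)) gets the fused path.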
Gusanidas added a commit to Gusanidas/parameter-golf that referenced this pull request on Mar 30, 2026
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
AnirudhRahul pushed a commit to AnirudhRahul/parameter-golf that referenced this pull request on Mar 30, 2026
…Agreement

Package the validated three-seed rerun of the PR openai#1060-derived Loader FullGPTQ XSA11 stack with the online causal n-gram agreement evaluator. Include the runnable record folder, benchmark log, and submission metadata for the under-10-minute eval path.

Made-with: Cursor
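An "online causal n-gram agreement" evaluator plausibly has the shape of the bigram sketch below: at each position, predict from counts built only over already-consumed tokens, score agreement with the actual token, and update the counts afterwards so nothing ever peeks at its own target. This is a guess at the general technique, not the evaluator's actual code.

```python
from collections import defaultdict

def causal_bigram_agreement(tokens):
    """Strictly causal online bigram agreement.

    For each position, predict the historically most frequent successor of
    the previous token, then check agreement with the actual token. Counts
    are updated only AFTER predicting, so the current token never informs
    its own prediction (no re-scoring of seen eval tokens).
    """
    successors = defaultdict(lambda: defaultdict(int))
    hits = total = 0
    for prev, cur in zip(tokens, tokens[1:]):
        counts = successors[prev]
        if counts:  # only score positions where history gives us a prediction
            pred = max(counts, key=counts.get)
            hits += (pred == cur)
            total += 1
        counts[cur] += 1  # update after predicting
    return hits / total if total else 0.0
```

On a perfectly periodic stream the agreement approaches 1.0 once the counts warm up, which makes it a cheap single-pass signal that respects the competition's no-re-scoring rule.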
icryo added a commit to icryo/parameter-golf that referenced this pull request on Mar 31, 2026
Critical realization: our ported innovations (EngramLite, gated skips, LeakyReLU(0.3)², Turbo-Muon) HURT by 0.003 BPB vs the PR openai#1060 baseline. PR openai#1060 gets 1.1122; our merged version gets 1.1151. The partial port of PR openai#1089's innovations doesn't capture their interactions.

Clean path to the record: run PR openai#1060's exact code with FA3 + GPTQ_RESERVE=9s. Expected: 1.1115-1.1122 BPB (well below the 1.1144 threshold).
icryo added a commit to icryo/parameter-golf that referenced this pull request on Mar 31, 2026
Complete pipeline to beat openai#1 (1.0806 BPB):
- train_gpt_scylla_stack.py: PR openai#1060 + metadata-based tokenizer loading
- retokenize.py: TokenMonster retokenization of FineWeb
- deploy_scylla.sh: two-phase deploy (retokenize once, train many)

Strategy: PR openai#1143 used the old stack. We use PR openai#1060's modern stack (GPTQ, XSA-all, coprime loader) on the same Scylla tokenizer. Expected: ~1.070-1.080 BPB (beating both openai#1143 and openai#1089).
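Bits-per-byte is what makes the retokenization plan comparable across tokenizers: loss is normalized by raw input bytes rather than tokens, so swapping in the Scylla tokenizer doesn't move the yardstick. A minimal definition (the example totals are made-up numbers, not this PR's logs):

```python
import math

def bits_per_byte(total_nll_nats, total_bytes):
    """Total cross-entropy of the eval stream (summed, in nats) divided by
    ln(2) * number of raw bytes. Tokenization granularity cancels out:
    fewer tokens with higher per-token loss give the same BPB.
    """
    return total_nll_nats / (math.log(2) * total_bytes)

# e.g. 1.0e6 nats of summed NLL over 1.29e6 eval bytes ≈ 1.118 BPB
```

This is why a better tokenizer can lower BPB without the model improving per-token: it reduces the summed NLL needed to encode the same bytes.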
Summary, What's New, 3-Seed Results, Compliance: see README.md for full details.