Record: Coprime-Stride Loader + Full GPTQ + XSA-all — val_bpb 1.1133 (3-seed mean)#1099

Open
Bortlesboat wants to merge 2 commits into openai:main from Bortlesboat:submission/v16-coprime-gptq-1.1136

Conversation


@Bortlesboat Bortlesboat commented Mar 29, 2026

Summary

  • val_bpb: 1.1133 (3-seed mean, std 0.0001)
  • Artifact: ~15.89 MB (all seeds under 16,000,000 bytes)
  • Eval time: ~85s (no TTT, sliding window stride=64)
  • Built on PR #549 by @abaybektursun and PR #1060 by @dexhunter

3-Seed Results

| Seed | Sliding BPB | Artifact (bytes) |
|------|-------------|------------------|
| 1337 | 1.1133 | 15,899,687 |
| 42 | 1.1132 | 15,881,359 |
| 999 | 1.1133 | 15,892,371 |
| Mean +/- Std | 1.1133 +/- 0.0001 | |

What's New

  1. GPTQ Reserve Optimization: Reduced the time budget reserved for quantization calibration from 14s to 9s (calibration actually takes ~8.4s), recovering ~55 extra training steps within the 600s budget
  2. FA3/FA2 Graceful Fallback: try/except import of flash_attn_interface (FA3), falling back to flash_attn (FA2) when FA3 is unavailable

Compliance

  • Standard F.cross_entropy scoring (softmax, sum=1)
  • No TTT, no mixer, no eval-built adaptation
  • Artifact < 16,000,000 bytes (all 3 seeds)
  • Training < 600s, eval < 600s
  • Causal sliding-window evaluation (stride=64)
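The stride=64 sliding-window evaluation mentioned above can be sketched as index math alone: each window scores only its trailing `stride` tokens (the first window scores everything), so every token is scored exactly once with up to `window` tokens of causal context. The window size and helper name here are illustrative assumptions; only stride=64 comes from the PR.

```python
def sliding_eval_spans(n_tokens, window, stride=64):
    """Return (ctx_start, ctx_end, score_from) spans for causal
    sliding-window scoring: tokens [score_from, ctx_end) are scored
    using context [ctx_start, ctx_end), with ctx_end - ctx_start <= window."""
    spans = []
    score_from = 0
    while score_from < n_tokens:
        # First window scores all its tokens; later windows score `stride` tokens
        end = min(window, n_tokens) if score_from == 0 else min(score_from + stride, n_tokens)
        start = max(0, end - window)
        spans.append((start, end, score_from))
        score_from = end
    return spans

spans = sliding_eval_spans(300, window=128, stride=64)
```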

See README.md for full details.

…(3-seed mean)

3-seed results: 1.1136/1.1133/1.1139 (mean 1.1136, std 0.0003)
Built on PR openai#549 + PR openai#1060 with optimized GPTQ reserve (10s vs 14s).
Improved from 1.1136 to 1.1133 by reducing GPTQ reserve from 10s to 9s.
Seeds: 1.1133/1.1132/1.1133 (mean 1.1133, std 0.0001)
All artifacts under 16MB.
@Bortlesboat Bortlesboat changed the title Record: Coprime-Stride Loader + Full GPTQ + XSA-all — val_bpb 1.1136 (3-seed mean) Record: Coprime-Stride Loader + Full GPTQ + XSA-all — val_bpb 1.1133 (3-seed mean) Mar 29, 2026
theLightArchitect added a commit to theLightArchitect/parameter-golf that referenced this pull request Mar 30, 2026
Single innovation: coprime-stride shard traversal. Instead of
reading shards 0,1,2,...,79, reads 0,7,14,...,77,4,11,... where
stride=7 is coprime to 80 shards. Prevents repeated token sequences
across epochs. PR openai#1099 gets 1.1136 with this (vs 1.1217 baseline).

12 lines added. Zero HP changes. Zero architecture changes.
Same quantization path. Artifact unchanged.

Co-Authored-By: Kevin Tan <kft@lightarchitects.io>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
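The coprime-stride traversal described in that commit can be sketched as follows; the function name is illustrative, but the order 0, 7, 14, ..., 77, 4, 11, ... and the 80-shard/stride-7 parameters come from the commit message. Because gcd(7, 80) = 1, the walk is a permutation: every shard is read exactly once per epoch, just in a scrambled order.

```python
import math

def coprime_order(num_shards, stride=7):
    """Visit shards at 0, stride, 2*stride, ... (mod num_shards).
    When gcd(stride, num_shards) == 1 this is a full permutation,
    so each shard is read once per epoch but in a different order
    than plain 0, 1, 2, ... traversal."""
    assert math.gcd(stride, num_shards) == 1, "stride must be coprime to shard count"
    return [(i * stride) % num_shards for i in range(num_shards)]

order = coprime_order(80, stride=7)  # 0, 7, 14, ..., 77, 4, 11, ...
```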