Non-record: 2026-03-22_SuperchunkBPE_SP1024 by eshansinghal14 · Pull Request #506 · openai/parameter-golf

eshansinghal14 · 2026-03-23T05:41:36Z

Non-Record Submission: Superchunk BPE (vocab 1024)

Short run checking superchunk-trained Rust BPE (tokenizer.pkl + re-exported fineweb10B_superchunk1024 shards) against the stock 1024-vocab training recipe on 8×H100, 600s wall clock, non-record / 16MB track.

What superchunking is

Standard GPT-style BPE (here: rustbpe + tiktoken) learns merges inside regex-defined chunks (words, numbers, etc.). Superchunk BPE adds a second phase: it builds sequences where each chunk is represented as a single phase-1 token, then learns cross-chunk merges; those merges are interleaved by frequency with phase-1 merges into one merge table. At inference there is no separate “superchunk mode”—behavior is whatever that combined table encodes.
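The two-phase idea can be pictured with a toy sketch. This is not the rustbpe implementation: the split pattern `CHUNK_RE`, the three-sentence corpus, the helper names, and the tie-breaking rules are all assumptions made for illustration. Phase 1 here simply runs until every chunk has collapsed to one token, so that phase 2 can treat each document as a sequence of whole-chunk tokens.

```python
import re
from collections import Counter

# Simplified stand-in for a GPT-style split pattern (assumption, not the real regex).
CHUNK_RE = re.compile(r" ?\w+")

def count_pairs(seqs):
    """Count adjacent token pairs across all sequences."""
    pairs = Counter()
    for seq in seqs:
        pairs.update(zip(seq, seq[1:]))
    return pairs

def apply_merge(seq, pair):
    """Replace every non-overlapping occurrence of `pair` with the joined token."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(seq[i] + seq[i + 1])
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out

def learn_merges(seqs, max_merges):
    """Greedy BPE: returns [(pair, frequency), ...] and the merged sequences."""
    merges = []
    for _ in range(max_merges):
        pairs = count_pairs(seqs)
        if not pairs:
            break
        # Highest count wins; ties are broken by the pair itself for determinism.
        pair, freq = max(pairs.items(), key=lambda kv: (kv[1], kv[0]))
        merges.append((pair, freq))
        seqs = [apply_merge(s, pair) for s in seqs]
    return merges, seqs

# Toy corpus (hypothetical).
docs = ["the cat and the dog", "the cat and the bird", "the dog and the cat"]

# Phase 1: every regex chunk is its own sequence, so merges stay inside chunks.
chunks_per_doc = [CHUNK_RE.findall(d) for d in docs]
phase1_input = [list(chunk) for doc in chunks_per_doc for chunk in doc]
phase1, _ = learn_merges(phase1_input, max_merges=1000)  # stops once no pairs remain

# Phase 2: each document is a sequence of whole-chunk tokens,
# so the merges learned here cross chunk boundaries.
phase2, _ = learn_merges(chunks_per_doc, max_merges=3)

# Interleave by frequency. sorted() is stable, so ties keep phase-1 merges first.
tagged = [(p, f, 1) for p, f in phase1] + [(p, f, 2) for p, f in phase2]
merge_table = sorted(tagged, key=lambda m: -m[1])

for pair, freq, phase in merge_table:
    print(phase, freq, pair)
```

In this toy, putting phase-1 merges first on frequency ties keeps the combined table applicable in order: a cross-chunk merge never appears before the merges that build its two halves. Whether the real tokenizer resolves ties the same way is not stated in this PR.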

Data and setup

Tokenizer: superchunk BPE, vocab 1024 (same width as SP1024 baseline family).
Shards: fineweb10B_superchunk1024 from export_fineweb_custom_bins.py on docs_selected.jsonl.
Training: train_gpt.py with TOKENIZER_PATH pointing at the Rust BPE directory (tokenizer_kind=rust_bpe in log).

Results (from `train.log` tail + `submission.json`)

| Metric | Value |
| --- | --- |
| Steps (wall stop) | 9,131 / 20,000 (`stopping_early: wallclock_cap`) |
| Wall time | ~600 s |
| Pre-quant `val_bpb` (last eval, step 9131) | 1.2294 |
| Pre-quant `val_loss` | 2.0968 |
| Post-int8+zlib round-trip `val_bpb` | 1.23893525 |
| Post-int8+zlib `val_loss` | 2.11308352 |
| `bytes_total` (int8+zlib + code) | 15,868,556 bytes (~15.1 MiB) |
| `bytes_model_int8_zlib` | 15,818,828 bytes |
| Peak GPU memory | ~10.2 GiB / rank (log) |
| Model params | 17,059,912 |

Validation checkpoints in the log show val_bpb trending down through the run (e.g. 1.3844 @ step 1000 → 1.2294 @ step 9131).
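A few derived figures follow from the reported numbers. The sketch below only does arithmetic on values quoted above. It assumes that `val_loss` is mean cross-entropy in nats per token and that `val_bpb` equals `(val_loss / ln 2) * tokens_per_byte`. That relation is an assumption about how the metric is defined, not something confirmed in this PR.

```python
import math

# Values copied from the results above.
pre_bpb, pre_loss = 1.2294, 2.0968
post_bpb, post_loss = 1.23893525, 2.11308352
bytes_total, bytes_model = 15_868_556, 15_818_828

# Size breakdown: whatever is not the compressed model is counted as code.
code_bytes = bytes_total - bytes_model
total_mib = bytes_total / 2**20

# Cost of the int8+zlib round trip in bits per byte.
bpb_delta = post_bpb - pre_bpb

# Under the assumed relation bpb = (loss / ln 2) * tokens_per_byte,
# the implied compression ratio of the tokenizer on the validation set is:
tokens_per_byte_pre = pre_bpb / (pre_loss / math.log(2))
tokens_per_byte_post = post_bpb / (post_loss / math.log(2))
bytes_per_token = 1 / tokens_per_byte_pre

print(f"code bytes:        {code_bytes:,}")
print(f"total size:        {total_mib:.2f} MiB")
print(f"quantization cost: {bpb_delta:+.4f} bpb")
print(f"bytes per token:   {bytes_per_token:.2f} (implied)")
```

The pre- and post-quantization rows imply nearly the same tokens-per-byte ratio, which is what one would expect if both evaluations ran over the same tokenized validation data.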

Included files

train_gpy.py — training script snapshot for this run (filename as stored).
train.log — full stdout (includes pasted source + per-step metrics).
submission.json — leaderboard-style metadata for this entry.

Add submission (Non-record): 2026-03-22_SuperchunkBPE_SP1024

aac2862

eshansinghal14 wants to merge 1 commit into openai:main from eshansinghal14:submission-branch
