
Record*: val_bpb=0.978 BPB — Goldfish ML Autonomous Research (100ep Cosine *leaky* TTT)#517

Closed
lukacf wants to merge 1 commit into openai:main from lukacf:main

Conversation


@lukacf lukacf commented Mar 23, 2026

Edit: Record* = Yes, but uses suspect leaky TTT, so not a clean solution.

Summary

Technique

CosineAnnealingLR applied to the AdamW TTT optimizer, decaying the learning rate from 0.001 to 0.00001 over 100 epochs. This prevents the position-specific overfitting that limits constant-LR TTT to ~30 epochs. Three lines of code.
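The change above can be sketched as follows. This is a minimal stand-in, not the PR's actual `train_gpt.py`; the model, learning rates, and epoch count follow the numbers stated above, and the inner TTT update is elided:

```python
import torch
from torch.optim.lr_scheduler import CosineAnnealingLR

model = torch.nn.Linear(16, 16)  # stand-in for the TTT-adapted model
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)

# The 3-line change: cosine-anneal the TTT learning rate
# from 1e-3 down to 1e-5 over the 100 TTT epochs.
sched = CosineAnnealingLR(opt, T_max=100, eta_min=1e-5)

for epoch in range(100):
    # ... one test-time-training pass would go here ...
    opt.step()
    sched.step()
```

With `T_max=100`, the learning rate reaches `eta_min` exactly at the final epoch, so late TTT updates become very small instead of continuing to overfit position-specific statistics.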

Methodology

Experiments were run autonomously by an AI coding agent using Goldfish ML for experiment orchestration and provenance tracking. The agent executed the full research loop — hypothesis, implementation, launch, monitoring, analysis, iteration — without human intervention on the training code.

Seven experiments were completed in ~2 hours wall-clock time, progressing from baseline replication (1.085 BPB) through the cosine LR discovery (1.018) to the final result (0.978). Dead ends (weight decay, BigramHash scaling, Value Residual) were documented automatically. Full experiment lineage and compressed timeline in the submission README.

Files

  • train_gpt.py — standalone training script
  • train.log — full log (seed 1337)
  • submission.json — 3-seed results
  • README.md — detailed write-up with experiment timeline and provenance

3-seed mean: 0.9789 BPB (sliding window stride=64)
Best seed: 0.9779 (seed 7)
Std: 0.0015
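For reference, the sliding-window evaluation mentioned above (stride 64) can be sketched like this. `logprob_fn` is a hypothetical hook returning per-token natural-log probabilities for a token sequence; the repo's actual eval code may differ, and this returns bits per token (dividing total bits by total bytes instead would give BPB):

```python
import math

def sliding_window_bits_per_token(logprob_fn, tokens, window=256, stride=64):
    """Score `stride` new tokens per window, each conditioned on up to
    `window - stride` preceding tokens of context."""
    total_nll, n_scored = 0.0, 0
    for start in range(0, len(tokens), stride):
        chunk = tokens[start:start + stride]
        ctx_start = max(0, start - (window - stride))
        logps = logprob_fn(tokens[ctx_start:start + len(chunk)])
        total_nll -= sum(logps[-len(chunk):])  # nats for the new tokens only
        n_scored += len(chunk)
    return total_nll / n_scored / math.log(2)  # nats/token -> bits/token
```

The stride trades eval cost against context: smaller strides give every scored token more conditioning context but require more forward passes.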

Key innovation: Autonomous ML research methodology.
AI coding agent discovered cosine LR scaling for TTT in a single
2-hour session — 7 experiments from hypothesis to record.

Technical: CosineAnnealingLR over 100 TTT epochs (3-line change).
Architecture: PR openai#398/openai#442 base (11L, int6+zstd, 15.51MB).

lukacf commented Mar 23, 2026

Training time is 10 min, but I missed the 10 min eval time limit, which I now see this violates by a factor of 2. So it goes into the "non-leaderboard" bucket.

lolrazh added a commit to lolrazh/parameter-golf that referenced this pull request Mar 23, 2026
Remove obsolete experiment scripts, profiling tools, old run scripts,
and stale research docs. The project now builds on PR openai#512 (PROTEUS)
and PR openai#517 (Goldfish ML) as control scripts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

lukacf commented Mar 23, 2026

I should add that the way TTT is used here is clearly suspect with respect to eval tokens leaking into training. Educational, but not a clean solution.

@lukacf lukacf changed the title Record: 0.978 BPB — Goldfish ML Autonomous Research (100ep Cosine TTT) Record*: val_bpb=0.978 BPB — Goldfish ML Autonomous Research (100ep Cosine *leaky* TTT) Mar 23, 2026

lukacf commented Mar 23, 2026

Closing PR (TTT not valid)

@lukacf lukacf closed this Mar 23, 2026