Record*: val_bpb=0.978 BPB — Goldfish ML Autonomous Research (100ep Cosine *leaky* TTT) #517
Closed
lukacf wants to merge 1 commit into openai:main from
Conversation
3-seed mean: 0.9789 BPB (sliding window, stride=64). Best seed: 0.9779 (seed 7). Std: 0.0015.
Key innovation: autonomous ML research methodology. An AI coding agent discovered cosine LR scaling for TTT in a single 2-hour session: 7 experiments from hypothesis to record.
Technical: CosineAnnealingLR over 100 TTT epochs (a 3-line change). Architecture: PR openai#398/openai#442 base (11L, int6+zstd, 15.51MB).
Author
10 min training time, but I missed the 10 min eval time limit, which I now see this violates by a factor of 2. So this goes into the "non-leaderboard" bucket.
lolrazh added a commit to lolrazh/parameter-golf that referenced this pull request on Mar 23, 2026:
Remove obsolete experiment scripts, profiling tools, old run scripts, and stale research docs. The project now builds on PR openai#512 (PROTEUS) and PR openai#517 (Goldfish ML) as control scripts. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Author
I should add that the way TTT is used here is clearly suspect wrt. eval tokens leaking into training. Educational, but not a clean solution.
Author
Closing PR (TTT not valid).
Edit: Record* = Yes, but uses suspect leaky TTT, so not a clean solution.
Summary
Technique
CosineAnnealingLR applied to the AdamW TTT optimizer, decaying the learning rate from 0.001 to 0.00001 over 100 epochs. This prevents the position-specific overfitting that limits constant-LR TTT to roughly 30 epochs. Three lines of code.
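A minimal sketch of the schedule described above, in plain Python. The function mirrors the formula PyTorch's `CosineAnnealingLR` implements, with the endpoints stated in this PR (1e-3 down to 1e-5 over 100 TTT epochs); the function and constant names are illustrative, not taken from the submission.

```python
import math

# Assumed constants from the PR description: lr decays 1e-3 -> 1e-5 over 100 epochs.
BASE_LR, MIN_LR, TTT_EPOCHS = 1e-3, 1e-5, 100

def cosine_lr(epoch: int) -> float:
    """Learning rate at a given TTT epoch:
    eta_min + (eta_max - eta_min) * (1 + cos(pi * t / T)) / 2."""
    return MIN_LR + (BASE_LR - MIN_LR) * (1 + math.cos(math.pi * epoch / TTT_EPOCHS)) / 2

print(round(cosine_lr(0), 8))    # 0.001 at the start
print(round(cosine_lr(100), 8))  # 1e-05 at the end
```

In the actual training script, the same effect is a `torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100, eta_min=1e-5)` constructed after the AdamW optimizer, with `scheduler.step()` called once per TTT epoch.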
Methodology
Experiments were run autonomously by an AI coding agent using Goldfish ML for experiment orchestration and provenance tracking. The agent executed the full research loop — hypothesis, implementation, launch, monitoring, analysis, iteration — without human intervention on the training code.
Seven experiments were completed in ~2 hours wall-clock time, progressing from baseline replication (1.085 BPB) through the cosine LR discovery (1.018) to the final result (0.978). Dead ends (weight decay, BigramHash scaling, Value Residual) were documented automatically. Full experiment lineage and compressed timeline in the submission README.
Files
train_gpt.py — standalone training script
train.log — full log (seed 1337)
submission.json — 3-seed results
README.md — detailed write-up with experiment timeline and provenance
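For reference, the headline metric (bits per byte under a sliding-window evaluation with stride 64) can be sketched as follows. The window size, the span-scoring callback, and the uniform toy model are illustrative assumptions, not the PR's actual eval code in train_gpt.py.

```python
import math

def sliding_window_nll(tokens, nll_of_span, window=256, stride=64):
    """Sum NLL (nats) over all tokens, scoring `stride` tokens at a time,
    each chunk conditioned on up to `window - stride` tokens of left context."""
    total = 0.0
    for start in range(0, len(tokens), stride):
        ctx = max(0, start + stride - window)  # left edge of the context window
        total += nll_of_span(tokens, ctx, start, min(start + stride, len(tokens)))
    return total

def bpb(total_nll_nats, n_bytes):
    """Convert a summed NLL in nats to bits per byte."""
    return total_nll_nats / math.log(2) / n_bytes

# Toy check: a uniform model over 256 byte values scores exactly 8 bits/byte.
data = list(range(128))
uniform = lambda toks, ctx, start, end: (end - start) * math.log(256)
print(round(bpb(sliding_window_nll(data, uniform), len(data)), 6))  # 8.0
```

The stride-64 overlap means each token is scored exactly once but with more left context than a disjoint-chunk evaluation would give, which is why it is stated alongside the BPB number.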