Non-record: Basis Block Interpolation (novel negative result) + Hyperparameter Sweep (MATRIX_LR=0.03 improves SOTA by 0.059 bpb)#530
Open
j420 wants to merge 2 commits intoopenai:mainfrom
Open
Non-record: Basis Block Interpolation (novel negative result) + Hyperparameter Sweep (MATRIX_LR=0.03 improves SOTA by 0.059 bpb)#530j420 wants to merge 2 commits intoopenai:mainfrom
j420 wants to merge 2 commits intoopenai:mainfrom
Conversation
mrdavtan
added a commit
to mrdavtan/parameter-golf
that referenced
this pull request
Mar 23, 2026
LeakyReLU(0.5)^2: zero extra params, proven -0.003 BPB vs relu^2. Addresses dead neuron problem. LEAKY_RELU=1 env var. run_no_ttt_best.sh: run3 base + three free lunches: - MATRIX_LR=0.03 (PR openai#530, verified -0.005+ BPB) - LeakyReLU(0.5)^2 (zero params, -0.003 BPB) - QAT=1 (run5 proved negative quant gap) Drops sigmoid gates and decoder 2x LR (run6 showed they hurt). Real target is openai#445 at 1.1236 (not openai#505 which doesn't fit 16MB).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Novel architecture exploration + systematic hyperparameter optimization.
Key contributions:
at dim=576. Documented as informative negative result — block reuse is
bottlenecked by torch.compile(fullgraph=False) speed penalty.
MATRIX_LR=0.03 as 0.059 bpb improvement over default 0.02.
Best val_bpb: 1.4963 (1xH100, standard eval)
Track: non-record