
Non-record: Basis Block Interpolation (novel negative result) + Hyperparameter Sweep (MATRIX_LR=0.03 improves SOTA by 0.059 bpb)#530

Open
j420 wants to merge 2 commits into openai:main from j420:main
Conversation

@j420 j420 commented Mar 23, 2026

Novel architecture exploration + systematic hyperparameter optimization.

Key contributions:

  • Basis Block Interpolation: 5 basis blocks × 3 unrolls = 15 effective layers
    at dim=576. Documented as an informative negative result — weight-tied block
    reuse is bottlenecked by the speed penalty of falling back to
    torch.compile(fullgraph=False).
  • Hyperparameter sweep: 15+ controlled experiments on 1xH100 SXM, identifying
    MATRIX_LR=0.03 as a 0.059 bpb improvement over the default of 0.02.

Best val_bpb: 1.4963 (1xH100, standard eval)
Track: non-record
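
The basis-block idea above can be sketched as follows. This is a minimal illustration, not the PR's actual code: the names Block, BasisStack, n_basis, and n_unroll are hypothetical, and the block body is a stand-in for a real transformer block.

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """Stand-in for a transformer block: one linear layer with a residual."""
    def __init__(self, dim):
        super().__init__()
        self.fc = nn.Linear(dim, dim)

    def forward(self, x):
        return x + torch.relu(self.fc(x))

class BasisStack(nn.Module):
    """5 basis blocks unrolled 3 times: 15 effective layers of depth,
    but only 5 blocks' worth of parameters (weights are tied across unrolls)."""
    def __init__(self, dim=576, n_basis=5, n_unroll=3):
        super().__init__()
        self.blocks = nn.ModuleList(Block(dim) for _ in range(n_basis))
        self.n_unroll = n_unroll

    def forward(self, x):
        for _ in range(self.n_unroll):  # reuse the same blocks each pass
            for block in self.blocks:
                x = block(x)
        return x

model = BasisStack()
y = model(torch.randn(2, 8, 576))
print(y.shape)  # torch.Size([2, 8, 576])
```

Parameter count scales with n_basis while depth scales with n_basis × n_unroll, which is the appeal; per the PR, the compile-time fallback to fullgraph=False erases the wall-clock benefit.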

mrdavtan added a commit to mrdavtan/parameter-golf that referenced this pull request Mar 23, 2026
LeakyReLU(0.5)^2: zero extra params, proven -0.003 BPB vs relu^2.
Addresses dead neuron problem. LEAKY_RELU=1 env var.

run_no_ttt_best.sh: run3 base + three free lunches:
  - MATRIX_LR=0.03 (PR openai#530, verified -0.005+ BPB)
  - LeakyReLU(0.5)^2 (zero params, -0.003 BPB)
  - QAT=1 (run5 proved negative quant gap)

Drops sigmoid gates and decoder 2x LR (run6 showed they hurt).
Real target is openai#445 at 1.1236 (not openai#505, which doesn't fit in 16MB).
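
The LeakyReLU(0.5)^2 activation from the commit above can be sketched as below. This is an illustrative reading of the commit message, not its actual diff; the function name leaky_relu_sq is hypothetical.

```python
import torch
import torch.nn.functional as F

def leaky_relu_sq(x, negative_slope=0.5):
    # Like relu^2, but negative inputs keep a scaled (then squared) value
    # and a nonzero gradient, addressing the dead-neuron failure mode of
    # relu^2. Adds zero extra parameters.
    return F.leaky_relu(x, negative_slope) ** 2

x = torch.tensor([-2.0, 0.0, 2.0])
print(leaky_relu_sq(x))  # tensor([1., 0., 4.])
```

Note that squaring maps the negative branch back to positive values (leaky_relu(-2, 0.5) = -1, squared = 1), so the activation is non-monotonic below zero; the commit's env-var gate (LEAKY_RELU=1) makes it easy to A/B against plain relu^2.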
