
Record: Fused Triton MLP + Full GPTQ + Coprime Loader + XSA-all + BH2816 (val_bpb 1.1116)#1135

Open
barneywohl wants to merge 1 commit into openai:main from barneywohl:submission-fused-gptq-coprime

Conversation

@barneywohl

Record Submission

Author: @barneywohl
Date: 2026-03-30
val_bpb: 1.1116 ± 0.0005 (3-seed mean)

Results (8×H100 SXM)

| Seed | Sliding BPB | Artifact |
|------|-------------|----------|
| 1337 | 1.1110 | 15,982,859 |
| 42 | 1.1121 | 15,981,083 |
| 2024 | 1.1118 | 15,982,475 |
| Mean ± Std | 1.1116 ± 0.0005 | |

Improvement over SOTA

Stack

  1. Fused Triton MLP — custom kernel for leaky_relu(x,0.5).square(), saves 1.8ms/step
  2. Full Hessian GPTQ — Cholesky + actorder + 5-way clip sweep
  3. Coprime-stride loader — multi-shard diversity with memmap
  4. XSA on all 11 layers — exclusive self-attention everywhere
  5. BigramHash(2816×112) — enlarged bigram features
  6. fullgraph=True torch.compile
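For reference, item 1's activation can be written out in plain Python; the point of the Triton kernel is to fuse the leaky-ReLU and the square into one pass so the intermediate is never materialized. This is only a scalar sketch of the math being fused, not the kernel itself:

```python
def leaky_relu_sq(x, negative_slope=0.5):
    # leaky_relu(x, 0.5).square(): pass positives through, scale
    # negatives by 0.5, then square the result. The fused Triton
    # kernel computes exactly this elementwise, in one memory pass.
    y = x if x >= 0.0 else negative_slope * x
    return y * y
```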
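The coprime-stride idea in item 3 can be sketched independently of the memmap plumbing: walking a shard of n samples with a stride coprime to n visits every offset exactly once per pass, while consecutive reads jump across the shard for batch diversity. The function name and shape here are illustrative, not the PR's actual loader API:

```python
from math import gcd

def coprime_order(n, stride):
    # Visit all n offsets of a shard in a permuted order. Because
    # gcd(stride, n) == 1, the sequence (i * stride) mod n hits each
    # offset exactly once before repeating; a real loader would use
    # these offsets to index into a np.memmap-backed token shard.
    assert gcd(n, stride) == 1, "stride must be coprime to n"
    return [(i * stride) % n for i in range(n)]
```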
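Item 5's BigramHash(2816×112) reads as: hash each adjacent token pair into one of 2,816 buckets, each owning a 112-dim row of a feature table. A minimal sketch of the bucketing step, with an arbitrary odd multiplier that is an assumption, not the PR's actual hash:

```python
def bigram_bucket(tok_prev, tok_cur, num_buckets=2816):
    # Mix an adjacent token pair into one of num_buckets rows of a
    # hypothetical (2816, 112) embedding table. The multiplier 1000003
    # is a placeholder mixing constant, not taken from the PR.
    return (tok_prev * 1000003 + tok_cur) % num_buckets
```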
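The "5-way clip sweep" in item 2 presumably means trying several clipping scales and keeping the one whose quantization grid reconstructs the weights with least error. A toy per-row version under that assumption (full GPTQ additionally weights the error by the Hessian and reorders columns by activation order, which is omitted here):

```python
def best_clip(weights, candidates=(0.8, 0.85, 0.9, 0.95, 1.0), levels=16):
    # Hypothetical 5-way clip sweep: for each candidate scale c, clip
    # weights to [-c*max|w|, c*max|w|], round onto a uniform grid with
    # `levels` points, and keep the scale with least squared error.
    best = None
    for c in candidates:
        m = c * max(abs(w) for w in weights)
        step = 2 * m / (levels - 1)
        err = 0.0
        for w in weights:
            q = round((min(max(w, -m), m) + m) / step)
            err += ((-m + q * step) - w) ** 2
        if best is None or err < best[0]:
            best = (err, c)
    return best[1]
```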

Built on PR #549 by @abaybektursun with techniques from PRs #726, #634, #1019, #287.

See records folder for full README, logs, and reproducible script.

Record: Fused Triton MLP + Full GPTQ + Coprime Loader + XSA-all + BH2816 (val_bpb 1.1116)

3-seed mean: 1.1116 ± 0.0005
Seeds: 1337=1.1110, 42=1.1121, 2024=1.1118

Stack: LeakyReLU² fused Triton kernel + Full Hessian GPTQ (actorder+Cholesky)
+ coprime-stride multi-shard loader + XSA on all 11 layers + BigramHash(2816x112)
+ fullgraph=True torch.compile

Built on PR openai#549 scaffold with techniques from PRs openai#726, openai#634, openai#1019, openai#287.