
Record: EngramLite + Gated Skips + Full GPTQ + FA3 — val_bpb 1.1146 (1-seed, 2 pending)#1122

Closed
icryo wants to merge 4 commits into openai:main from icryo:submission/icryo-1.1146

Conversation

@icryo icryo commented Mar 30, 2026

Summary

Key Techniques

  • EngramLite: multi-head bigram+trigram hash (8192 buckets, 2 heads, 2 orders)
  • Sigmoid-gated skip connections: learned gates on U-Net skips
  • Full Hessian GPTQ: Cholesky error compensation, 64-batch calibration
  • Coprime-stride multi-shard loader: diverse batches across 80 shards
  • XSA on all 11 layers
  • LeakyReLU(0.3)², Turbo-Muon (4 NS steps), LR floor 0.05
  • FlashAttention 3 (Hopper native, pre-built wheel)
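The EngramLite bullet above describes hashing bigrams and trigrams into 8192 buckets with 2 heads. A minimal sketch of how such bucket ids could be computed is shown here. The bucket count, head count, and n-gram orders come from the list; the hash function, the multipliers, and the names `ngram_bucket` and `bucket_ids` are assumptions for illustration, not the code in this PR.

```python
from typing import List, Sequence

NUM_BUCKETS = 8192  # from the PR description
NUM_HEADS = 2       # from the PR description
ORDERS = (2, 3)     # bigram and trigram, from the PR description

# Hypothetical per-head odd multipliers; the PR does not state its hash function.
HEAD_MULTIPLIERS = (0x9E3779B1, 0x85EBCA6B)


def ngram_bucket(tokens: Sequence[int], head: int) -> int:
    """Map one n-gram to a bucket id using a simple multiplicative rolling hash."""
    h = 0
    for t in tokens:
        h = (h * HEAD_MULTIPLIERS[head] + t + 1) & 0xFFFFFFFF
    h ^= h >> 15  # fold high bits down before reducing to a bucket
    return h % NUM_BUCKETS


def bucket_ids(token_ids: Sequence[int]) -> List[List[int]]:
    """Return, per position, one bucket id for each (order, head) pair.

    Positions without enough left context for an order get -1 for that order.
    """
    out: List[List[int]] = []
    for i in range(len(token_ids)):
        row: List[int] = []
        for n in ORDERS:
            for head in range(NUM_HEADS):
                if i + 1 >= n:
                    row.append(ngram_bucket(token_ids[i + 1 - n : i + 1], head))
                else:
                    row.append(-1)
        out.append(row)
    return out


ids = bucket_ids([5, 17, 5, 17, 9])
```

In a model, each valid id would typically index a small embedding table whose rows are summed into the token representation. Using more than one head means two n-grams that collide under one hash are unlikely to collide under the other.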

Results (1-seed, 2 pending)

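| Seed | Steps | ms/step | Sliding BPB | Artifact (bytes) |
|------|-------|---------|-------------|------------------|
| 1337 | 6,667 | 87.9 | 1.1146 | 15,711,654 |
| 42 | pending | | | |
| 2025 | pending | | | |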

Test plan

  • Seed 1337 verified on 8×H100 SXM
  • Seed 42 (in progress)
  • Seed 2025 (in progress)
  • Artifact under 16,000,000 bytes
  • Training under 600s (586s)
  • No TTT, pure sliding window eval
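The two budget items in the test plan can be cross-checked against the seed 1337 row of the results table. The sketch below only redoes that arithmetic; the function names are hypothetical and the strict "under" comparison is an assumption about how the limits are enforced.

```python
ARTIFACT_LIMIT_BYTES = 16_000_000  # limit stated in the test plan
TRAIN_LIMIT_SECONDS = 600.0        # limit stated in the test plan


def train_seconds(steps: int, ms_per_step: float) -> float:
    """Estimate wall-clock training time from step count and mean step time."""
    return steps * ms_per_step / 1000.0


def within_budget(artifact_bytes: int, steps: int, ms_per_step: float) -> bool:
    """True when both the artifact size and the training time are under the limits."""
    return (
        artifact_bytes < ARTIFACT_LIMIT_BYTES
        and train_seconds(steps, ms_per_step) < TRAIN_LIMIT_SECONDS
    )


# Seed 1337 values from the results table.
elapsed = train_seconds(6667, 87.9)             # about 586.03 s, matching the reported 586s
headroom = ARTIFACT_LIMIT_BYTES - 15_711_654    # 288,346 bytes of slack
ok = within_budget(15_711_654, 6667, 87.9)
```

The step count times the step time reproduces the reported 586s, so the table and the test plan agree for this seed.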

@mikeapedia

You might want to check how long GPTQ actually takes to run. I found that I didn't need the full 14 seconds and was able to tighten the window down to 9 seconds.

@mikeapedia

@icryo - By the way, I saw you are still using LZMA. If you switch to Brotli and use the shrink.py script in my PR, you could probably shrink the artifact enough to promote 1 more tensor group to int6. It's probably not enough to increase MLP to 3.625, but it may be worth trying.

Also, you might want to test if coprime stride actually helps. When I tested it with my model it had a slightly negative impact.
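For readers unfamiliar with the coprime-stride idea discussed here, a minimal sketch follows. It relies only on the fact that stepping through N shards with a stride coprime to N visits every shard exactly once per cycle. The 80-shard count comes from the PR description; the starting stride of 30 and the names `coprime_stride` and `shard_order` are assumptions, and the actual loader in the PR may differ.

```python
from math import gcd
from typing import List


def coprime_stride(num_shards: int, start: int) -> int:
    """Return the smallest stride >= start that is coprime to num_shards."""
    stride = max(1, start)
    while gcd(stride, num_shards) != 1:
        stride += 1
    return stride


def shard_order(num_shards: int, stride: int, offset: int = 0) -> List[int]:
    """Visit order over shards; a full permutation when stride is coprime to num_shards."""
    if gcd(stride, num_shards) != 1:
        raise ValueError("stride must be coprime to num_shards")
    return [(offset + k * stride) % num_shards for k in range(num_shards)]


stride = coprime_stride(80, 30)  # 30 shares a factor with 80, so this returns 31
order = shard_order(80, stride)  # starts 0, 31, 62, 13, ...
```

Consecutive reads land on distant shards, which is the intended source of batch diversity. Whether that helps is an empirical question, so an A/B run against a plain sequential order with the same seed is the direct way to check it.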

@icryo
Author

icryo commented Mar 30, 2026

@mikeapedia I appreciate the tips. I'm running the final seeds now and will post results in about an hour.

@icryo
Author

icryo commented Mar 31, 2026

Closing in favor of a new PR.

@icryo icryo closed this Mar 31, 2026