Non-record: 11L Int5 QAT + Score-First TTT — val_bpb 1.1326 (15.51 MB) #861
Open
JoeProAI wants to merge 3 commits into openai:main from
…g to fit int6 under 16MB
- INT6_CLIP_PERCENTILE now reads from env (default 99.99984, wave46 uses 99.0)
- PRUNE_PCT added to 1.0677 script (was missing, wave46 uses 0.25)
- Modal harness wave46_clip_prune.py for detached runs
- Both levers push zeros into weight tensors for better zstd compression
- Base architecture: SwiGLU + U-Net + XSA4 + BigramHash(8192) = 1.0677 BPB pre-compression
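A rough sketch of how the two knobs named in the commit message could be wired up. It is not the PR's training script. The env variable names and the 99.99984 default come from the commit message. Everything else is an assumption made for illustration: the helper names, the PRUNE_PCT default of 0.0, reading PRUNE_PCT as a fraction of weights (0.25 = smallest 25%), the symmetric [-31, 31] integer range, and zlib standing in for zstd so the example needs only the standard library. Only the effect of pruning on compressed size is measured here; how the clip percentile affects size depends on the weight distribution.

```python
import os
import random
import zlib


def abs_percentile(weights, pct):
    """Nearest-rank percentile (0-100) of the absolute values."""
    mags = sorted(abs(w) for w in weights)
    rank = int(round(pct / 100.0 * (len(mags) - 1)))
    return mags[max(0, min(len(mags) - 1, rank))]


def quantize_int6(weights, clip_pct, prune_frac):
    """Prune small weights to zero, clip the rest, round to integers in [-31, 31]."""
    clip = abs_percentile(weights, clip_pct) or 1.0  # avoid a zero scale
    scale = clip / 31.0
    # Assumption: prune_frac is a fraction of weights (0.25 = smallest 25%).
    cutoff = abs_percentile(weights, prune_frac * 100.0) if prune_frac > 0 else -1.0
    codes = []
    for w in weights:
        if abs(w) <= cutoff:
            codes.append(0)  # pruned weight
        else:
            clipped = max(-clip, min(clip, w))
            codes.append(int(round(clipped / scale)))
    return codes


def compressed_size(codes):
    """Bytes after zlib (stand-in for zstd), one signed byte per code."""
    return len(zlib.compress(bytes(c & 0xFF for c in codes), 9))


def knobs_from_env():
    """Read both knobs from the environment; the PRUNE_PCT default is assumed."""
    clip_pct = float(os.environ.get("INT6_CLIP_PERCENTILE", "99.99984"))
    prune_frac = float(os.environ.get("PRUNE_PCT", "0.0"))
    return clip_pct, prune_frac


# Synthetic Gaussian weights, not taken from the model in this PR.
rng = random.Random(0)
weights = [rng.gauss(0.0, 1.0) for _ in range(10000)]
dense = quantize_int6(weights, clip_pct=99.0, prune_frac=0.0)
pruned = quantize_int6(weights, clip_pct=99.0, prune_frac=0.25)
print("zeros:", dense.count(0), "->", pruned.count(0))
print("compressed bytes:", compressed_size(dense), "->", compressed_size(pruned))
```

In a real run the values would come from `knobs_from_env()`, for example with `INT6_CLIP_PERCENTILE=99.0 PRUNE_PCT=0.25` set in the environment, matching the wave46 settings quoted above.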
11L U-Net + Int5 QAT + Score-First Legal TTT
val_bpb: 1.13256182 | 15.51 MB (16,265,723 bytes) | 8×H100 (~37 min)
What's different
Built on the PR #549 stack. Key additions:
Architecture
Results
Train log, submission.json, and training script included.