Record: Order-Adaptive Entropy Gating + XSA-All (val_bpb=0.9370)#774
Open
travispchen wants to merge 1 commit intoopenai:mainfrom
Open
Record: Order-Adaptive Entropy Gating + XSA-All (val_bpb=0.9370)#774travispchen wants to merge 1 commit intoopenai:mainfrom
travispchen wants to merge 1 commit intoopenai:mainfrom
Conversation
…ed mean) N-gram7 BPB: 0.9370 (±0.0003) across seeds 1337/42/2025 Sliding BPB: 1.1222 (±0.0003) Artifact: ~15.9 MB (within 16MB cap) Training: 600s on 8xH100 Key innovation: order-adaptive entropy gating assigns different entropy thresholds per n-gram order. High-order matches (7-gram) trusted at moderate model confidence; low-order matches (2-gram) only trusted when model is very uncertain. Built on PR openai#753 (Podracing II) with XSA extended to all 11 layers and entropy_center=3.0. Co-Authored-By: Travis Chen <travispchen@gmail.com>
|
we need to share notes! i mean.. we jsut did =) but claibrations. I havent calibrated it yet |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
N-gram7 BPB: 0.9370 (±0.0003) across seeds 1337/42/2025
Sliding BPB: 1.1222 (±0.0003)
Artifact: ~15.9 MB (within 16MB cap)
Training: 600s on 8xH100
Key innovation: order-adaptive entropy gating assigns different entropy thresholds per n-gram order. High-order matches (7-gram) trusted at moderate model confidence; low-order matches (2-gram) only trusted when model is very uncertain.
Built on PR #753 (Podracing II) with XSA extended to all 11 layers and entropy_center=3.0.