Record-track submission: 11L XSA4 + Late Shared Workspace Adapter (LSWA-64x4) + MLP2.5 #988
Closed
ymrohit wants to merge 1 commit into openai:main
Author
Withdrawing this submission. The shared workspace adapter path is non-causal and can leak future-token information within the evaluated sequence, so the reported score is not a valid autoregressive LM benchmark.
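A minimal sketch of how this kind of leak can be detected, assuming toy scalar "tokens" and two hypothetical mixers that are not taken from the submission's code: perturb one position and check whether any output before that position changes. A mixer that pools over the whole sequence fails the check; a prefix-only mixer passes.

```python
def pooled_mixer(tokens):
    """Toy non-causal mixer (hypothetical): add the mean over ALL positions."""
    pooled = sum(tokens) / len(tokens)
    return [t + pooled for t in tokens]


def causal_mixer(tokens):
    """Toy causal mixer (hypothetical): add the running mean over positions <= i."""
    out, total = [], 0.0
    for i, t in enumerate(tokens):
        total += t
        out.append(t + total / (i + 1))
    return out


def leaks_future(mixer, tokens, position, bump=1.0, tol=1e-12):
    """Return True if perturbing tokens[position] changes any earlier output."""
    perturbed = list(tokens)
    perturbed[position] += bump
    base, alt = mixer(tokens), mixer(perturbed)
    return any(abs(a - b) > tol for a, b in zip(base[:position], alt[:position]))


sequence = [0.5, -1.0, 2.0, 0.25]
print(leaks_future(pooled_mixer, sequence, position=3))  # pooled path sees the future
print(leaks_future(causal_mixer, sequence, position=3))  # prefix-only path does not
```

The same perturbation test can in principle be run against a full model by comparing logits at positions before the perturbed token.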
Record-Track Submission: 11L XSA4 + Late Shared Workspace Adapter (LSWA-64x4) + MLP2.5
Best included legal seed: 1.08568610 val_bpb | 15,900,041 bytes | 8xH100 SXM, 598s
Included top-3 legal mean: 1.10581327 val_bpb (seeds 2025, 13, 1313)
Results
What Is New Here
This submission is centered on the Late Shared Workspace Adapter (LSWA).
Instead of replacing the transformer trunk, LSWA adds a small shared token-to-workspace-to-token writeback path only in the late decoder. Tokens write into a compact latent workspace, the workspace performs one short refinement step, and the refined workspace state writes back into token states. The same adapter weights are reused across late sites.
The point is to get a new computation pattern with minimal trunk changes.
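The write, refine, and writeback steps described above can be sketched as follows. This is a dependency-free illustration under stated assumptions, not the code in train_gpt.py: the class name, the attention-style write and read, the tanh refinement, and all sizes are hypothetical placeholders, and they are not the submission's LSWA-64x4 configuration. In this sketch the write step pools over every position, which is the kind of non-causal path the withdrawal comment describes.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax_rows(m):
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class SharedWorkspaceAdapter:
    """Hypothetical token -> workspace -> token adapter with one refinement step."""

    def __init__(self, d_model, n_slots, d_ws, seed=0):
        rng = random.Random(seed)
        self.slots = rand_matrix(n_slots, d_ws, rng)    # slot queries (assumed)
        self.w_key = rand_matrix(d_model, d_ws, rng)
        self.w_val = rand_matrix(d_model, d_ws, rng)
        self.w_refine = rand_matrix(d_ws, d_ws, rng)
        self.w_query = rand_matrix(d_model, d_ws, rng)
        self.w_out = rand_matrix(d_ws, d_model, rng)

    def __call__(self, tokens):
        # 1. Write: each workspace slot attends over all token states.
        keys = matmul(tokens, self.w_key)
        vals = matmul(tokens, self.w_val)
        write_attn = softmax_rows(matmul(self.slots, transpose(keys)))
        ws = matmul(write_attn, vals)
        # 2. Refine: one short residual update of the workspace.
        delta = matmul(ws, self.w_refine)
        ws = [[w + math.tanh(d) for w, d in zip(wr, dr)] for wr, dr in zip(ws, delta)]
        # 3. Writeback: each token reads from the refined workspace.
        queries = matmul(tokens, self.w_query)
        read_attn = softmax_rows(matmul(queries, transpose(ws)))
        update = matmul(matmul(read_attn, ws), self.w_out)
        return [[t + u for t, u in zip(tr, ur)] for tr, ur in zip(tokens, update)]


def apply_at_late_sites(tokens, adapter, n_sites):
    """Reuse the SAME adapter weights at several late sites."""
    for _ in range(n_sites):
        # In a real model a transformer block would run between sites.
        tokens = adapter(tokens)
    return tokens


adapter = SharedWorkspaceAdapter(d_model=8, n_slots=4, d_ws=6)
states = rand_matrix(5, 8, random.Random(1), scale=1.0)
updated = apply_at_late_sites(states, adapter, n_sites=2)
print(len(updated), len(updated[0]))
```

Because one adapter instance is passed to every site, the added parameter count is independent of the number of late sites, which is the property the weight-sharing description relies on.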
Submission Architecture
MLP_MULT=2.5 to make room for the workspace while staying under the 16MB cap
Why This Is Interesting
The main architectural claim is that the workspace adapter contributes meaningfully to the result without requiring a full model rewrite.
This branch keeps the public March 23 record backbone largely intact and adds one focused shared late adapter. In other words: the novelty is not “a different trunk,” it is “a shared late workspace writeback mechanism that competes under the record-track budget.”
Lineage And Credit
This submission is intentionally a derivative record-track branch with one main new idea.
LeakyReLU² + Legal Score-First TTT + Parallel Muon by @abaybektursun
Included In This PR
records/track_10min_16mb/2026-03-27_11L_XSA4_LateSharedWorkspaceAdapter_MLP25
train_gpt.py with baked-in record defaults
train.log plus the other top-3 legal seed logs
README.md and submission.json