Record-track submission: 11L XSA4 + Late Shared Workspace Adapter (LSWA-64x4) + MLP2.5 by ymrohit · Pull Request #988 · openai/parameter-golf

ymrohit · 2026-03-27T21:50:09Z

Record-Track Submission: 11L XSA4 + Late Shared Workspace Adapter (LSWA-64x4) + MLP2.5

Best included legal seed: 1.08568610 val_bpb | 15,900,041 bytes | 8xH100 SXM, 598s
Included top-3 legal mean: 1.10581327 val_bpb (seeds 2025, 13, 1313)

Results

Seed	Steps	final_int6_roundtrip_exact val_bpb	Total bytes
2025	7197	1.08568610	15,900,041
13	7212	1.11462396	15,814,869
1313	7200	1.11712974	15,895,409
Mean		1.10581327
Std		0.01747561
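The Mean and Std rows can be reproduced from the three per-seed values. A quick check, assuming Std is the sample (n - 1) standard deviation, which is the convention that matches the reported figure:

```python
import statistics

# Per-seed final_int6_roundtrip_exact val_bpb, copied from the table above.
val_bpb = {2025: 1.08568610, 13: 1.11462396, 1313: 1.11712974}
scores = list(val_bpb.values())

mean_bpb = statistics.mean(scores)
std_bpb = statistics.stdev(scores)  # sample std, n - 1 in the denominator

print(f"mean = {mean_bpb:.8f}")
print(f"std  = {std_bpb:.8f}")
```

The best seed sits roughly 0.02 bpb below the three-seed mean, so the headline number and the mean describe noticeably different outcomes.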

What Is New Here

This submission is centered on the Late Shared Workspace Adapter (LSWA).

Instead of replacing the transformer trunk, LSWA adds a small shared token-to-workspace-to-token writeback path only in the late decoder. Tokens write into a compact latent workspace, the workspace performs one short refinement step, and the refined workspace state writes back into token states. The same adapter weights are reused across late sites.

The point is to get a new computation pattern with minimal trunk changes.
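The write, think, writeback pattern described above can be sketched in a few lines. This is a minimal illustration and not the code in train_gpt.py: it assumes each of the three steps is plain single-head dot-product attention, it omits the learned projections into the 64 latent channels, and the names `attend` and `workspace_adapter` are hypothetical.

```python
import math
import random

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]

def attend(queries, keys, values):
    """Single-head scaled dot-product attention with no mask."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        w = softmax([scale * sum(a * b for a, b in zip(q, k)) for k in keys])
        out.append([sum(wi * v[j] for wi, v in zip(w, values))
                    for j in range(len(values[0]))])
    return out

def workspace_adapter(tokens, slots, think_steps=1):
    # Write: each workspace slot reads from the token states.
    ws = attend(slots, tokens, tokens)
    # Think: slots mix with each other, with a residual connection.
    for _ in range(think_steps):
        mixed = attend(ws, ws, ws)
        ws = [[a + b for a, b in zip(s, m)] for s, m in zip(ws, mixed)]
    # Writeback: each token reads from the refined workspace (residual add).
    read = attend(tokens, ws, ws)
    return [[a + b for a, b in zip(t, r)] for t, r in zip(tokens, read)]

random.seed(0)
T, D, S = 6, 8, 4  # toy sizes: 6 tokens, 8 channels, 4 workspace slots
tokens = [[random.gauss(0, 1) for _ in range(D)] for _ in range(T)]
slots = [[random.gauss(0, 1) for _ in range(D)] for _ in range(S)]

# The same slot parameters are reused at every late site.
h = tokens
for _site in range(3):
    h = workspace_adapter(h, slots)
```

Note that the write step in this sketch is unmasked, so every slot sees the whole sequence. Whether the adapter is safe for autoregressive evaluation depends entirely on how that step is masked.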

Submission Architecture

11-layer, 512d, banked backbone
8 attention heads / 4 KV heads
XSA on the last 4 layers
Bigram path + VE path retained from the public donor line
LSWA-64x4: 64 latent channels, 4 workspace slots, 4 heads, 1 think step
Workspace active from decoder block 5 onward
MLP_MULT=2.5 to make room for the workspace while staying under the 16MB cap
No TTT
No EMA / SWA / LAWA
Exact post-quant evaluation
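For reference, the listed hyperparameters imply a few derived sizes. The sketch below collects them in a dataclass; the field names are hypothetical (they are not taken from train_gpt.py), and the derived values assume the usual conventions that head width is model width divided by head count and that MLP hidden width is MLP_MULT times model width.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SubmissionConfig:
    # Values come from the architecture list; field names are illustrative.
    num_layers: int = 11
    model_dim: int = 512
    num_heads: int = 8
    num_kv_heads: int = 4
    xsa_last_n: int = 4
    mlp_mult: float = 2.5
    ws_latent_dim: int = 64
    ws_slots: int = 4
    ws_heads: int = 4
    ws_think_steps: int = 1
    ws_start_block: int = 5  # indexing convention is not stated in the PR

cfg = SubmissionConfig()

head_dim = cfg.model_dim // cfg.num_heads            # width of one attention head
q_heads_per_kv = cfg.num_heads // cfg.num_kv_heads   # query heads sharing one KV head
mlp_hidden = int(cfg.mlp_mult * cfg.model_dim)       # assumes hidden = mult * dim
ws_head_dim = cfg.ws_latent_dim // cfg.ws_heads      # assumes latent channels split evenly

print(head_dim, q_heads_per_kv, mlp_hidden, ws_head_dim)
```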

Why This Is Interesting

The main architectural claim is that the workspace path contributes materially to the score without requiring a full model rewrite.

This branch keeps the public March 23 record backbone largely intact and adds one focused shared late adapter. In other words: the novelty is not “a different trunk,” it is “a shared late workspace writeback mechanism that competes under the record-track budget.”

Lineage And Credit

This submission is intentionally a derivative record-track branch with one main new idea.

Base 11-layer banked trunk: PR #414 by @signalrush
Parameter Banking + Parallel Muon: PR #399 by @abaybektursun
LeakyReLU(0.5)^2 donor activation line: PR #493 by @parinzee and PR #518 by @sofiabod
Public March 23 assembled record line: LeakyReLU² + Legal Score-First TTT + Parallel Muon by @abaybektursun

Included In This PR

records/track_10min_16mb/2026-03-27_11L_XSA4_LateSharedWorkspaceAdapter_MLP25
self-contained train_gpt.py with baked-in record defaults
canonical train.log plus the other top-3 legal seed logs
README.md and submission.json

ymrohit · 2026-03-27T22:42:35Z

Withdrawing this submission. The shared workspace adapter path is non-causal and can leak future-token information within the evaluated sequence, so the reported score is not a valid autoregressive LM benchmark.
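The kind of leak described here can be detected with a simple perturbation probe: change a later input and check whether any earlier output moves. The toy below is not the submission's code; it uses scalar token states and two hypothetical pooling functions, one that pools over the whole sequence and one that pools over the prefix only.

```python
def global_pool_writeback(xs):
    """Non-causal toy: every position adds the mean over the whole sequence."""
    g = sum(xs) / len(xs)
    return [x + g for x in xs]

def prefix_pool_writeback(xs):
    """Causal toy: position t adds the mean over positions 0..t only."""
    out, running = [], 0.0
    for t, x in enumerate(xs):
        running += x
        out.append(x + running / (t + 1))
    return out

def leaks_future(fn, xs, tol=1e-12):
    """Return True if changing the last input alters any earlier output."""
    base = fn(xs)
    perturbed = fn(xs[:-1] + [xs[-1] + 1.0])
    return any(abs(a - b) > tol for a, b in zip(base[:-1], perturbed[:-1]))

xs = [0.5, -1.0, 2.0, 0.25]
global_leaks = leaks_future(global_pool_writeback, xs)
prefix_leaks = leaks_future(prefix_pool_writeback, xs)
print(global_leaks, prefix_leaks)
```

A shared workspace that pools over the full sequence behaves like the first function: the prediction at position t can depend on tokens after t, which is exactly what an autoregressive benchmark forbids.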

Add LSWA record-track submission

22bf6e9

ymrohit changed the title ~~Add LSWA record-track submission~~ Record-track submission: 11L XSA4 + Late Shared Workspace Adapter (LSWA-64x4) + MLP2.5 Mar 27, 2026

ymrohit closed this Mar 27, 2026

ymrohit deleted the submission/lswa-record-track-2026-03-27 branch March 27, 2026 22:43

valerio-oai mentioned this pull request Mar 27, 2026

Illegal submissions megathread #677

Open

ymrohit wants to merge 1 commit into openai:main from ymrohit:submission/lswa-record-track-2026-03-27
