Record-track submission: 11L XSA4 + Late Shared Workspace Adapter (LSWA-64x4) + MLP2.5 #988

Closed
ymrohit wants to merge 1 commit into openai:main from ymrohit:submission/lswa-record-track-2026-03-27

ymrohit commented Mar 27, 2026

Record-Track Submission: 11L XSA4 + Late Shared Workspace Adapter (LSWA-64x4) + MLP2.5

Best included legal seed: 1.08568610 val_bpb | 15,900,041 bytes | 8xH100 SXM, 598s
Included top-3 legal mean: 1.10581327 val_bpb (seeds 2025, 13, 1313)

Results

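| Seed | Steps | final_int6_roundtrip_exact val_bpb | Total bytes |
|------|-------|------------------------------------|-------------|
| 2025 | 7197 | 1.08568610 | 15,900,041 |
| 13 | 7212 | 1.11462396 | 15,814,869 |
| 1313 | 7200 | 1.11712974 | 15,895,409 |
| Mean | | 1.10581327 | |
| Std | | 0.01747561 | |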
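
As a quick sanity check on the summary rows, the mean and spread can be recomputed from the three per-seed scores. This is a minimal sketch using only the values in the table; the variable names are illustrative. The reported Std agrees with the sample (n-1) standard deviation rather than the population form.

```python
import statistics

# Per-seed val_bpb values copied from the results table.
val_bpb = {2025: 1.08568610, 13: 1.11462396, 1313: 1.11712974}
scores = list(val_bpb.values())

mean = statistics.mean(scores)
std = statistics.stdev(scores)  # sample standard deviation (divides by n - 1)

print(f"mean={mean:.8f}")
print(f"std={std:.8f}")
```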

What Is New Here

This submission is centered on the Late Shared Workspace Adapter (LSWA).

Instead of replacing the transformer trunk, LSWA adds a small shared token-to-workspace-to-token writeback path only in the late decoder. Tokens write into a compact latent workspace, the workspace performs one short refinement step, and the refined workspace state writes back into token states. The same adapter weights are reused across late sites.
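
The write / think / write-back pattern described above can be sketched in a few lines. This is a toy illustration, not the submission's code: it assumes each step is plain single-head attention with residual connections and no learned projections, uses tiny dimensions, and every name (`attend`, `lswa_adapter`, `slots`) is hypothetical. Note that the write step as sketched lets every workspace slot see every token position.

```python
import math
import random

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]

def attend(queries, keys, values):
    """Single-head scaled dot-product attention; every query sees every key."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        logits = [sum(a * b for a, b in zip(q, k)) * scale for k in keys]
        w = softmax(logits)
        dim = len(values[0])
        out.append([sum(wi * v[j] for wi, v in zip(w, values)) for j in range(dim)])
    return out

def residual(x, delta):
    return [[a + b for a, b in zip(rx, rd)] for rx, rd in zip(x, delta)]

def lswa_adapter(tokens, slots):
    # 1) Write: workspace slots gather information from the token states.
    workspace = residual(slots, attend(slots, tokens, tokens))
    # 2) Think: one short refinement step among the workspace slots.
    workspace = residual(workspace, attend(workspace, workspace, workspace))
    # 3) Write back: token states read from the refined workspace.
    return residual(tokens, attend(tokens, workspace, workspace))

random.seed(0)
dim, seq_len, num_slots = 8, 6, 4  # toy sizes, not the submission's
tokens = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(seq_len)]
slots = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_slots)]

out = lswa_adapter(tokens, slots)
print(len(out), len(out[0]))  # 6 8
```

Reusing the same adapter weights across late sites would correspond to calling one `lswa_adapter` instance at several decoder blocks; in this projection-free sketch the only shared state is `slots`.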

The point is to get a new computation pattern with minimal trunk changes.

Submission Architecture

  • 11-layer, 512d, banked backbone
  • 8 attention heads / 4 KV heads
  • XSA on the last 4 layers
  • Bigram path + VE path retained from the public donor line
  • LSWA-64x4: 64 latent channels, 4 workspace slots, 4 heads, 1 think step
  • Workspace active from decoder block 5 onward
  • MLP_MULT=2.5 to make room for the workspace while staying under the 16MB cap
  • No TTT
  • No EMA / SWA / LAWA
  • Exact post-quant evaluation
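
For orientation, the hyperparameters in the list above can be collected into a small config object with a few derived sizes. This is a sketch only: the field names are hypothetical, and the derived MLP width assumes that MLP_MULT scales the model dimension, which the list does not state explicitly.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LSWAConfig:
    # Values come from the architecture list; field names are hypothetical.
    num_layers: int = 11
    model_dim: int = 512
    num_heads: int = 8
    num_kv_heads: int = 4
    xsa_last_n: int = 4
    mlp_mult: float = 2.5
    ws_latent_dim: int = 64
    ws_slots: int = 4
    ws_heads: int = 4
    ws_think_steps: int = 1

    @property
    def head_dim(self) -> int:
        return self.model_dim // self.num_heads

    @property
    def queries_per_kv_head(self) -> int:
        return self.num_heads // self.num_kv_heads

    @property
    def mlp_hidden(self) -> int:
        # Assumption: hidden width = MLP_MULT * model_dim.
        return int(self.mlp_mult * self.model_dim)

cfg = LSWAConfig()
print(cfg.head_dim, cfg.queries_per_kv_head, cfg.mlp_hidden)  # 64 2 1280
```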

Why This Is Interesting

The main architectural claim is that the workspace idea is carrying real weight without requiring a full model rewrite.

This branch keeps the public March 23 record backbone largely intact and adds one focused shared late adapter. In other words: the novelty is not “a different trunk,” it is “a shared late workspace writeback mechanism that competes under the record-track budget.”

Lineage And Credit

This submission is intentionally a derivative record-track branch with one main new idea.

Included In This PR

  • records/track_10min_16mb/2026-03-27_11L_XSA4_LateSharedWorkspaceAdapter_MLP25
  • self-contained train_gpt.py with baked-in record defaults
  • canonical train.log plus the other top-3 legal seed logs
  • README.md and submission.json

ymrohit changed the title from "Add LSWA record-track submission" to "Record-track submission: 11L XSA4 + Late Shared Workspace Adapter (LSWA-64x4) + MLP2.5" on Mar 27, 2026

ymrohit commented Mar 27, 2026

Withdrawing this submission. The shared workspace adapter path is non-causal and can leak future-token information within the evaluated sequence, so the reported score is not a valid autoregressive LM benchmark.
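
The kind of leak described here can be detected with a perturbation test: change only the last token and check whether any earlier output moves. The sketch below is a deliberately simplified stand-in, not the submission's adapter. It assumes scalar token states and a single workspace slot that mean-pools the sequence, and the prefix-pooled variant is included only as a causal contrast. All function names are hypothetical.

```python
def writeback_global(tokens):
    """Toy non-causal writeback: one workspace slot pools ALL positions."""
    slot = sum(tokens) / len(tokens)
    return [t + slot for t in tokens]

def writeback_prefix(tokens):
    """Toy causal contrast: position i only receives the mean of tokens[0..i]."""
    out, running = [], 0.0
    for i, t in enumerate(tokens):
        running += t
        out.append(t + running / (i + 1))
    return out

def leaks_future(fn, tokens):
    """Perturb the last token; report whether any earlier output changes."""
    base = fn(tokens)
    perturbed = fn(tokens[:-1] + [tokens[-1] + 1.0])
    return any(a != b for a, b in zip(base[:-1], perturbed[:-1]))

tokens = [0.5, -1.0, 2.0, 0.25]
print(leaks_future(writeback_global, tokens))  # True
print(leaks_future(writeback_prefix, tokens))  # False
```

Any path where earlier positions read a summary that includes later positions fails this check, which is why a score obtained that way does not measure next-token prediction.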

ymrohit closed this on Mar 27, 2026
ymrohit deleted the submission/lswa-record-track-2026-03-27 branch on March 27, 2026 22:43