PROTEUS v9 — 11L INT6 + single-epoch LoRA TTT (mean val_bpb=1.1526, 3 seeds) #633

Open

MatoTeziTanka wants to merge 1 commit into openai:main from MatoTeziTanka:proteus-v9-1ep-ttt

Conversation

@MatoTeziTanka

Result

Mean val_bpb: 1.1526 (3 seeds, std: 0.0004)

| Seed | Post-Quant BPB | TTT BPB (1 epoch) |
|------|----------------|-------------------|
| 42   | 1.1804         | 1.1527            |
| 1337 | 1.1749         | 1.1529            |
| 2024 | 1.1771         | 1.1522            |

TTT Legality — Single Epoch, Score-Then-Train

This submission addresses the ruling on PR #568 (comment), in which @valerio-oai correctly identified multi-epoch TTT as training on eval data.

This submission uses TTT_EPOCHS=1. Each token is scored exactly once, before being trained on:

  1. Forward pass on chunk → score (accumulate loss for BPB)
  2. Train LoRA on that chunk (backward-looking — tokens already scored)
  3. Advance to next chunk

No token is ever scored after being trained on. This is the same score-then-train pattern as PR #77 (merged), applied once per document.
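The steps above can be sketched with a toy stand-in for the model (all names here are hypothetical illustrations, not the submission's actual code). The invariant being checked is that chunk *i* is always scored by an adapter that has only been trained on chunks earlier than *i*:

```python
import math

class ToyLoRAModel:
    """Toy stand-in for the quantized base model + LoRA adapter.
    `version` counts how many chunks the adapter has trained on."""
    def __init__(self):
        self.version = 0

    def score(self, chunk):
        # Forward pass only: return (loss in nats, adapter version used).
        return math.log(2.0), self.version

    def adapt(self, chunk):
        # One LoRA gradient step on this chunk (backward-looking).
        self.version += 1

def eval_document(chunks):
    model = ToyLoRAModel()          # fresh adapter per document
    total_nats, versions = 0.0, []
    for chunk in chunks:
        loss, ver = model.score(chunk)  # 1. score first (accumulate for BPB)
        total_nats += loss
        versions.append(ver)
        model.adapt(chunk)              # 2. then train on the scored chunk
    return total_nats, versions         # 3. advance handled by the loop

nats, versions = eval_document(["c0", "c1", "c2"])
print(versions)  # [0, 1, 2] — chunk i scored with adapter trained on i chunks only
```

With a single epoch, `versions` is strictly increasing and never revisits a chunk, which is exactly the score-then-train property claimed above.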

What changed from v7/v8: TTT_EPOCHS reduced from 2-5 to 1. TTT_MIN_DOC_LEN set to 512. No other code changes.

Architecture

11 transformer layers, 512d, 8 query / 4 KV heads (GQA), 26.8M params. INT6 GPTQ-lite quantization, zstd-22 compression. Artifact ~15.4 MB (96.3% of the 16 MB budget).
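A back-of-envelope check of the artifact budget, assuming the 16 MB budget is measured in decimal megabytes (as the 96.3% figure implies) and that all 26.8M parameters are packed at 6 bits each:

```python
params = 26.8e6
raw_int6_mb = params * 6 / 8 / 1e6   # INT6 packing: 6 bits per weight
artifact_mb = 15.4
budget_mb = 16.0

print(raw_int6_mb)                    # ≈ 20.1 MB before zstd-22
print(artifact_mb / budget_mb * 100)  # ≈ 96.3 (% of budget)
```

So raw INT6 weights alone would overshoot the budget by roughly 4.7 MB; the zstd-22 pass is what brings the artifact under the 16 MB line.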

Full details in README.md.

Platform

RunPod 8×H100 SXM, PyTorch 2.8.0+cu128.


Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@MatoTeziTanka (Author)

Adding nats (val_loss) for clarity:

| Seed | val_loss (nats) | val_bpb |
|------|-----------------|---------|
| 42   | 1.9463          | 1.1527  |
| 1337 | 1.9466          | 1.1529  |
| 2024 | 1.9454          | 1.1522  |
| Mean | 1.9461          | 1.1526  |
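The two columns are consistent under the usual conversion bpb = nats_per_token / (ln 2 × bytes_per_token). The tokenizer's bytes-per-token ratio is not stated in this PR, but it can be backed out from the mean row (an assumption, shown only as a sanity check):

```python
import math

def bpb(nats_per_token, bytes_per_token):
    """Convert mean cross-entropy in nats/token to bits per byte."""
    return nats_per_token / (math.log(2) * bytes_per_token)

# Implied tokenizer ratio from the mean row (derived, not stated in the PR):
implied_bytes_per_token = 1.9461 / (math.log(2) * 1.1526)
print(implied_bytes_per_token)            # ≈ 2.436 bytes/token
print(bpb(1.9461, implied_bytes_per_token))  # ≈ 1.1526, by construction
```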

Platform: RunPod 8×H100 SXM, PyTorch 2.8.0+cu128. Docker: matotezitanka/proteus-pytorch:2.11.0-cuda12.8.
