54 changes: 54 additions & 0 deletions records/track_10min_16mb/2026-03-22_AtrisLabs_v8/README.md
@@ -0,0 +1,54 @@
# Atris Labs — 10L MLP3x + Int5/Int6 + BigramHash + SmearGate + SWA

## Approach

This submission stacks eight independently validated techniques that match those used by the current leaderboard winners:

### Architecture (25.5M params)
- **10 transformer layers** with U-Net skip connections
- **MLP 3x** expansion (1536 hidden, relu-squared)
- **BigramHash(10240)**: Hash consecutive token pairs into 10240-bucket embedding table (dim=128), zero-init with learnable scale (0.05)
- **SmearGate**: Per-dimension learned gate blending each token with previous token embedding
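
The two embedding-side components above can be sketched in plain Python. This is a minimal illustration and not the code in `train_gpt.py`: the hash multiplier, the function names, and the choice to leave the first token unblended are assumptions made for the example. Only the bucket count (10240) and the per-dimension gating idea come from the list above.

```python
import math

NUM_BUCKETS = 10240  # bucket count stated in the README


def bigram_bucket(prev_token: int, token: int, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a (previous, current) token pair to an embedding-table row.

    The multiplier 1000003 is an arbitrary constant picked for this sketch;
    the real hash function may differ.
    """
    return (prev_token * 1000003 + token) % num_buckets


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def smear_gate(embeddings, gate_logits):
    """Blend each token embedding with the previous token's embedding.

    out[t][d] = (1 - g[d]) * x[t][d] + g[d] * x[t-1][d], with g = sigmoid(logit).
    The first token has no predecessor and is passed through unchanged
    (an assumption of this sketch).
    """
    gates = [sigmoid(g) for g in gate_logits]
    out = [list(embeddings[0])]
    for t in range(1, len(embeddings)):
        blended = [
            (1.0 - g) * cur + g * prev
            for g, cur, prev in zip(gates, embeddings[t], embeddings[t - 1])
        ]
        out.append(blended)
    return out


# Two tokens, two dimensions, gate logits of 0 give an even 50/50 blend.
smeared = smear_gate([[1.0, 2.0], [3.0, 4.0]], gate_logits=[0.0, 0.0])
```

In the model, the bucket index would select a 128-dimensional row that is multiplied by the learnable scale before being added to the token embedding. Zero-initializing the table means the hashed bigram signal starts out as a no-op.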

### Training
- **Muon optimizer**: matrix_lr=0.02, momentum=0.99 (warmup 0.92→0.99 over 1500 steps), weight decay=0.04
- **AdamW**: tied_embed_lr=0.03, scalar_lr=0.02, weight decay=0.01
- **Sequence length**: 2048 tokens, batch 786,432 tokens/step
- **Gradient clipping**: norm=0.3
- **SWA**: Average 24 checkpoints during warmdown (when LR scale < 0.4)
- **Warmdown**: 3000 iterations
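
The schedules above can be illustrated with a short sketch. It assumes a linear momentum warmup, a step-based linear warmdown, and a hypothetical snapshot interval of 50 steps. None of these details are confirmed by this README, and `train_gpt.py` may derive the warmdown from wallclock time instead.

```python
def muon_momentum(step: int, start: float = 0.92, end: float = 0.99,
                  warmup_steps: int = 1500) -> float:
    """Linearly ramp momentum from `start` to `end` over `warmup_steps`."""
    frac = min(step / warmup_steps, 1.0)
    return start + frac * (end - start)


def lr_scale(step: int, total_steps: int, warmdown_iters: int = 3000) -> float:
    """LR multiplier: 1.0 until the last `warmdown_iters` steps, then linear to 0."""
    remaining = max(total_steps - step, 0)
    return min(remaining / warmdown_iters, 1.0)


class RunningAverage:
    """Incremental mean of weight snapshots, as used by SWA."""

    def __init__(self):
        self.count = 0
        self.avg = None

    def update(self, weights):
        self.count += 1
        if self.avg is None:
            self.avg = list(weights)
        else:
            self.avg = [a + (w - a) / self.count for a, w in zip(self.avg, weights)]


# Hypothetical replay: 6428 steps (the audited run), snapshot every 50 steps
# once the LR scale drops below 0.4.
TOTAL_STEPS = 6428
SNAPSHOT_EVERY = 50  # assumed interval, not stated in the README
snapshot_steps = [
    s for s in range(1, TOTAL_STEPS + 1)
    if s % SNAPSHOT_EVERY == 0 and lr_scale(s, TOTAL_STEPS) < 0.4
]
print(len(snapshot_steps))  # 24 under these assumptions
```

Under these assumptions the snapshot count comes out to 24, which agrees with the number of averaged checkpoints reported for the run. This is a consistency check only, not evidence of how the script actually schedules snapshots.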

### Quantization & Compression
- **Int5 MLP weights** (32 levels, per-row scale) — compresses ~1.88x under zstd
- **Int6 attention weights** (64 levels, per-row scale) — compresses ~1.51x
- **FP16 passthrough** for tied embeddings
- **3% magnitude pruning** before quantization
- **zstd-22** compression (or zlib fallback)
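
As a rough sketch of the pruning and per-row quantization steps listed above: the code below assumes symmetric round-to-nearest quantization with one scale per row and a signed range of `[-2^(bits-1), 2^(bits-1) - 1]`, which gives 32 levels for Int5 and 64 for Int6. The function names and the exact rounding and clamping rules are illustrative and may differ from `train_gpt.py`.

```python
def prune_smallest(weights, fraction=0.03):
    """Zero the smallest-magnitude `fraction` of weights (ties may zero a few more)."""
    k = int(len(weights) * fraction)
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]


def quantize_row(row, bits):
    """Quantize one weight row to signed `bits`-bit integers with a per-row scale."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    amax = max(abs(w) for w in row)
    scale = amax / qmax if amax > 0 else 1.0
    q = [max(qmin, min(qmax, round(w / scale))) for w in row]
    return q, scale


def dequantize_row(q, scale):
    """Recover approximate float weights from integers and the row scale."""
    return [v * scale for v in q]


row = [0.5, -1.0, 0.25, 0.0]
q5, s5 = quantize_row(row, bits=5)   # MLP-style row, 32 levels
q6, s6 = quantize_row(row, bits=6)   # attention-style row, 64 levels
restored = dequantize_row(q5, s5)
```

Fewer distinct integer values per tensor means lower entropy in the serialized bytes, which is one plausible reason the Int5 tensors compress better than the Int6 ones under a general-purpose compressor.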

### Evaluation
- **Reported score path:** standard final eval with `EVAL_SEQ_LEN=2048`
- **Sliding-window code path:** included in `train_gpt.py`, but not used for the reported metrics in this folder

## Key Metrics (audited seed=42 run)

- **val_bpb (int8+zlib roundtrip exact):** 1.18069496
- **val_loss:** 1.99355398
- **Artifact size:** 14,461,499 bytes (under 16MB)
- **Training steps:** 6428 in 600.039s on 8xH100 (93.35ms/step)
- **Peak memory:** 18,974 MiB
- **SWA:** 24 checkpoints averaged during warmdown
- **Train log:** included as `train.log`
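
The reported numbers can be cross-checked against each other. The sketch below assumes `val_loss` is the mean cross-entropy in nats per token and that `val_bpb` is that loss converted to bits and divided by the average number of bytes per token. The model and code byte counts are taken from `train.log` and `submission.json`.

```python
import math

val_loss = 1.99355398   # reported val_loss, assumed nats per token
val_bpb = 1.18069496    # reported bits per byte

# bits per token = nats per token / ln(2)
bits_per_token = val_loss / math.log(2)

# bytes per token implied by the two reported metrics
bytes_per_token = bits_per_token / val_bpb

# Artifact size = compressed model + code
model_bytes = 14_396_235   # "Serialized model int8+zlib" in train.log
code_bytes = 65_264        # "Code size" in train.log
total_bytes = model_bytes + code_bytes

# Average step time = total training time / steps
step_avg_ms = 600_039 / 6428

print(f"bits/token:  {bits_per_token:.4f}")
print(f"bytes/token: {bytes_per_token:.4f}")
print(f"total bytes: {total_bytes}")
print(f"step avg ms: {step_avg_ms:.2f}")
```

The implied ratio of roughly 2.44 bytes per token is plausible for a 1024-entry SentencePiece vocabulary, and the byte total matches the reported artifact size exactly.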

## Command

```bash
NCCL_IB_DISABLE=1 \
RUN_ID=atris_v8_submission \
VAL_LOSS_EVERY=0 \
TRAIN_LOG_EVERY=50 \
WARMUP_STEPS=5 \
MAX_WALLCLOCK_SECONDS=600 \
torchrun --standalone --nproc_per_node=8 train_gpt.py
```

All other hyperparameters use defaults from `train_gpt.py`.
11 changes: 11 additions & 0 deletions records/track_10min_16mb/2026-03-22_AtrisLabs_v8/submission.json
@@ -0,0 +1,11 @@
{
"author": "Atris Labs",
"github_id": "keshav55",
"name": "10L MLP3x Int5/Int6 + BigramHash + SmearGate + SWA",
"blurb": "Audited seed42 run. 25.5M param model with Int5 MLP + Int6 attn, BigramHash(10240), SmearGate, SWA(24 ckpts), WD=0.04, grad_clip=0.3, 3% pruning, seq_len=2048, 8xH100.",
"date": "2026-03-24T08:24:52Z",
"val_loss": 1.99355398,
"val_bpb": 1.18069496,
"bytes_total": 14461499,
"bytes_code": 65264
}
168 changes: 168 additions & 0 deletions records/track_10min_16mb/2026-03-22_AtrisLabs_v8/train.log
@@ -0,0 +1,168 @@
logs/atris_v8_audit_seed42.txt
val_bpb:enabled tokenizer_kind=sentencepiece tokenizer_path=./data/tokenizers/fineweb_1024_bpe.model
train_loader:dataset:fineweb10B_sp1024 train_shards:10
val_loader:shards pattern=./data/datasets/fineweb10B_sp1024/fineweb_val_*.bin tokens:62021632
model_params:25517137
world_size:8 grad_accum_steps:1
sdp_backends:cudnn=False flash=True mem_efficient=False math=False
attention_mode:gqa num_heads:8 num_kv_heads:4
tie_embeddings:True embed_lr:0.03 head_lr:0.0 matrix_lr:0.02 scalar_lr:0.02
train_batch_tokens:786432 train_seq_len:2048 iterations:20000 warmup_steps:5 max_wallclock_seconds:600.000
seed:42
v1:num_layers:10 int6_layers:[3,7)
warmup_step:1/5
warmup_step:2/5
warmup_step:3/5
warmup_step:4/5
warmup_step:5/5
step:1/20000 train_loss:6.9334 train_time:273ms step_avg:273.50ms
step:2/20000 train_loss:8.1909 train_time:345ms step_avg:172.69ms
step:3/20000 train_loss:7.6563 train_time:442ms step_avg:147.20ms
step:4/20000 train_loss:6.8750 train_time:533ms step_avg:133.24ms
step:5/20000 train_loss:6.9017 train_time:625ms step_avg:124.97ms
step:6/20000 train_loss:6.8540 train_time:718ms step_avg:119.65ms
step:7/20000 train_loss:6.6549 train_time:808ms step_avg:115.49ms
step:8/20000 train_loss:6.6290 train_time:901ms step_avg:112.58ms
step:9/20000 train_loss:6.3737 train_time:992ms step_avg:110.28ms
step:10/20000 train_loss:6.0944 train_time:1084ms step_avg:108.44ms
step:50/20000 train_loss:3.8339 train_time:4748ms step_avg:94.96ms
step:100/20000 train_loss:3.1935 train_time:9327ms step_avg:93.27ms
step:150/20000 train_loss:2.9061 train_time:15008ms step_avg:100.05ms
step:200/20000 train_loss:2.4086 train_time:19575ms step_avg:97.87ms
step:250/20000 train_loss:2.5012 train_time:24165ms step_avg:96.66ms
step:300/20000 train_loss:2.5914 train_time:29300ms step_avg:97.67ms
step:350/20000 train_loss:2.5724 train_time:33885ms step_avg:96.81ms
step:400/20000 train_loss:2.4459 train_time:40431ms step_avg:101.08ms
step:450/20000 train_loss:2.3974 train_time:44999ms step_avg:100.00ms
step:500/20000 train_loss:2.4264 train_time:49587ms step_avg:99.17ms
step:550/20000 train_loss:2.3683 train_time:54704ms step_avg:99.46ms
step:600/20000 train_loss:2.3523 train_time:59291ms step_avg:98.82ms
step:650/20000 train_loss:2.3468 train_time:64385ms step_avg:99.05ms
step:700/20000 train_loss:2.3598 train_time:68973ms step_avg:98.53ms
step:750/20000 train_loss:2.3373 train_time:73724ms step_avg:98.30ms
step:800/20000 train_loss:2.2443 train_time:78958ms step_avg:98.70ms
step:850/20000 train_loss:2.2372 train_time:83526ms step_avg:98.27ms
step:900/20000 train_loss:2.1322 train_time:88501ms step_avg:98.33ms
step:950/20000 train_loss:2.2191 train_time:93087ms step_avg:97.99ms
step:1000/20000 train_loss:2.2727 train_time:97669ms step_avg:97.67ms
step:1050/20000 train_loss:2.2240 train_time:102687ms step_avg:97.80ms
step:1100/20000 train_loss:2.3209 train_time:107252ms step_avg:97.50ms
step:1150/20000 train_loss:2.2443 train_time:113722ms step_avg:98.89ms
step:1200/20000 train_loss:2.3511 train_time:118305ms step_avg:98.59ms
step:1250/20000 train_loss:2.2412 train_time:122871ms step_avg:98.30ms
step:1300/20000 train_loss:2.3554 train_time:127539ms step_avg:98.11ms
step:1350/20000 train_loss:2.1615 train_time:132116ms step_avg:97.86ms
step:1400/20000 train_loss:2.2085 train_time:136766ms step_avg:97.69ms
step:1450/20000 train_loss:2.1886 train_time:141353ms step_avg:97.48ms
step:1500/20000 train_loss:2.1808 train_time:145936ms step_avg:97.29ms
step:1550/20000 train_loss:2.1780 train_time:150615ms step_avg:97.17ms
step:1600/20000 train_loss:2.1859 train_time:155200ms step_avg:97.00ms
step:1650/20000 train_loss:1.9866 train_time:159760ms step_avg:96.82ms
step:1700/20000 train_loss:2.1853 train_time:164431ms step_avg:96.72ms
step:1750/20000 train_loss:2.1218 train_time:169010ms step_avg:96.58ms
step:1800/20000 train_loss:2.1355 train_time:173667ms step_avg:96.48ms
step:1850/20000 train_loss:2.1585 train_time:178239ms step_avg:96.35ms
step:1900/20000 train_loss:2.1987 train_time:182810ms step_avg:96.22ms
step:1950/20000 train_loss:2.1352 train_time:187462ms step_avg:96.13ms
step:2000/20000 train_loss:2.1734 train_time:192035ms step_avg:96.02ms
step:2050/20000 train_loss:2.0977 train_time:196683ms step_avg:95.94ms
step:2100/20000 train_loss:2.0760 train_time:201244ms step_avg:95.83ms
step:2150/20000 train_loss:2.0499 train_time:205812ms step_avg:95.73ms
step:2200/20000 train_loss:2.1803 train_time:210455ms step_avg:95.66ms
step:2250/20000 train_loss:2.1192 train_time:215016ms step_avg:95.56ms
step:2300/20000 train_loss:2.1180 train_time:219670ms step_avg:95.51ms
step:2350/20000 train_loss:2.1528 train_time:224246ms step_avg:95.42ms
step:2400/20000 train_loss:2.1788 train_time:228807ms step_avg:95.34ms
step:2450/20000 train_loss:2.1676 train_time:233459ms step_avg:95.29ms
step:2500/20000 train_loss:2.0742 train_time:238029ms step_avg:95.21ms
step:2550/20000 train_loss:2.1656 train_time:242752ms step_avg:95.20ms
step:2600/20000 train_loss:2.1594 train_time:247319ms step_avg:95.12ms
step:2650/20000 train_loss:2.0781 train_time:251878ms step_avg:95.05ms
step:2700/20000 train_loss:2.1168 train_time:256526ms step_avg:95.01ms
step:2750/20000 train_loss:2.1581 train_time:261090ms step_avg:94.94ms
step:2800/20000 train_loss:2.1330 train_time:265739ms step_avg:94.91ms
step:2850/20000 train_loss:2.1351 train_time:270313ms step_avg:94.85ms
step:2900/20000 train_loss:2.0638 train_time:274874ms step_avg:94.78ms
step:2950/20000 train_loss:2.1463 train_time:279522ms step_avg:94.75ms
step:3000/20000 train_loss:2.1591 train_time:284082ms step_avg:94.69ms
step:3050/20000 train_loss:2.0864 train_time:288643ms step_avg:94.64ms
step:3100/20000 train_loss:2.0954 train_time:293297ms step_avg:94.61ms
step:3150/20000 train_loss:2.1326 train_time:297856ms step_avg:94.56ms
step:3200/20000 train_loss:1.9112 train_time:302494ms step_avg:94.53ms
step:3250/20000 train_loss:2.1164 train_time:307060ms step_avg:94.48ms
step:3300/20000 train_loss:2.0477 train_time:311646ms step_avg:94.44ms
step:3350/20000 train_loss:2.0481 train_time:316293ms step_avg:94.42ms
step:3400/20000 train_loss:2.1343 train_time:320866ms step_avg:94.37ms
step:3450/20000 train_loss:2.1241 train_time:325512ms step_avg:94.35ms
step:3500/20000 train_loss:2.0931 train_time:330076ms step_avg:94.31ms
step:3550/20000 train_loss:2.0974 train_time:334633ms step_avg:94.26ms
step:3600/20000 train_loss:2.0907 train_time:339275ms step_avg:94.24ms
step:3650/20000 train_loss:2.0268 train_time:343845ms step_avg:94.20ms
step:3700/20000 train_loss:2.0418 train_time:348476ms step_avg:94.18ms
step:3750/20000 train_loss:2.1506 train_time:353036ms step_avg:94.14ms
step:3800/20000 train_loss:2.0752 train_time:357603ms step_avg:94.11ms
step:3850/20000 train_loss:2.1308 train_time:362252ms step_avg:94.09ms
step:3900/20000 train_loss:2.0688 train_time:366813ms step_avg:94.05ms
step:3950/20000 train_loss:2.0555 train_time:371462ms step_avg:94.04ms
step:4000/20000 train_loss:2.0031 train_time:376036ms step_avg:94.01ms
step:4050/20000 train_loss:2.1147 train_time:380599ms step_avg:93.98ms
step:4100/20000 train_loss:1.9607 train_time:385242ms step_avg:93.96ms
step:4150/20000 train_loss:2.1218 train_time:389800ms step_avg:93.93ms
step:4200/20000 train_loss:2.0984 train_time:394441ms step_avg:93.91ms
step:4250/20000 train_loss:2.0865 train_time:399010ms step_avg:93.88ms
step:4300/20000 train_loss:2.0624 train_time:403580ms step_avg:93.86ms
step:4350/20000 train_loss:1.9776 train_time:408229ms step_avg:93.85ms
step:4400/20000 train_loss:2.1091 train_time:412785ms step_avg:93.81ms
step:4450/20000 train_loss:1.9889 train_time:417353ms step_avg:93.79ms
step:4500/20000 train_loss:2.0520 train_time:422000ms step_avg:93.78ms
step:4550/20000 train_loss:1.9593 train_time:426573ms step_avg:93.75ms
step:4600/20000 train_loss:1.9923 train_time:431210ms step_avg:93.74ms
step:4650/20000 train_loss:2.0985 train_time:435784ms step_avg:93.72ms
step:4700/20000 train_loss:2.0307 train_time:440353ms step_avg:93.69ms
step:4750/20000 train_loss:2.0463 train_time:445001ms step_avg:93.68ms
step:4800/20000 train_loss:2.0601 train_time:449569ms step_avg:93.66ms
step:4850/20000 train_loss:2.0818 train_time:454220ms step_avg:93.65ms
step:4900/20000 train_loss:2.0382 train_time:458795ms step_avg:93.63ms
step:4950/20000 train_loss:1.9963 train_time:463356ms step_avg:93.61ms
step:5000/20000 train_loss:2.0558 train_time:467976ms step_avg:93.60ms
step:5050/20000 train_loss:1.9657 train_time:472542ms step_avg:93.57ms
step:5100/20000 train_loss:2.0261 train_time:477207ms step_avg:93.57ms
step:5150/20000 train_loss:2.0299 train_time:481782ms step_avg:93.55ms
step:5200/20000 train_loss:2.0474 train_time:486356ms step_avg:93.53ms
step:5250/20000 train_loss:1.9632 train_time:490998ms step_avg:93.52ms
step:5300/20000 train_loss:1.9431 train_time:495674ms step_avg:93.52ms
step:5350/20000 train_loss:2.1841 train_time:500354ms step_avg:93.52ms
step:5400/20000 train_loss:2.0361 train_time:504934ms step_avg:93.51ms
step:5450/20000 train_loss:2.1996 train_time:509528ms step_avg:93.49ms
step:5500/20000 train_loss:2.0501 train_time:514205ms step_avg:93.49ms
step:5550/20000 train_loss:2.0461 train_time:518806ms step_avg:93.48ms
step:5600/20000 train_loss:1.9633 train_time:523484ms step_avg:93.48ms
step:5650/20000 train_loss:2.1382 train_time:528067ms step_avg:93.46ms
step:5700/20000 train_loss:1.8887 train_time:532671ms step_avg:93.45ms
step:5750/20000 train_loss:1.9986 train_time:537340ms step_avg:93.45ms
step:5800/20000 train_loss:1.8035 train_time:541927ms step_avg:93.44ms
step:5850/20000 train_loss:1.9441 train_time:546606ms step_avg:93.44ms
step:5900/20000 train_loss:1.9655 train_time:551215ms step_avg:93.43ms
step:5950/20000 train_loss:1.8819 train_time:555793ms step_avg:93.41ms
step:6000/20000 train_loss:1.9213 train_time:560476ms step_avg:93.41ms
step:6050/20000 train_loss:1.9118 train_time:565058ms step_avg:93.40ms
step:6100/20000 train_loss:1.9396 train_time:569648ms step_avg:93.38ms
step:6150/20000 train_loss:1.9769 train_time:574320ms step_avg:93.39ms
step:6200/20000 train_loss:2.1091 train_time:578908ms step_avg:93.37ms
step:6250/20000 train_loss:1.9760 train_time:583589ms step_avg:93.37ms
step:6300/20000 train_loss:1.8647 train_time:588168ms step_avg:93.36ms
step:6350/20000 train_loss:1.9435 train_time:592757ms step_avg:93.35ms
step:6400/20000 train_loss:1.9207 train_time:597435ms step_avg:93.35ms
step:6428/20000 val_loss:1.9661 val_bpb:1.1644 train_time:600039ms step_avg:93.35ms
stopping_early: wallclock_cap train_time:600039ms step:6428/20000
peak memory allocated: 18974 MiB reserved: 19176 MiB
swa:applying averaged 24 checkpoints
Serialized model: 98437483 bytes
Code size: 65264 bytes
Total submission size: 98502747 bytes
pruning:zeroed smallest 3.0% of large matrix weights
Serialized model int8+zlib: 14396235 bytes (payload:26268994 raw_torch:26320625 payload_ratio:3.75x)
v1:int6_tensors:41
Total submission size int8+zlib: 14461499 bytes
final_int8_zlib_roundtrip val_loss:1.9936 val_bpb:1.1807 eval_time:1903ms
final_int8_zlib_roundtrip_exact val_loss:1.99355398 val_bpb:1.18069496