Tandem curriculum fix — seed 42 rerun by tcapelle · Pull Request #1688 · wandb/senpai

tcapelle · 2026-03-20T14:48:00Z

Hypothesis

The tandem curriculum fix (PR #1674) achieved val_loss=0.8332, only +0.0006 above the baseline of 0.8326, which is within normal seed-to-seed variance. It also achieved the best in_dist surf_p recorded so far, 17.14 (-0.80 from the 17.94 baseline), a signal worth disambiguating from noise.

This rerun uses a different seed (42) to determine whether the near-miss was a high-variance artifact or a genuine near-improvement that could cross the threshold with a luckier initialization.

Instructions

Apply the tandem curriculum fix to train.py:

Lines 712-714 — Replace:

if epoch < 10:
    is_tandem_curr = (x[:, :, -8:].abs().sum(dim=(1, 2)) > 0.01)
    sample_mask = (~is_tandem_curr).float()[:, None, None]

With:

if epoch < 10:
    is_tandem_curr = (x[:, 0, 21].abs() > 0.5)
    sample_mask = (~is_tandem_curr).float()[:, None, None]
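
To make the difference between the two checks concrete, here is a minimal sketch on a toy batch. The shapes (3 samples, 4 nodes, 24 features) and the values written into the tensor are hypothetical; this PR does not show the real feature layout in train.py, what feature 21 encodes, or why the original check was replaced. The sketch only shows that the two predicates can disagree and what shape the resulting mask has.

```python
import torch

# Hypothetical batch: 3 samples, 4 nodes, 24 features per node.
B, N, F = 3, 4, 24
x = torch.zeros(B, N, F)
x[1, :, 21] = 1.0      # sample 1: feature 21 set on every node
x[2, :, -8:] = 0.002   # sample 2: tiny nonzero values in the trailing 8 features

# Original check: summed magnitude of the last 8 features over all nodes.
old_is_tandem = x[:, :, -8:].abs().sum(dim=(1, 2)) > 0.01
# Replacement check: feature 21 of the first node only.
new_is_tandem = x[:, 0, 21].abs() > 0.5

# 1.0 for samples kept while epoch < 10, 0.0 for excluded ones.
# Shape [B, 1, 1] so it broadcasts against a per-node, per-channel tensor.
sample_mask = (~new_is_tandem).float()[:, None, None]

print(old_is_tandem.tolist())    # [False, True, True]
print(new_is_tandem.tolist())    # [False, True, False]
print(tuple(sample_mask.shape))  # (3, 1, 1)
```

In this toy case the summed check flags sample 2 because small values accumulate across nodes and features, while the single-feature check does not.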

Add seed at the top of the script (after imports, before any torch calls, around line 43):

torch.manual_seed(42)
torch.cuda.manual_seed_all(42)

Run with --wandb_group noam-r23-tandem-curr-seed42.
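
A minimal sketch of the seeding step, wrapped in a helper. The name `seed_everything` is hypothetical and not part of train.py. It also seeds Python's `random` module, which the two lines above do not cover; whether train.py draws from that generator is not stated in this PR, so treat that line as an assumption.

```python
import random

import torch


def seed_everything(seed: int) -> None:
    """Seed the RNGs a PyTorch training script commonly draws from."""
    random.seed(seed)                 # Python stdlib RNG (assumed to matter here)
    torch.manual_seed(seed)           # PyTorch default generator
    torch.cuda.manual_seed_all(seed)  # all CUDA devices; ignored if CUDA is absent


seed_everything(42)
first = (random.random(), torch.rand(3))
seed_everything(42)
second = (random.random(), torch.rand(3))

# Re-seeding with the same value reproduces the same draws.
assert first[0] == second[0]
assert torch.equal(first[1], second[1])
```

Seeding alone does not guarantee bit-identical GPU runs, since some CUDA kernels are nondeterministic, so a same-seed rerun may still differ slightly.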

Baseline

val_loss = 0.8326
in_dist surf_p = 17.94
ood_cond surf_p = 13.98
ood_re surf_p = 27.54
tandem surf_p = 36.73
Previous run (PR #1674, default seed): val_loss = 0.8332, in_dist surf_p = 17.14

Results

W&B run: gpvp4w56
Best epoch: 61

Metric	Baseline	Default seed (PR #1674)	Seed 42 (this run)
val/loss	0.8326	0.8332	0.8450
in_dist surf_p	17.94	17.14	17.60
ood_cond surf_p	13.98	13.87	14.13
ood_re surf_p	27.54	27.58	27.71
tandem surf_p	36.73	37.69	38.23
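
The gaps in the table can be recomputed with a short script. This is a sketch that copies the values from the table; the dictionary names and the `deltas` helper are illustrative only. Lower is better for every metric here (loss and MAE).

```python
# Values copied from the results table above.
baseline = {"val/loss": 0.8326, "in_dist": 17.94, "ood_cond": 13.98,
            "ood_re": 27.54, "tandem": 36.73}
default_seed = {"val/loss": 0.8332, "in_dist": 17.14, "ood_cond": 13.87,
                "ood_re": 27.58, "tandem": 37.69}
seed_42 = {"val/loss": 0.8450, "in_dist": 17.60, "ood_cond": 14.13,
           "ood_re": 27.71, "tandem": 38.23}


def deltas(run, ref):
    """Signed difference (run - ref) per metric; positive means worse."""
    return {k: round(run[k] - ref[k], 4) for k in ref}


d_default = deltas(default_seed, baseline)
d_seed42 = deltas(seed_42, baseline)
worse_than_baseline = [k for k, v in d_seed42.items() if v > 0]

for name in baseline:
    print(f"{name:10s} default {d_default[name]:+.4f}   seed42 {d_seed42[name]:+.4f}")
print("seed 42 worse than baseline on:", worse_than_baseline)
```

Seed 42 comes out worse than baseline on val/loss, ood_cond, ood_re, and tandem, and better only on in_dist.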

Full surface MAE (Ux / Uy / p):

in_dist: 4.41 / 1.82 / 17.60
ood_cond: 2.87 / 1.20 / 14.13
ood_re: 2.56 / 1.05 / 27.71
tandem: 4.88 / 2.37 / 38.23

Volume MAE (Ux / Uy / p):

in_dist: 0.95 / 0.33 / 18.93
ood_cond: 0.63 / 0.26 / 11.97
ood_re: 0.75 / 0.35 / 46.72
tandem: 1.68 / 0.79 / 36.85

Peak memory: not explicitly logged; no OOM observed.

What happened

Seed 42 is substantially worse on val/loss than both the baseline (Δ=+0.0124) and the default seed run (Δ=+0.0118). The hypothesis was not confirmed: a different seed did not push the tandem curriculum fix across the baseline threshold. Instead, seed 42 was worse than the default seed run on every split, and worse than baseline on every split except in_dist (17.60 vs 17.94).

The large gap between seed 42 (0.8450) and the default seed (0.8332) points to high seed variance for this change: a spread of ~0.012 in val/loss from initialization alone, based on only two runs. This is larger than the ~0.003 difference between many of our previous incremental improvements. Single-run comparisons at this performance level are therefore unreliable, and the PR #1674 result (val_loss=0.8332, Δ=+0.0006) cannot be confidently declared positive or negative without multiple seeds.

The fact that seed 42 is far worse suggests the tandem curriculum change may shift which local minimum the optimizer finds, with seed 42 landing in a worse basin. In both seeds, in_dist surf_p (17.14 and 17.60) is better than baseline (17.94), suggesting the fix may genuinely help single-foil performance, but it hurts tandem (37.69 and 38.23 vs 36.73) enough to increase overall val/loss.

Suggested follow-ups

Multi-seed averaging: Given the high seed variance, consider running 3 seeds for any change near the noise floor and averaging results before deciding to merge.
Remove tandem curriculum entirely: With the fix, tandem samples are still excluded for the first 10 epochs. An ablation that removes the curriculum entirely (for every seed tested) might clarify whether the 10-epoch exclusion helps or hurts.
Preserve the in_dist gain: The in_dist surf_p improvement (17.14 and 17.60 vs the 17.94 baseline) holds for both seeds; this may be worth preserving even if overall val/loss is within noise.
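
The multi-seed suggestion could be sketched as a small helper that aggregates val/loss across seeds before comparing to baseline. The function name `clears_baseline` and the threshold `k` are hypothetical choices for illustration, not a formal significance test and not something the repo is known to contain. Only the two val/loss values reported in this PR are used.

```python
from statistics import mean, stdev


def clears_baseline(losses, baseline, k=1.0):
    """True if the mean loss beats baseline by more than k sample std devs.

    k is an illustrative margin, not a statistical test.
    """
    if len(losses) < 2:
        raise ValueError("need at least two seeds to estimate spread")
    return (baseline - mean(losses)) > k * stdev(losses)


observed = [0.8332, 0.8450]  # default seed (PR #1674) and seed 42 (this run)
print(f"mean={mean(observed):.4f}  std={stdev(observed):.4f}")
print(clears_baseline(observed, baseline=0.8326))  # False
```

With the two runs available, the mean val/loss (0.8391) sits above the 0.8326 baseline, so the check fails regardless of the margin chosen.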

github-actions · 2026-03-20T14:48:13Z

Thank you for your submission, we really appreciate it. Like many open-source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution. You can sign the CLA by posting a Pull Request comment in the format below.

I have read the CLA Document and I hereby sign the CLA

0 out of 2 committers have signed the CLA.
❌ @senpai-advisor
❌ @senpai-thorfinn
senpai-advisor and senpai-thorfinn do not appear to be GitHub users. You need a GitHub account to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You can retrigger this bot by commenting recheck in this Pull Request. Posted by the CLA Assistant Lite bot.

Initialize experiment branch

a3dba53

tcapelle added labels status:wip (Student is working on it), student:thorfinn (Assigned to thorfinn), noam (Noam advisor branch experiments) · Mar 20, 2026

Tandem curriculum fix with seed 42 rerun

b46348d

tcapelle marked this pull request as ready for review March 20, 2026 15:26

tcapelle added label status:review (Ready for advisor review) and removed label status:wip (Student is working on it) · Mar 20, 2026

tcapelle wants to merge 2 commits into noam from noam-r23/tandem-curr-fix-seed42
