Tandem curriculum fix — seed 42 rerun #1688

Open
tcapelle wants to merge 2 commits into noam from noam-r23/tandem-curr-fix-seed42

@tcapelle (Contributor) commented Mar 20, 2026

Hypothesis

The tandem curriculum fix (PR #1674) reached val_loss=0.8332, only +0.0006 above the baseline of 0.8326, a gap that is within normal seed-to-seed variance. It also achieved the best in_dist surf_p recorded so far, 17.14 (-0.80 from the baseline of 17.94), a signal worth separating from noise.

This rerun uses a different seed (42) to determine whether the near-miss was a high-variance artifact or a genuine near-improvement that could cross the threshold with a luckier initialization.

Instructions

Apply the tandem curriculum fix to train.py:

Lines 712-714 — Replace:

if epoch < 10:
    is_tandem_curr = (x[:, :, -8:].abs().sum(dim=(1, 2)) > 0.01)
    sample_mask = (~is_tandem_curr).float()[:, None, None]

With:

if epoch < 10:
    is_tandem_curr = (x[:, 0, 21].abs() > 0.5)
    sample_mask = (~is_tandem_curr).float()[:, None, None]
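
To illustrate how the two detection rules above can disagree on the same batch, here is a minimal sketch in plain Python that mirrors both predicates on nested lists instead of tensors. The feature count (`N_FEATURES = 24`), the helper names, and the toy samples are assumptions made for illustration; none of them come from train.py, and the PR does not state what feature 21 encodes.

```python
# Toy stand-in for a batch x of shape [batch, nodes, features], as nested lists.
# N_FEATURES = 24 is an assumption chosen so that index 21 exists.
N_FEATURES = 24


def old_is_tandem(sample, threshold=0.01):
    """Old rule: sum of |value| over the last 8 features of every node."""
    total = sum(abs(v) for node in sample for v in node[-8:])
    return total > threshold


def new_is_tandem(sample, threshold=0.5):
    """New rule: |feature 21| of the first node only."""
    return abs(sample[0][21]) > threshold


def make_sample(n_nodes, overrides):
    """Build an all-zero sample, then set the given (node, feature) entries."""
    sample = [[0.0] * N_FEATURES for _ in range(n_nodes)]
    for (node, feat), value in overrides.items():
        sample[node][feat] = value
    return sample


# Hypothetical samples, not real training data.
batch = {
    "all_zero": make_sample(4, {}),
    "flag_set": make_sample(4, {(0, 21): 1.0}),
    # Small non-zero values in the trailing features, but feature 21 is zero.
    "small_residue": make_sample(4, {(n, 23): 0.01 for n in range(4)}),
}

for name, sample in batch.items():
    is_old, is_new = old_is_tandem(sample), new_is_tandem(sample)
    # sample_mask is 1.0 for samples kept during the first 10 epochs.
    print(name, "old:", is_old, "new:", is_new, "mask(new):", float(not is_new))
```

In this toy batch only `small_residue` is classified differently: the summed rule flags it as tandem, while the single-feature rule does not. Whether that resembles the case PR #1674 set out to fix is not stated in the source.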

Add seed at the top of the script (after imports, before any torch calls, around line 43):

torch.manual_seed(42)
torch.cuda.manual_seed_all(42)

Run with --wandb_group noam-r23-tandem-curr-seed42.
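
As a possible way to keep the seeding step above in one place, the sketch below wraps the two calls in a helper. The name `seed_everything` and the extra seeding of Python's `random` module are assumptions, not part of train.py, and the PyTorch import is optional only so the sketch also runs where PyTorch is not installed.

```python
import random

try:
    import torch  # optional here so the sketch runs without PyTorch installed
except ImportError:
    torch = None


def seed_everything(seed: int) -> None:
    """Seed Python's RNG and, when PyTorch is importable, its CPU and CUDA RNGs."""
    random.seed(seed)
    if torch is not None:
        torch.manual_seed(seed)
        # Silently ignored when CUDA is not available.
        torch.cuda.manual_seed_all(seed)


# Same seed, same random stream: the two draws below are identical.
seed_everything(42)
first = [random.random() for _ in range(3)]
seed_everything(42)
second = [random.random() for _ in range(3)]
print(first == second)
```

Seeding fixes the random streams, but some GPU kernels are nondeterministic, so two runs with the same seed may still differ slightly.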

Baseline

  • val_loss = 0.8326
  • in_dist surf_p = 17.94
  • ood_cond surf_p = 13.98
  • ood_re surf_p = 27.54
  • tandem surf_p = 36.73
  • Previous run (default seed): val_loss=0.8332, in_dist=17.14

Results

W&B run: gpvp4w56
Best epoch: 61

| Metric | Baseline | Default seed (PR #1674) | Seed 42 (this run) |
| --- | --- | --- | --- |
| val/loss | 0.8326 | 0.8332 | 0.8450 |
| in_dist surf_p | 17.94 | 17.14 | 17.60 |
| ood_cond surf_p | 13.98 | 13.87 | 14.13 |
| ood_re surf_p | 27.54 | 27.58 | 27.71 |
| tandem surf_p | 36.73 | 37.69 | 38.23 |

Full surface MAE (Ux / Uy / p):

  • in_dist: 4.41 / 1.82 / 17.60
  • ood_cond: 2.87 / 1.20 / 14.13
  • ood_re: 2.56 / 1.05 / 27.71
  • tandem: 4.88 / 2.37 / 38.23

Volume MAE (Ux / Uy / p):

  • in_dist: 0.95 / 0.33 / 18.93
  • ood_cond: 0.63 / 0.26 / 11.97
  • ood_re: 0.75 / 0.35 / 46.72
  • tandem: 1.68 / 0.79 / 36.85

Peak memory: not explicitly logged; no OOM observed.

What happened

Seed 42 is substantially worse on val/loss than both the baseline (Δ=+0.0124) and the default seed run (Δ=+0.0118). The hypothesis was not confirmed: a different seed did not push the tandem curriculum fix across the baseline threshold. Instead, seed 42 is worse than the default seed run on all four splits, and worse than baseline on every split except in_dist (17.60 vs 17.94).
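
The deltas quoted here can be recomputed from the results table. The sketch below does so for val/loss and the four surf_p values; the dictionary layout and the `deltas` helper are illustrative names, and the numbers are copied from the table. All of these metrics are errors, so a positive delta means worse.

```python
# Values copied from the results table (val/loss and surf_p per split).
baseline = {"val/loss": 0.8326, "in_dist": 17.94, "ood_cond": 13.98,
            "ood_re": 27.54, "tandem": 36.73}
default_seed = {"val/loss": 0.8332, "in_dist": 17.14, "ood_cond": 13.87,
                "ood_re": 27.58, "tandem": 37.69}
seed_42 = {"val/loss": 0.8450, "in_dist": 17.60, "ood_cond": 14.13,
           "ood_re": 27.71, "tandem": 38.23}


def deltas(run, reference):
    """Per-metric difference run - reference, rounded to 4 decimals."""
    return {key: round(run[key] - reference[key], 4) for key in reference}


vs_baseline = deltas(seed_42, baseline)
vs_default = deltas(seed_42, default_seed)

for key in baseline:
    print(f"{key:>9}: vs baseline {vs_baseline[key]:+.4f}, "
          f"vs default seed {vs_default[key]:+.4f}")
```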

The large gap between seed 42 (0.8450) and the default seed (0.8332) points to high seed variance for this change: a spread of about 0.012 in val/loss between just two initializations. This is larger than the ~0.003 difference between many of our previous incremental improvements. It means single-run comparisons at this performance level are unreliable, and the PR #1674 result (val_loss=0.8332, Δ=+0.0006) cannot be confidently declared positive or negative without multiple seeds.

That seed 42 lands so much higher suggests, though two runs cannot confirm it, that the tandem curriculum change affects which local minimum the optimizer reaches, with seed 42 settling in a worse basin. in_dist surf_p beats the baseline (17.94) under both seeds (17.14 with the default seed, 17.60 with seed 42), so the fix may genuinely help single-foil performance. Tandem surf_p, however, is worse than baseline (36.73) in both runs (37.69 and 38.23), which is consistent with the higher overall val/loss.

Suggested follow-ups

  1. Multi-seed averaging: Given the high seed variance, consider running 3 seeds for any change near the noise floor and averaging the results before deciding whether to merge.
  2. Remove tandem curriculum entirely: With the fix, tandem samples are excluded for the first 10 epochs. An ablation that drops the curriculum altogether, run over the same set of seeds, might clarify whether the 10-epoch exclusion helps or hurts.
  3. Preserve the in_dist gain: The in_dist surf_p improvement (17.14 and 17.60 vs the 17.94 baseline) holds for both seeds, so it may be worth keeping even if overall val/loss is within noise.
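
The multi-seed averaging in follow-up 1 could look roughly like the sketch below, which summarizes the two val/loss values reported so far. The `summarize` helper and the one-standard-deviation decision rule are hypothetical choices for illustration, not an agreed merge criterion, and the baseline is itself a single run with its own unmeasured variance.

```python
from statistics import mean, stdev

BASELINE_VAL_LOSS = 0.8326

# val/loss per seed, copied from the results table; more seeds can be added here.
val_loss_by_seed = {"default": 0.8332, "42": 0.8450}


def summarize(losses, baseline):
    """Mean, sample standard deviation, and mean delta against the baseline."""
    values = list(losses.values())
    avg = mean(values)
    spread = stdev(values) if len(values) > 1 else float("nan")
    return {"n": len(values), "mean": avg, "std": spread,
            "mean_delta": avg - baseline}


summary = summarize(val_loss_by_seed, BASELINE_VAL_LOSS)

# Hypothetical rule: call it an improvement only when the mean beats the
# baseline by more than one sample standard deviation.
is_improvement = summary["mean_delta"] < -summary["std"]

print(f"n={summary['n']} mean={summary['mean']:.4f} "
      f"std={summary['std']:.4f} delta={summary['mean_delta']:+.4f} "
      f"improvement={is_improvement}")
```

With only two seeds the standard deviation is a rough estimate at best, which is one reason to run at least the three seeds suggested above.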

@tcapelle added the status:wip, student:thorfinn, and noam labels Mar 20, 2026

@tcapelle tcapelle marked this pull request as ready for review March 20, 2026 15:26
@tcapelle added the status:review label and removed the status:wip label Mar 20, 2026