
Combo: tandem curriculum fix + target noise masking#1684

Closed
tcapelle wants to merge 2 commits into noam from noam-r23/combo-tandem-curr-target-noise


@tcapelle tcapelle commented Mar 20, 2026

Hypothesis

Two correctness fixes from round 22:

  1. Tandem curriculum fix (val_loss=0.8332, +0.0006 vs baseline): fixes tandem detection, which accidentally read the Fourier PE channels instead of the gap feature
  2. Target noise masking (val_loss=0.8454): applies target noise to valid nodes only, so padded nodes receive none
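
The detection fix in item 1 can be sketched as follows. This is a minimal, dependency-free illustration and not the code in train.py: the gap feature index 21 comes from the commit message and the `epoch < 10` warm-up from the results discussion, while the function names, the list-of-rows feature layout, and the rule "tandem means a non-zero gap feature" are assumptions made for the example.

```python
# Hypothetical sketch of tandem detection and curriculum exclusion.
GAP_FEATURE_INDEX = 21      # from the commit message
TANDEM_WARMUP_EPOCHS = 10   # tandem is excluded while epoch < 10 (per the PR text)


def is_tandem(node_features, gap_index=GAP_FEATURE_INDEX, eps=1e-8):
    """Assumed rule: a sample is tandem if any node has a non-zero gap feature."""
    return any(abs(row[gap_index]) > eps for row in node_features)


def curriculum_keep(node_features, epoch, warmup=TANDEM_WARMUP_EPOCHS):
    """Return True if the sample should be trained on at this epoch."""
    if epoch < warmup and is_tandem(node_features):
        return False  # skip tandem samples during single-foil warm-up
    return True


# A single-foil sample with non-zero values in other channels (standing in for
# Fourier PE channels) must not be flagged; only the gap channel counts.
single_foil = [[0.0] * 24 for _ in range(3)]
single_foil[0][5] = 0.7
tandem = [[0.0] * 24 for _ in range(3)]
tandem[1][GAP_FEATURE_INDEX] = 0.5
```

Reading the wrong channel matters because positional-encoding channels are generally non-zero for every sample, so a detector keyed on them cannot separate tandem from single-foil cases.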

Each fix improved in_dist and ood_cond on its own (tandem fix: in_dist -0.80; noise fix: in_dist -2.1%, ood_cond -1.3%), and each regressed tandem. The tandem regressions may partially cancel, because the two fixes act on different things: the curriculum fix changes WHEN tandem samples are trained, while the noise fix changes HOW targets are perturbed. With the curriculum fix restoring proper tandem exclusion in early epochs, the noise masking may see a cleaner signal-to-noise ratio during the critical single-foil pre-training phase.
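
For reference, the "HOW" side (masking target noise to valid nodes) could look roughly like the sketch below. It uses plain Python lists so it runs without dependencies; the function name, the per-node boolean mask, and the Gaussian noise model are assumptions, and the actual train.py presumably does this on batched tensors.

```python
import random


def add_masked_target_noise(targets, valid_mask, sigma, rng=None):
    """Add Gaussian noise to the targets of valid nodes only (hypothetical helper).

    targets:    list of per-node target rows (lists of floats)
    valid_mask: list of bools, False for padded nodes
    sigma:      noise standard deviation (assumed Gaussian)
    """
    if rng is None:
        rng = random.Random()
    noisy = []
    for row, valid in zip(targets, valid_mask):
        if valid:
            noisy.append([t + rng.gauss(0.0, sigma) for t in row])
        else:
            noisy.append(list(row))  # padded node: target left untouched
    return noisy


targets = [[1.0, 2.0], [0.0, 0.0], [3.0, 4.0]]
valid_mask = [True, False, True]
noisy = add_masked_target_noise(targets, valid_mask, sigma=0.1, rng=random.Random(0))
```

With this form, padded rows come back unchanged, which is the "zeros for padded nodes" noise distribution the results section refers to.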

Instructions

Apply BOTH fixes to train.py (tandem curriculum detection fix + target noise masking).
Run with --wandb_group noam-r23-combo-tandem-noise.

Baseline

  • val_loss = 0.8326
  • in_dist surf_p = 17.94
  • ood_cond surf_p = 13.98
  • ood_re surf_p = 27.54
  • tandem surf_p = 36.73

Results

W&B run: su3gu9ee | Epochs: 61 | Peak VRAM: 18.2 GB

| Split | val/loss | mae_surf_p |
|---|---|---|
| in_dist | 0.5755 | 18.0 |
| ood_cond | 0.6926 | 14.1 |
| ood_re | 0.5265 | 27.7 |
| tandem | 1.6071 | 38.5 |
| combined | 0.8504 | |
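
The combined figure is numerically consistent with an unweighted mean of the four per-split losses. That reading is an inference from the numbers, not something the PR states, so treat the check below as a sanity test rather than a definition of the metric.

```python
# Per-split val/loss values copied from the results table.
split_val_loss = {
    "in_dist": 0.5755,
    "ood_cond": 0.6926,
    "ood_re": 0.5265,
    "tandem": 1.6071,
}

# Assumption: "combined" is the plain average over splits.
combined = sum(split_val_loss.values()) / len(split_val_loss)
tandem_share = split_val_loss["tandem"] / sum(split_val_loss.values())

print(f"combined = {combined:.4f}")          # combined = 0.8504
print(f"tandem share = {tandem_share:.0%}")  # tandem share = 47%
```

If the mean really is unweighted, tandem contributes close to half of the combined loss, so any tandem regression moves the headline number disproportionately.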

vs baseline (0.8326):

| Metric | Baseline | This run | Delta |
|---|---|---|---|
| in_dist mae_surf_p | 17.94 | 18.0 | +0.06 (marginal) |
| ood_cond mae_surf_p | 13.98 | 14.1 | +0.12 (slightly worse) |
| ood_re mae_surf_p | 27.54 | 27.7 | +0.16 (slightly worse) |
| tandem mae_surf_p | 36.73 | 38.5 | +1.77 (worse) |
| val/loss (combined) | 0.8326 | 0.8504 | +0.018 (worse) |
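
The deltas can be reproduced from the reported values as sketched below. The relative changes are computed here and are not part of the original report. Note that this run's mae_surf_p values are given to one decimal place, so each delta carries roughly ±0.05 of rounding uncertainty.

```python
# mae_surf_p values copied from the comparison table.
baseline = {"in_dist": 17.94, "ood_cond": 13.98, "ood_re": 27.54, "tandem": 36.73}
this_run = {"in_dist": 18.0, "ood_cond": 14.1, "ood_re": 27.7, "tandem": 38.5}

# Absolute and relative change per split.
deltas = {k: round(this_run[k] - baseline[k], 2) for k in baseline}
relative_pct = {
    k: round(100.0 * (this_run[k] - baseline[k]) / baseline[k], 1) for k in baseline
}

for split in baseline:
    print(f"{split:9s} {deltas[split]:+.2f}  ({relative_pct[split]:+.1f}%)")
```

In relative terms only the tandem split moves by more than one percent; the in_dist delta of +0.06 is about the size of the rounding uncertainty.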

vs individual fixes:

| Fix | val/loss | Delta vs baseline |
|---|---|---|
| Tandem curriculum alone | 0.8332 | +0.0006 |
| Target noise masking alone | 0.8454 | +0.013 |
| Combo (this run) | 0.8504 | +0.018 |

What happened

The combo performed worse than either individual fix (+0.018 vs +0.0006 and +0.013 val/loss). The two fixes don't synergize; they appear to interfere. The tandem curriculum fix correctly excludes tandem samples while epoch < 10, which changes the gradient landscape during warm-up. The target noise masking changes the noise distribution (zeros for padded nodes). Together, these changes seem to disrupt training more than either does alone, particularly on tandem (+1.77 mae_surf_p).
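
One way to quantify "interfere" is to compare the combo's regression with what a purely additive combination of the two individual deltas would predict. The additive model is an assumption introduced here for illustration, and every number comes from a single run, so the gap is suggestive rather than conclusive.

```python
# val/loss values reported in this PR.
baseline = 0.8326
curriculum_only = 0.8332
noise_only = 0.8454
combo = 0.8504

# If the two fixes acted independently, their deltas would simply add.
additive_prediction = (curriculum_only - baseline) + (noise_only - baseline)
observed = combo - baseline
excess = observed - additive_prediction

print(f"additive prediction: {additive_prediction:+.4f}")  # about +0.0134
print(f"observed:            {observed:+.4f}")             # about +0.0178
print(f"excess regression:   {excess:+.4f}")               # about +0.0044
```

Under that model, roughly a quarter of the combo's regression is not explained by the individual fixes, which fits the interference reading as long as run-to-run variance is smaller than about 0.004.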

The tandem regression pattern from target noise masking is amplified in the combo. Possibly: with correct tandem exclusion in early epochs, the model builds a cleaner single-foil representation, then the masked noise (which creates different statistics for tandem vs non-tandem samples) disrupts the tandem transfer phase.

The pre-existing visualization error occurred only after training had finished, so it did not affect the metrics.

Suggested follow-ups

  • The tandem curriculum fix moves val/loss by only +0.0006, a marginal regression that may be within run-to-run noise; it is not worth combining with other regressing changes
  • Target noise masking alone regresses val/loss (+0.013); consider removing target noise entirely to test whether the noise itself helps or hurts
  • The tandem regression across many experiments suggests something fundamental about the training setup; effort may be better spent on understanding why tandem is consistently hard than on adding complexity
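
To make "within noise" testable, one option is to rerun the baseline with a few different seeds and compare each delta against the spread of those runs. The helper below is a hypothetical sketch: the function name and the 2-sigma threshold are arbitrary choices, and the PR contains no repeated-seed measurements, so none are shown here.

```python
from statistics import stdev


def within_noise(delta, repeat_losses, k=2.0):
    """Return True if |delta| is at most k sample standard deviations.

    delta:         val/loss change of an experiment relative to baseline
    repeat_losses: val/loss values from repeated baseline runs (different seeds)
    k:             threshold multiplier (2.0 is an arbitrary default)
    """
    if len(repeat_losses) < 2:
        raise ValueError("need at least two repeated runs to estimate spread")
    return abs(delta) <= k * stdev(repeat_losses)
```

Calling `within_noise(0.0006, baseline_seed_losses)` with measured values would indicate whether the curriculum fix's delta is distinguishable from seed variance.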

@tcapelle added the labels status:wip (Student is working on it), student:tanjiro (Assigned to tanjiro), and noam (Noam advisor branch experiments) on Mar 20, 2026

github-actions Bot commented Mar 20, 2026


Thank you for your submission, we really appreciate it. Like many open-source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution. You can sign the CLA by just posting a Pull Request Comment same as the below format.


I have read the CLA Document and I hereby sign the CLA


0 out of 2 committers have signed the CLA.
❌ @senpai-advisor
❌ @senpai-tanjiro
senpai-advisor, senpai-tanjiro seem not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You can retrigger this bot by commenting recheck in this Pull Request. Posted by the CLA Assistant Lite bot.

Fix tandem curriculum detection (Fourier PE channels -> gap feature index 21)
and mask target noise to valid nodes. val/loss=0.8504 vs baseline 0.8326
(+0.018 regression). Combo worse than either fix alone; two fixes interfere
rather than synergize, especially on tandem (+1.77 mae_surf_p).
W&B run: su3gu9ee
@tcapelle tcapelle marked this pull request as ready for review March 20, 2026 15:20
@tcapelle added the label status:review (Ready for advisor review) and removed status:wip (Student is working on it) on Mar 20, 2026
@morganmcg1 morganmcg1 closed this Mar 22, 2026
@github-actions github-actions Bot locked and limited conversation to collaborators Mar 22, 2026
