Combo: tandem curriculum fix + target noise masking by tcapelle · Pull Request #1684 · wandb/senpai

tcapelle · 2026-03-20T14:47:08Z

Hypothesis

Two correctness fixes from round 22:

Tandem curriculum fix (val_loss=0.8332, +0.0006): fixes tandem detection that accidentally used the Fourier PE channels instead of the gap feature
Target noise masking (val_loss=0.8454): masks target noise to valid nodes only
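The detection side of the curriculum fix can be sketched as follows. This is a minimal illustration, not the code in train.py: the names `GAP_FEATURE_IDX`, `is_tandem`, and `curriculum_filter`, the 24-feature node layout, and the assumption that the gap feature is exactly zero for single-foil samples are all hypothetical. Only the gap feature index (21) and the 10-epoch exclusion window come from this PR.

```python
GAP_FEATURE_IDX = 21      # gap feature index named in this PR's commit message
TANDEM_START_EPOCH = 10   # tandem samples are excluded while epoch < 10


def is_tandem(node_features, gap_idx=GAP_FEATURE_IDX, tol=1e-8):
    """Return True if any node has a non-zero gap feature.

    Assumption: single-foil samples carry a gap feature of exactly zero,
    so a non-zero value marks a tandem configuration. Reading a Fourier PE
    channel here instead would misclassify samples, which is the bug.
    """
    return any(abs(node[gap_idx]) > tol for node in node_features)


def curriculum_filter(samples, epoch):
    """Drop tandem samples during the early single-foil phase."""
    if epoch >= TANDEM_START_EPOCH:
        return list(samples)
    return [s for s in samples if not is_tandem(s)]


# Toy samples: 3 nodes each, 24 features per node (hypothetical layout).
single = [[0.0] * 24 for _ in range(3)]
tandem = [[0.0] * 24 for _ in range(3)]
for node in tandem:
    node[GAP_FEATURE_IDX] = 0.5

early = curriculum_filter([single, tandem], epoch=0)    # tandem excluded
late = curriculum_filter([single, tandem], epoch=10)    # tandem included
```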

Both improved in_dist and ood_cond individually (tandem fix: in_dist -0.80; noise fix: in_dist -2.1%, ood_cond -1.3%). Both regressed tandem. The tandem regressions may partially cancel: the curriculum fix changes WHEN tandem samples are trained, while the noise fix changes HOW targets are perturbed. With the curriculum fix restoring proper tandem exclusion in early epochs, the noise masking may have a cleaner signal-to-noise ratio during the critical single-foil pre-training phase.
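The "HOW" side, masking target noise to valid nodes, can be sketched in plain Python. This is an illustration under stated assumptions: the function name `masked_target_noise`, the Gaussian noise model, and `noise_std=0.1` are hypothetical, and train.py presumably does the equivalent with tensor operations.

```python
import random


def masked_target_noise(targets, valid_mask, noise_std, rng):
    """Add Gaussian noise to targets at valid nodes only.

    targets:    per-node target values, padded to a fixed length
    valid_mask: bools, False at padded positions
    """
    noisy = []
    for value, valid in zip(targets, valid_mask):
        # Padded nodes receive zero noise, so they stay untouched.
        noise = rng.gauss(0.0, noise_std) if valid else 0.0
        noisy.append(value + noise)
    return noisy


rng = random.Random(0)
targets = [1.0, 2.0, 3.0, 0.0, 0.0]             # last two entries are padding
valid_mask = [True, True, True, False, False]
noisy = masked_target_noise(targets, valid_mask, noise_std=0.1, rng=rng)
```

Without the mask, padded positions would also be perturbed, so the noise statistics would depend on how much padding each sample carries.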

Instructions

Apply BOTH fixes to train.py (tandem curriculum detection fix + target noise masking).
Run with --wandb_group noam-r23-combo-tandem-noise.

Baseline

val_loss = 0.8326
in_dist surf_p = 17.94
ood_cond surf_p = 13.98
ood_re surf_p = 27.54
tandem surf_p = 36.73

Results

W&B run: su3gu9ee | Epochs: 61 | Peak VRAM: 18.2 GB

Split	val/loss	mae_surf_p
in_dist	0.5755	18.0
ood_cond	0.6926	14.1
ood_re	0.5265	27.7
tandem	1.6071	38.5
combined	0.8504
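As a quick sanity check, the combined val/loss is numerically consistent with an unweighted mean of the four per-split losses. That the metric is actually defined this way in train.py is an assumption; the numbers are copied from the table above.

```python
# Per-split val/loss from the table above.
split_val_loss = {
    "in_dist": 0.5755,
    "ood_cond": 0.6926,
    "ood_re": 0.5265,
    "tandem": 1.6071,
}

# Unweighted mean over splits (assumed definition of "combined").
combined = sum(split_val_loss.values()) / len(split_val_loss)
print(round(combined, 4))  # 0.8504
```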

vs baseline (0.8326):

Split	Baseline mae_surf_p	This run mae_surf_p	Delta
in_dist	17.94	18.0	+0.06 (marginal)
ood_cond	13.98	14.1	+0.12 (slightly worse)
ood_re	27.54	27.7	+0.16 (slightly worse)
tandem	36.73	38.5	+1.77 (worse)
val/loss	0.8326	0.8504	+0.018 (worse)
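The deltas in this table can be reproduced from the reported values. Note that the mae_surf_p values for this run are rounded to one decimal, so each delta carries roughly ±0.05 of rounding uncertainty; the in_dist delta (+0.06) is about that size. The relative changes are an added convenience, not figures reported in the PR.

```python
baseline = {"in_dist": 17.94, "ood_cond": 13.98, "ood_re": 27.54, "tandem": 36.73}
this_run = {"in_dist": 18.0, "ood_cond": 14.1, "ood_re": 27.7, "tandem": 38.5}

# Absolute change in mae_surf_p per split (positive means worse).
deltas = {k: round(this_run[k] - baseline[k], 2) for k in baseline}

# Relative change in percent, for comparing splits with different scales.
rel_pct = {k: 100.0 * (this_run[k] - baseline[k]) / baseline[k] for k in baseline}

for split in baseline:
    print(f"{split:9s} {deltas[split]:+.2f} ({rel_pct[split]:+.1f}%)")
```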

vs individual fixes:

Fix	val/loss	Delta vs baseline
Tandem curriculum alone	0.8332	+0.0006
Target noise masking alone	0.8454	+0.013
Combo (this run)	0.8504	+0.018

What happened

The combo performed worse than either individual fix (+0.018 vs +0.0006 and +0.013). The two fixes don't synergize; instead they appear to interfere. The tandem curriculum fix correctly excludes tandem samples while epoch < 10, which changes the gradient landscape during warm-up. The target noise masking changes the noise distribution (zeros for padded nodes). The interaction of these two changes seems to disrupt training more than either does alone, particularly on tandem (+1.77 mae_surf_p).
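One way to quantify "interfere" is to compare the combo's val/loss delta with the sum of the two individual deltas. The sketch below uses only the val/loss values reported in this PR. The additive model is an assumption made for illustration, and each number comes from a single run, so the residual may be within run-to-run noise.

```python
baseline = 0.8326
tandem_fix_alone = 0.8332
noise_mask_alone = 0.8454
combo = 0.8504

delta_tandem = tandem_fix_alone - baseline   # about +0.0006
delta_noise = noise_mask_alone - baseline    # about +0.0128

# If the two effects were independent and additive, the combo would
# regress by roughly the sum of the individual deltas.
expected_additive = delta_tandem + delta_noise
observed = combo - baseline

# Positive residual: the combo is worse than additivity predicts.
interaction = observed - expected_additive
print(round(expected_additive, 4), round(observed, 4), round(interaction, 4))
```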

The tandem regression pattern from target noise masking is amplified in the combo. Possibly: with correct tandem exclusion in early epochs, the model builds a cleaner single-foil representation, then the masked noise (which creates different statistics for tandem vs non-tandem samples) disrupts the tandem transfer phase.

The pre-existing visualization error appeared after training, not affecting metrics.

Suggested follow-ups

The tandem curriculum fix alone changes val/loss by only +0.0006 (nominally a slight regression, not an improvement), which may be within noise; not worth combining with changes that regress on their own
Target noise masking alone consistently hurts; consider removing target noise entirely to see if it helps or hurts
The tandem regression across many experiments suggests something fundamental about the training setup; focus may be better placed on understanding why tandem is consistently hard rather than adding complexity

Fix tandem curriculum detection (Fourier PE channels -> gap feature index 21) and mask target noise to valid nodes. val/loss=0.8504 vs baseline 0.8326 (+0.018 regression). Combo worse than either fix alone; two fixes interfere rather than synergize, especially on tandem (+1.77 mae_surf_p). W&B run: su3gu9ee

Initialize experiment branch

f8030ac

tcapelle added the status:wip, student:tanjiro, and noam labels Mar 20, 2026

tcapelle marked this pull request as ready for review March 20, 2026 15:20

tcapelle added the status:review label and removed the status:wip label Mar 20, 2026

morganmcg1 closed this Mar 22, 2026

github-actions Bot locked and limited conversation to collaborators Mar 22, 2026

tcapelle wants to merge 2 commits into noam from noam-r23/combo-tandem-curr-target-noise
