Combo: tandem curriculum fix + target noise masking#1684
Closed
Fix tandem curriculum detection (Fourier PE channels -> gap feature index 21) and mask target noise to valid nodes. val/loss=0.8504 vs baseline 0.8326 (+0.018 regression). Combo worse than either fix alone; two fixes interfere rather than synergize, especially on tandem (+1.77 mae_surf_p). W&B run: su3gu9ee
Hypothesis
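Two correctness fixes from round 22:
- Tandem curriculum detection fix (Fourier PE channels -> gap feature index 21)
- Target noise masking (noise applied to valid nodes only)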
Both improved in_dist and ood_cond individually (tandem fix: in_dist -0.80; noise fix: in_dist -2.1%, ood_cond -1.3%). Both regressed tandem. The tandem regressions may partially cancel: the curriculum fix changes WHEN tandem samples are trained, while the noise fix changes HOW targets are perturbed. With the curriculum fix restoring proper tandem exclusion in early epochs, the noise masking may have a cleaner signal-to-noise ratio during the critical single-foil pre-training phase.
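As a rough illustration of the WHEN side of this hypothesis, the curriculum gate could look like the sketch below. It is not the code in train.py. The gap feature index (21) and the epoch < 10 cutoff come from this PR; the function names, the nested-list sample layout, and the rule "a sample is tandem if any node has a nonzero gap feature" are assumptions made for the example.

```python
GAP_FEATURE_INDEX = 21      # gap feature channel, per the PR description
TANDEM_WARMUP_EPOCHS = 10   # tandem is excluded while epoch < 10


def is_tandem(node_features, eps=1e-6):
    """Assumption: a sample is tandem when any node has a nonzero gap feature.

    node_features is a list of per-node feature lists (hypothetical layout).
    """
    return any(abs(row[GAP_FEATURE_INDEX]) > eps for row in node_features)


def curriculum_indices(samples, epoch):
    """Return the indices of samples eligible for training at this epoch."""
    if epoch >= TANDEM_WARMUP_EPOCHS:
        return list(range(len(samples)))
    # Warm-up phase: keep single-foil samples only.
    return [i for i, feats in enumerate(samples) if not is_tandem(feats)]
```

Reading the wrong channels (for example positional-encoding channels instead of index 21) would make `is_tandem` misclassify samples, so tandem cases could leak into the warm-up phase.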
Instructions
Apply BOTH fixes to train.py (tandem curriculum detection fix + target noise masking).
Run with --wandb_group noam-r23-combo-tandem-noise.
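For reference, target noise masking can be sketched as follows. This is a framework-free illustration, not the train.py implementation: `add_masked_target_noise`, its arguments, and the Gaussian noise model are assumptions. The only behaviour taken from this PR is that padded (invalid) nodes receive zero noise.

```python
import random


def add_masked_target_noise(targets, valid_mask, noise_std, rng):
    """Add Gaussian noise to the targets of valid nodes only.

    targets:    list of per-node target lists (hypothetical layout)
    valid_mask: list of bools, False for padded nodes
    rng:        a random.Random instance, passed in for reproducibility
    """
    noisy = []
    for row, valid in zip(targets, valid_mask):
        if valid:
            noisy.append([t + rng.gauss(0.0, noise_std) for t in row])
        else:
            # Padded node: leave the target untouched (zero noise).
            noisy.append(list(row))
    return noisy
```

In a tensor framework the same idea is usually a single elementwise product of the noise with the mask before it is added to the targets.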
Baseline
Results
W&B run: su3gu9ee | Epochs: 61 | Peak VRAM: 18.2 GB
vs baseline (0.8326):
vs individual fixes:
What happened
The combo performed worse than either individual fix (+0.018 vs +0.0006 and +0.013). The two fixes don't synergize; instead they appear to interfere. The tandem curriculum fix correctly excludes tandem samples while epoch < 10, which changes the gradient landscape during warm-up. The target noise masking changes the noise distribution (zero noise on padded nodes). Together, these two changes seem to disrupt training more than either does alone, particularly on tandem (+1.77 mae_surf_p).
The tandem regression pattern from target noise masking is amplified in the combo. Possibly: with correct tandem exclusion in early epochs, the model builds a cleaner single-foil representation, then the masked noise (which creates different statistics for tandem vs non-tandem samples) disrupts the tandem transfer phase.
The pre-existing visualization error occurred after training had finished and did not affect the metrics.
Suggested follow-ups