Small-scale experiments #14

@devinkwok

Description

Proposed experiment procedure:

  1. pick a list of perturb_step values: [0, 78, 391, 1955, 9775]
  2. for each step, do 20 (?TODO) perturbation runs with the smallest perturb_scale that yields non-zero excess loss (i.e. around 1e-12). Take the 95th (?TODO) percentile of the excess loss as logreg_threshold.
  3. fit a logistic regression (logreg) to find the perturb_scale at which the excess loss exceeds logreg_threshold 50% of the time.
  4. plot perturb_step (x) vs perturb_scale (y). (?TODO issue: this plot doesn't compare different logreg_threshold values over time, whereas plotting perturb_scale (x) vs excess loss (y) does.)
  5. TODO: plot the same, but with excess loss modulo permutation
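The procedure above can be sketched end-to-end. Everything here is a toy stand-in, not the repo's actual harness: `run_perturbed` fakes an excess-loss measurement, and the logreg is a minimal one-feature gradient-descent fit; only the structure (threshold from baseline runs, then a 50%-crossing from logreg over log-spaced scales) follows the steps above.

```python
import numpy as np

rng = np.random.default_rng(0)

def run_perturbed(step, scale):
    # Toy stand-in (assumption) for the real harness: perturb the checkpoint
    # at `step` by `scale` and return the resulting excess loss. `step` is
    # unused in this toy model.
    return max(0.0, scale * 1e6 - 0.5 + rng.normal(0.0, 0.2))

def fit_logreg(x, y, lr=0.1, steps=5000):
    # Minimal one-feature logistic regression via gradient descent.
    w, b = 0.0, 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(w * x + b)))
        w -= lr * np.mean((p - y) * x)
        b -= lr * np.mean(p - y)
    return w, b

def find_critical_scale(step, n_runs=20, base_scale=1e-12, pct=95):
    # Step 2: logreg_threshold = 95th percentile of excess loss at the
    # smallest perturb_scale that yields non-zero excess loss.
    baseline = [run_perturbed(step, base_scale) for _ in range(n_runs)]
    threshold = np.percentile(baseline, pct)
    # Step 3: sweep scales, label whether excess loss exceeds the threshold,
    # and fit logreg in centered log10(scale) for conditioning.
    log_scales = np.linspace(-12, -4, 30)
    y = np.array([float(run_perturbed(step, 10.0 ** s) > threshold)
                  for s in log_scales])
    x = log_scales - log_scales.mean()
    w, b = fit_logreg(x, y)
    # 50% crossing of the sigmoid: w * x + b = 0  =>  x = -b / w
    return 10.0 ** (-b / w + log_scales.mean())
```

In a real run, `run_perturbed` would load the checkpoint at perturb_step, apply the perturbation, and evaluate; the returned scale is the y-value for the perturb_step-vs-perturb_scale plot in step 4.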

List of experiments to run:

Targeting <= 0.15 train CE

Per-layer perturbations:

  • conv weights only
  • norm weights only. HYPOTHESIS: a much larger barrier given the same L2/n (n is the number of parameters), and excess loss less well correlated with L2/n; this would show that models are sensitive to activation distributions
  • single residual blocks only (conv weights)
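A minimal sketch of how a per-layer perturbation with controlled L2 could look, operating on a plain dict of parameter arrays (in practice this would be the model's state dict; the name patterns in the usage comment are assumptions about layer naming):

```python
import numpy as np

def perturb_params(params, scale, select, rng):
    """Perturb only the selected parameter arrays with Gaussian noise,
    rescaled so the perturbation's total L2 norm equals `scale`
    (controlling perturbation L2, per the plan above).

    params: dict name -> np.ndarray; select: picks names to perturb.
    """
    names = [n for n in params if select(n)]
    noise = {n: rng.standard_normal(params[n].shape) for n in names}
    total = np.sqrt(sum((v ** 2).sum() for v in noise.values()))
    out = dict(params)
    for n in names:
        out[n] = params[n] + noise[n] * (scale / total)
    return out

# e.g. norm weights only (naming is an assumption):
#   perturb_params(params, 1e-12, lambda n: "norm" in n and "weight" in n, rng)
# conv weights only:
#   perturb_params(params, 1e-12, lambda n: "conv" in n and "weight" in n, rng)
```

Normalizing the joint noise vector (rather than per-tensor) keeps the total perturbation L2 fixed across conditions, so the conv-only, norm-only, and single-block experiments are comparable at the same L2/n.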

Training (original butterfly experiment, but controlling perturbation L2, not perturbing norm layers, etc.):

  • standard training --group=reference --lr=0.001 --warmup_ratio=0.02 --weight_decay=0 --training_steps=25000st
  • (LMC baseline) different batch orders

Architecture effects, controlling for number of parameters:

  • wide/shallow: --group=arch-wideshallow --model_name=resnet8-64 --warmup_ratio=0.02 --training_steps=25000st
  • narrow/deep: --group=arch-narrowdeep --model_name=resnet34-16 --warmup_ratio=0.02 --training_steps=40000st

Hparam effects:

  • learning rate: --group=lr-0.01 --lr=0.01 --training_steps=45000st
  • small batch size: --group=bs-32 --batch_size=32 --training_steps=85000st
  • large batch size: --group=bs-512 --batch_size=512 --training_steps=15000st
  • 10x warmup: --group=warmup-10x --warmup_ratio=0.2 --training_steps=25000st
  • weight decay: --group=decay-0.0001 --weight_decay=0.0001 --training_steps=20000st
  • constant LR after warmup: --group=schedule-constant --lr_scheduler=constant --training_steps=35000st
  • adam: --group=opt-adamw --lr=0.003 --optimizer=adamw --training_steps=20000st
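The hparam experiments above share one launch pattern, so they can be driven from a single sweep list. `train.py` is a placeholder (assumption) for the repo's real entry point; the flag strings are copied verbatim from the list above:

```python
import shlex
import subprocess

SWEEP = [
    "--group=lr-0.01 --lr=0.01 --training_steps=45000st",
    "--group=bs-32 --batch_size=32 --training_steps=85000st",
    "--group=bs-512 --batch_size=512 --training_steps=15000st",
    "--group=warmup-10x --warmup_ratio=0.2 --training_steps=25000st",
    "--group=decay-0.0001 --weight_decay=0.0001 --training_steps=20000st",
    "--group=schedule-constant --lr_scheduler=constant --training_steps=35000st",
    "--group=opt-adamw --lr=0.003 --optimizer=adamw --training_steps=20000st",
]

def build_commands(dry_run=True):
    # Build one command line per experiment; with dry_run=False, launch them
    # sequentially (a cluster submission wrapper would replace this).
    cmds = [["python", "train.py"] + shlex.split(args) for args in SWEEP]
    if not dry_run:
        for cmd in cmds:
            subprocess.run(cmd, check=True)
    return cmds
```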

Finetuning from various partially trained checkpoints (moved to issue #12):

  • easy -> hard task: CIFAR-10 to CIFAR-100 (?TODO)
  • hard -> easy task: CIFAR-100 to CIFAR-10 (?TODO)
  • (baseline) randomized data or labels (?TODO)
  • (baseline) totally non-transferable task (?TODO)
