Proposed experiment procedure:
- pick a list of `perturb_step: [0, 78, 391, 1955, 9775]`
- for each step, do 20 (?TODO) runs of `perturb` with the smallest `perturb_scale` that has non-zero excess loss (i.e. around `1e-12`). Take the 95th (?TODO) percentile of the excess loss as `logreg_threshold`.
- run `logreg` to find the `perturb_scale` at which excess loss exceeds the `logreg_threshold` 50% of the time.
- plot `perturb_step` (x) vs `perturb_scale` (y) (?TODO issue: this plot doesn't compare different `logreg_threshold` values over time, whereas plotting `perturb_scale` (x) vs excess loss (y) does)
- TODO plot same but with excess loss mod perm
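The procedure above can be sketched end-to-end. Everything here is a stand-in: `run_perturbation` fakes the excess loss of a real perturbed run with log-normal noise, and the logistic regression is a minimal stdlib gradient-ascent fit; only the step list, the 20-run count, the ~`1e-12` smallest scale, and the 95th-percentile threshold come from the notes.

```python
import math
import random

random.seed(0)

def run_perturbation(step, scale):
    """Hypothetical stand-in for one perturbed-training run: returns the
    excess loss after an L2 perturbation of size `scale` at checkpoint
    `step` (modeled here as log-normal run-to-run noise)."""
    return scale * math.exp(random.gauss(0.0, 0.5))

def sigmoid(z):
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def percentile(xs, q):
    # rough empirical percentile (no interpolation)
    xs = sorted(xs)
    return xs[min(len(xs) - 1, int(q / 100 * len(xs)))]

def logreg_midpoint(ts, ys, lr=1.0, iters=2000):
    """Fit p = sigmoid(a*t + b) by gradient ascent on the log-likelihood
    and return the t at which p = 0.5 (the 50% crossing)."""
    a, b = 0.0, 0.0
    n = len(ts)
    for _ in range(iters):
        ga = gb = 0.0
        for t, y in zip(ts, ys):
            p = sigmoid(a * t + b)
            ga += (y - p) * t
            gb += (y - p)
        a += lr * ga / n
        b += lr * gb / n
    return -b / a

perturb_steps = [0, 78, 391, 1955, 9775]
n_runs = 20                    # (?TODO) in the notes
log_lo, log_hi = -14.0, -8.0   # log10 range of probed perturb_scales

results = {}
for step in perturb_steps:
    # 1) repeated runs at the smallest scale with non-zero excess loss (~1e-12)
    baseline = [run_perturbation(step, 1e-12) for _ in range(n_runs)]
    logreg_threshold = percentile(baseline, 95)   # 95th (?TODO) percentile

    # 2) probe a log-spaced grid of perturb_scales, record threshold exceedance
    ts = [i / 24 for i in range(25)]              # normalized log-scale in [0, 1]
    ys = [1.0 if run_perturbation(step, 10 ** (log_lo + t * (log_hi - log_lo)))
                 > logreg_threshold else 0.0
          for t in ts]

    # 3) logreg: the perturb_scale at which excess loss exceeds the
    #    threshold 50% of the time
    t_mid = logreg_midpoint(ts, ys)
    results[step] = 10 ** (log_lo + t_mid * (log_hi - log_lo))

for step, scale in sorted(results.items()):
    print(step, scale)
```

The final `results` dict is exactly the data for the proposed `perturb_step` (x) vs `perturb_scale` (y) plot.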
List of experiments to run:
Targeting <= 0.15 train CE
Per-layer perturbations:
Training (original butterfly experiment, but controlling perturbation L2, not perturbing norm layers, etc.):
- `--group=reference --lr=0.001 --warmup_ratio=0.02 --weight_decay=0 --training_steps=25000`

Architecture effects, controlling for number of parameters:
- `--group=arch-wideshallow --model_name=resnet8-64 --warmup_ratio=0.02 --training_steps=25000`
- `--group=arch-narrowdeep --model_name=resnet34-16 --warmup_ratio=0.02 --training_steps=40000`

Hparam effects:
- `--group=lr-0.01 --lr=0.01 --training_steps=45000`
- `--group=bs-32 --batch_size=32 --training_steps=85000`
- `--group=bs-512 --batch_size=512 --training_steps=15000`
- `--group=warmup-10x --warmup_ratio=0.2 --training_steps=25000`
- `--group=decay-0.0001 --weight_decay=0.0001 --training_steps=20000`
- `--group=schedule-constant --lr_scheduler=constant --training_steps=35000`
- `--group=opt-adamw --lr=0.003 --optimizer=adamw --training_steps=20000`

Finetuning from various partially trained checkpoints (moved to issue #12):
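The flag groups above can be enumerated programmatically, e.g. for a sweep launcher. This is a dry-run sketch that only prints the commands; `train.py` is a hypothetical entry-point name, not the repo's actual script.

```python
# Dry-run: print one launch command per experiment group.
# `train.py` is a hypothetical entry point; substitute the repo's real
# training script (and e.g. subprocess.run to actually launch).
EXPERIMENT_GROUPS = [
    "--group=reference --lr=0.001 --warmup_ratio=0.02 --weight_decay=0 --training_steps=25000",
    "--group=arch-wideshallow --model_name=resnet8-64 --warmup_ratio=0.02 --training_steps=25000",
    "--group=arch-narrowdeep --model_name=resnet34-16 --warmup_ratio=0.02 --training_steps=40000",
    "--group=lr-0.01 --lr=0.01 --training_steps=45000",
    "--group=bs-32 --batch_size=32 --training_steps=85000",
    "--group=bs-512 --batch_size=512 --training_steps=15000",
    "--group=warmup-10x --warmup_ratio=0.2 --training_steps=25000",
    "--group=decay-0.0001 --weight_decay=0.0001 --training_steps=20000",
    "--group=schedule-constant --lr_scheduler=constant --training_steps=35000",
    "--group=opt-adamw --lr=0.003 --optimizer=adamw --training_steps=20000",
]

commands = [f"python train.py {args}" for args in EXPERIMENT_GROUPS]
for cmd in commands:
    print(cmd)
```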