warmup_method:
- linear
- constant
anneal_method:
- cosine
- (multi-)step
- poly
- linear
- exp
See test_flat_and_anneal().
- The scheduler should be applied by iteration (or by batch) instead of by epoch.
- anneal_point and steps are the percentages of the total iterations.
- init_warmup_lr = warmup_factor * base_lr
- target_lr = target_lr_factor * base_lr
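
The notes above can be illustrated with a minimal sketch of a per-iteration multiplier on base_lr. This is not the scheduler's actual implementation: the function name lr_factor is hypothetical, only the cosine anneal method is shown, and anneal_point is assumed to be a fraction in [0, 1] of the total iterations.

```python
import math


def lr_factor(it, total_iters, warmup_iters=0, warmup_factor=0.1,
              warmup_method="linear", anneal_point=0.72,
              target_lr_factor=0.0):
    """Return the multiplier applied to base_lr at iteration `it`.

    Hypothetical helper: warmup, then a flat phase, then cosine annealing.
    """
    # Warmup phase: start at init_warmup_lr = warmup_factor * base_lr.
    if it < warmup_iters:
        if warmup_method == "constant":
            return warmup_factor
        if warmup_method == "linear":
            alpha = it / warmup_iters
            return warmup_factor * (1.0 - alpha) + alpha
        raise ValueError(f"unknown warmup_method: {warmup_method}")

    # Flat phase: keep base_lr until the anneal point is reached.
    anneal_start = anneal_point * total_iters  # assumption: fraction in [0, 1]
    if it < anneal_start:
        return 1.0

    # Cosine anneal from base_lr down to target_lr = target_lr_factor * base_lr.
    span = max(total_iters - anneal_start, 1e-12)  # guard against anneal_point == 1
    progress = min((it - anneal_start) / span, 1.0)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return target_lr_factor + (1.0 - target_lr_factor) * cosine


# The factor is evaluated once per iteration (batch), not once per epoch.
base_lr = 0.01
schedule = [base_lr * lr_factor(i, total_iters=100, warmup_iters=10)
            for i in range(101)]
```

A function of this shape could be passed as lr_lambda to torch.optim.lr_scheduler.LambdaLR, with scheduler.step() called after every batch rather than after every epoch.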