The description of tf.keras.optimizers.schedules.CosineDecay states: "...taking our learning rate from warmup_target to alpha..." and "...decay will take our learning rate from initial_learning_rate to alpha." This is incorrect. alpha is a multiplier, not a target value: the learning rate decays to warmup_target * alpha when warmup is used, and to initial_learning_rate * alpha otherwise.
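
To illustrate, here is a minimal sketch of the no-warmup case. It assumes the decay formula given in the same CosineDecay documentation; cosine_decay_lr is a hypothetical pure-Python helper written for this report, not the TensorFlow implementation, and the numeric values are arbitrary.

```python
import math


def cosine_decay_lr(step, initial_learning_rate, decay_steps, alpha=0.0):
    """Hypothetical helper mirroring the documented no-warmup formula."""
    step = min(step, decay_steps)  # the schedule is flat after decay_steps
    cosine_decay = 0.5 * (1.0 + math.cos(math.pi * step / decay_steps))
    decayed = (1.0 - alpha) * cosine_decay + alpha  # goes from 1 down to alpha
    return initial_learning_rate * decayed  # alpha acts as a fraction here


# Arbitrary example values, chosen only for illustration.
initial_lr = 0.1
alpha = 0.01
decay_steps = 1000

start_lr = cosine_decay_lr(0, initial_lr, decay_steps, alpha)
final_lr = cosine_decay_lr(decay_steps, initial_lr, decay_steps, alpha)

print(start_lr)  # equals initial_lr
print(final_lr)  # about 0.001 = initial_lr * alpha, not alpha (0.01)
```

Under this formula the final value is initial_learning_rate * alpha, so the documentation would be accurate if it described alpha as the minimum learning rate expressed as a fraction of initial_learning_rate (or of warmup_target when warmup is used).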