diff --git a/10-scaling-up-road-to-the-top-part-3.ipynb b/10-scaling-up-road-to-the-top-part-3.ipynb
index f21dbc819..476413cdb 100644
--- a/10-scaling-up-road-to-the-top-part-3.ipynb
+++ b/10-scaling-up-road-to-the-top-part-3.ipynb
@@ -161,7 +161,7 @@
    "id": "0eb24c71",
    "metadata": {},
    "source": [
-    "*Gradient accumulation* refers to a very simple trick: rather than updating the model weights after every batch based on that batch's gradients, instead keep *accumulating* (adding up) the gradients for a few batches, and them update the model weights with those accumulated gradients. In fastai, the parameter you pass to `GradientAccumulation` defines how many batches of gradients are accumulated. Since we're adding up the gradients over `accum` batches, we therefore need to divide the batch size by that same number. The resulting training loop is nearly mathematically identical to using the original batch size, but the amount of memory used is the same as using a batch size `accum` times smaller!\n",
+    "*Gradient accumulation* refers to a very simple trick: rather than updating the model weights after every batch based on that batch's gradients, instead keep *accumulating* (adding up) the gradients for a few batches, and then update the model weights with those accumulated gradients. In fastai, the parameter you pass to `GradientAccumulation` defines for how many input items gradients are accumulated. Since we're adding up the gradients over `accum` batches, we therefore need to divide the batch size by that same number. The resulting training loop is nearly mathematically identical to using the original batch size, but the amount of memory used is the same as using a batch size `accum` times smaller!\n",
     "\n",
     "For instance, here's a basic example of a single epoch of a training loop without gradient accumulation:\n",
     "\n",
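The claim in the changed cell — that accumulating gradients over sub-batches and stepping once is (nearly) equivalent to one update with the full batch — can be checked with a tiny hand-rolled example. This is a sketch in plain Python, not the fastai `GradientAccumulation` callback: the model `y = w * x`, the `grad` helper, and the two training functions are all hypothetical illustrations.

```python
def grad(w, x, y):
    # d/dw of the squared error (w*x - y)**2 for a single item
    return 2 * (w * x - y) * x

def train_no_accum(w, data, lr, bs):
    # Baseline: one optimizer step per full batch of size bs.
    for i in range(0, len(data), bs):
        batch = data[i:i + bs]
        g = sum(grad(w, x, y) for x, y in batch) / bs
        w -= lr * g
    return w

def train_accum(w, data, lr, bs, accum):
    # Gradient accumulation: process sub-batches of size bs // accum,
    # add up their gradients, and only step after a full bs items.
    sub = bs // accum
    g_acc, seen = 0.0, 0
    for i in range(0, len(data), sub):
        batch = data[i:i + sub]
        g_acc += sum(grad(w, x, y) for x, y in batch)
        seen += len(batch)
        if seen == bs:                 # a full "effective batch" accumulated
            w -= lr * g_acc / bs       # same update as the big-batch step
            g_acc, seen = 0.0, 0
    return w

data = [(1.0, 2.0), (2.0, 3.0), (3.0, 5.0), (4.0, 7.0)]
w1 = train_no_accum(0.0, data, lr=0.01, bs=4)
w2 = train_accum(0.0, data, lr=0.01, bs=4, accum=2)
assert abs(w1 - w2) < 1e-12  # identical weights either way
```

With a loss whose gradient is a plain sum over items, the two loops are exactly equivalent; in practice small differences can appear (e.g. with batch-dependent layers such as batchnorm), which is why the cell says "nearly" identical.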