It turns out that DifferentiationInterface.prepare_gradient requires all batches to have exactly the same size. Currently, if the number of samples is not an exact multiple of the batch size, the last batch ends up with a different size than the others. This needs to be fixed before we can use prepare_gradient in a subsampling context.
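A minimal sketch of one possible workaround, assuming the `Constant` context API and the `prepare_gradient`/`gradient` argument order of recent DifferentiationInterface versions; `equal_batches` and `loss` are hypothetical names used only for illustration. The idea is simply to drop the trailing remainder so that every batch has the same size and a single preparation stays valid across all of them:

```julia
using DifferentiationInterface
import ForwardDiff

# Split `n` sample indices into batches of identical size, discarding the
# remainder (alternatives would be padding the last batch or preparing a
# second, smaller prep just for it).
function equal_batches(n::Integer, batchsize::Integer)
    nbatches = div(n, batchsize)
    return [((i - 1) * batchsize + 1):(i * batchsize) for i in 1:nbatches]
end

# Toy loss over a parameter vector θ and a batch of samples (hypothetical).
loss(θ, batch) = sum(abs2, batch .- θ[1])

θ = [0.0]
data = randn(10)
batches = equal_batches(length(data), 3)   # [1:3, 4:6, 7:9]; sample 10 is dropped

backend = AutoForwardDiff()
# Prepare once on a representative batch; because every batch now has the same
# size and type, the preparation should remain valid as the batch data changes.
prep = prepare_gradient(loss, backend, θ, Constant(data[first(batches)]))
for b in batches
    g = gradient(loss, prep, backend, θ, Constant(data[b]))
    # ... use g ...
end
```

Dropping a handful of samples per epoch is the simplest fix; whether that is acceptable, or whether we should instead pad or re-prepare for the final partial batch, is a design choice to settle here.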