Feature request
Per axolotl-ai-cloud/axolotl#3131, would it be possible to implement packing (in SFT) for VLMs? This is currently disabled in `SFTTrainer`, which checks the class of the provided processor.
Motivation
This would be super helpful for reducing the memory footprint of training VLMs.
Your contribution
If everything is processed into tokens, shouldn't packing with other modalities be straightforward (setting aside memory pressure from the vision encoder)? I'm not familiar with how `padding_free` packing interacts with FlashAttention 2. Is that the reason it's not straightforward?
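For context, here is a minimal, framework-free sketch of the padding-free packing idea being discussed (not TRL's actual implementation): variable-length token sequences are concatenated into a single row with no pad tokens, while `position_ids` reset at each sequence start and cumulative lengths (`cu_seqlens`, the format FlashAttention's varlen kernels consume) record the boundaries so attention can stay block-diagonal across sequences.

```python
def pack_sequences(sequences):
    """Concatenate variable-length token sequences into one packed row.

    Returns:
        input_ids: all tokens back to back, no padding
        position_ids: positions that restart at 0 for each sequence
        cu_seqlens: cumulative sequence lengths marking boundaries,
                    as expected by varlen attention kernels
    """
    input_ids, position_ids, cu_seqlens = [], [], [0]
    for seq in sequences:
        input_ids.extend(seq)
        # Positions restart per sequence so RoPE/positional encodings
        # treat each packed sequence independently.
        position_ids.extend(range(len(seq)))
        cu_seqlens.append(cu_seqlens[-1] + len(seq))
    return input_ids, position_ids, cu_seqlens


# Three toy "tokenized" examples of lengths 4, 3, and 5.
seqs = [[101, 7, 8, 102], [101, 9, 102], [101, 5, 6, 7, 102]]
ids, pos, cu = pack_sequences(seqs)
```

The open question for VLMs would then be whether image placeholder tokens (and the vision features spliced in at those positions) survive this concatenation and boundary bookkeeping cleanly.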