
Packing with VLMs #4339

@jiosephlee

Feature request

Per axolotl-ai-cloud/axolotl#3131, would it be possible to implement packing (in SFT) for VLMs? Packing is currently disabled in `SFTTrainer`, which checks the class of the provided processor.

Motivation

This would be super helpful for reducing the memory footprint of training VLMs.

Your contribution

If everything is processed into tokens, shouldn't packing with other modalities be straightforward (ignoring memory issues with the vision encoder)? I'm not familiar with how padding-free packing interacts with Flash Attention 2. Is that the reason it is not straightforward?
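For context, the core of padding-free packing is concatenating variable-length sequences into a single row and tracking per-sequence boundaries so attention (e.g. Flash Attention's varlen kernels) never crosses from one example into the next. Below is a minimal pure-Python sketch of that bookkeeping; `pack_sequences` and all names are illustrative, not TRL's actual API.

```python
def pack_sequences(sequences):
    """Concatenate token-id lists into one packed row.

    Returns the packed ids, position ids that restart at 0 for each
    sequence, and cumulative sequence-length boundaries (the cu_seqlens
    tensor that varlen attention kernels consume).
    """
    packed_ids = []
    position_ids = []
    cu_seqlens = [0]
    for seq in sequences:
        packed_ids.extend(seq)
        # Positions restart per sequence so RoPE/positional encodings
        # treat each packed example independently.
        position_ids.extend(range(len(seq)))
        cu_seqlens.append(cu_seqlens[-1] + len(seq))
    return packed_ids, position_ids, cu_seqlens


if __name__ == "__main__":
    ids, pos, cu = pack_sequences([[101, 7, 8], [101, 9]])
    print(ids)  # [101, 7, 8, 101, 9]
    print(pos)  # [0, 1, 2, 0, 1]
    print(cu)   # [0, 3, 5]
```

The open question for VLMs is presumably how the image placeholder tokens and vision-encoder outputs line up with these boundaries once multiple examples share a row.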
