
Packing with VLMs #4339

@jiosephlee

Feature request

Per axolotl-ai-cloud/axolotl#3131, would it be possible to implement packing (in SFT) for VLMs? Packing is currently disabled in `SFTTrainer`, which checks the class of the provided processor.

Motivation

This would be super helpful for reducing the memory footprint of training VLMs.

Your contribution

If everything is processed into tokens, shouldn't packing with other modalities be straightforward (ignoring memory issues with the vision encoder)? I'm not familiar with how padding-free packing interacts with Flash Attention 2. Is that the reason it is not straightforward?
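For context, the core of padding-free packing is concatenating variable-length sequences into a single row and tracking per-sequence boundaries so attention (e.g. Flash Attention's varlen kernels) never crosses from one example into the next. Below is a minimal pure-Python sketch of that bookkeeping; `pack_sequences` and all names are illustrative, not TRL's actual API.

```python
def pack_sequences(sequences):
    """Concatenate token-id lists into one packed row.

    Returns the packed ids, position ids that restart at 0 for each
    sequence, and cumulative sequence-length boundaries (the cu_seqlens
    tensor that varlen attention kernels consume).
    """
    packed_ids = []
    position_ids = []
    cu_seqlens = [0]
    for seq in sequences:
        packed_ids.extend(seq)
        # Positions restart per sequence so RoPE/positional encodings
        # treat each packed example independently.
        position_ids.extend(range(len(seq)))
        cu_seqlens.append(cu_seqlens[-1] + len(seq))
    return packed_ids, position_ids, cu_seqlens


if __name__ == "__main__":
    ids, pos, cu = pack_sequences([[101, 7, 8], [101, 9]])
    print(ids)  # [101, 7, 8, 101, 9]
    print(pos)  # [0, 1, 2, 0, 1]
    print(cu)   # [0, 3, 5]
```

The open question for VLMs is presumably how the image placeholder tokens and vision-encoder outputs line up with these boundaries once multiple examples share a row.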
