Feature Request: Batch Inference Support in Gradio App #36

@sahandkh1419

Description

I would like to know if it’s possible to add batch inference support to the Gradio app, allowing users to specify a batch size and generate multiple images simultaneously.

Additionally, I’d appreciate guidance on whether this is possible on a 24GB VRAM GPU. Currently, running a single generation with the provided command already occupies ~16GB of memory using the quantized FP8 model:
python app.py --offload --name flux-dev-fp8

Is batch inference technically possible in this setup, and if so, could you provide pointers or help in adding this feature?
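One way to keep peak VRAM near single-image levels while still producing several images per request is micro-batching: split the requested count into chunks no larger than a configurable batch size and run the pipeline once per chunk. The sketch below illustrates the idea only; `run_pipeline` is a hypothetical stand-in for the actual model call in app.py, not an API from this repo.

```python
def batched_generate(prompt, num_images, max_batch_size, run_pipeline):
    """Generate num_images for a prompt in micro-batches of at most
    max_batch_size, so peak VRAM stays close to single-batch usage."""
    images = []
    remaining = num_images
    while remaining > 0:
        n = min(remaining, max_batch_size)
        # run_pipeline is assumed to take a list of prompts and return
        # one image per prompt (hypothetical interface).
        images.extend(run_pipeline([prompt] * n))
        remaining -= n
    return images

# Stub pipeline for illustration: returns one placeholder per prompt.
def fake_pipeline(prompts):
    return [f"image_for:{p}" for p in prompts]

imgs = batched_generate("a cat", num_images=5, max_batch_size=2,
                        run_pipeline=fake_pipeline)
print(len(imgs))  # → 5
```

With `max_batch_size=1` this degrades to sequential generation, so it should work even on a 24GB card where a single fp8 generation already takes ~16GB; larger chunk sizes trade memory for throughput.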
