I would like to know if it’s possible to add batch inference support to the Gradio app, allowing users to specify a batch size and generate multiple images simultaneously.
Additionally, I’d appreciate guidance on whether this is feasible on a 24 GB VRAM GPU. Currently, a single generation with the provided command already occupies ~16 GB of memory using the quantized FP8 model:
python app.py --offload --name flux-dev-fp8
Is batch inference technically possible in this setup, and if so, could you provide pointers or help in adding this feature?
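For context, here is a rough sketch of the kind of micro-batching I have in mind: the user picks a total batch size in the UI, and the app splits it into chunks small enough to fit in VRAM, running one forward pass per chunk. The `generate` callable and the `max_chunk` value are placeholders, not part of the actual app:

```python
def chunk_sizes(total: int, max_chunk: int) -> list[int]:
    """Split `total` requested images into chunks of at most `max_chunk`."""
    full, rem = divmod(total, max_chunk)
    return [max_chunk] * full + ([rem] if rem else [])

def batched_generate(generate, batch_size: int, max_chunk: int = 2):
    """Run `generate(n)` per chunk so peak VRAM stays bounded by `max_chunk`.

    `generate` stands in for whatever single-call inference function the
    app exposes; it is assumed to return a list of `n` images.
    """
    images = []
    for n in chunk_sizes(batch_size, max_chunk):
        images.extend(generate(n))
    return images

# Stand-in generator for illustration only:
fake = lambda n: [f"image-{i}" for i in range(n)]
print(len(batched_generate(fake, 5, max_chunk=2)))  # 5 images, in chunks of 2, 2, 1
```

Whether `max_chunk` can be larger than 1 at 24 GB presumably depends on how much of the remaining ~8 GB the extra latents and activations consume, which is part of what I'm asking about.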