Feat(UI): Add LLM-powered prompt expansion and image-to-prompt features #8899
Pfannkuchensack wants to merge 10 commits into invoke-ai:main
Conversation
Adds two new buttons to the positive prompt area:

- "Expand Prompt" uses a local TextLLM model (`AutoModelForCausalLM`) to expand brief prompts into detailed image generation prompts
- "Image to Prompt" uses an existing LLaVA OneVision model to generate descriptive prompts from uploaded images

Backend: new `TextLLM` model type with config, loader, pipeline wrapper, workflow node, and two new API endpoints (`expand-prompt`, `image-to-prompt`). Also fixes a HuggingFace metadata fetch assertion error when the file size is `None`.

Frontend: `ExpandPromptButton` and `ImageToPromptButton` components with model picker popovers, RTK Query mutations, and model type hooks. Buttons only appear when compatible models are installed.
JPPhoto left a comment:
- Use model execution device for text LLM inference — `/mnt/AI/InvokeAI3/src/invokeai/app/invocations/text_llm.py:56-61`
When a text LLM is configured with `cpu_only=True` (or otherwise cached on CPU), `model_on_device()` keeps the model on CPU, but the pipeline inputs are moved to `TorchDevice.choose_torch_device()`, which selects the GPU when available. That device mismatch triggers runtime errors like "Expected all tensors to be on the same device" during generation. Consider deriving the device from the loaded model (e.g. `next(model.parameters()).device`) or the cache's execution device instead of a global device chooser.
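A minimal sketch of the suggested fix, assuming a standard `torch.nn.Module`; the helper name `execution_device` is hypothetical, not code from the PR:

```python
import torch


def execution_device(model: torch.nn.Module) -> torch.device:
    """Derive the inference device from the loaded model itself.

    Unlike a global device chooser, this keeps inputs on CPU when the
    model was cached there (e.g. cpu_only=True), avoiding the
    "Expected all tensors to be on the same device" failure.
    """
    try:
        return next(model.parameters()).device
    except StopIteration:
        # Parameterless module: fall back to CPU.
        return torch.device("cpu")
```

Inputs would then be moved with `tensor.to(execution_device(model))` rather than `TorchDevice.choose_torch_device()`.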
- Utilities LLM endpoints ignore `cpu_only` device selection — `/mnt/AI/InvokeAI3/src/invokeai/app/api/routers/utilities.py:93-99`
Both `_run_expand_prompt` and `_run_image_to_prompt` move inputs to `TorchDevice.choose_torch_device()` regardless of the model's configured execution device. If the model is set to `cpu_only=True` (or the cache selects CPU), this pairs GPU-bound inputs with a CPU model and breaks inference. Use the actual device of the loaded model or cache record instead of the global chooser to avoid device-mismatch failures when CPU-only models are used.
…t LLM models

Derive the execution device from the loaded model parameters instead of the global `TorchDevice` chooser so that `cpu_only` models no longer receive GPU-bound inputs. Also expose the existing `cpu_only` setting in the frontend Model Manager for Text LLM models.
@Pfannkuchensack A few more things that you might want to address:
There are no new tests despite new functionality. Are there any that you think could be added?
- Bound `max_tokens` to 1-2048 on `ExpandPromptRequest` to prevent OOM
- Replace asserts with explicit type checks and proper HTTP status codes (404 for unknown models, 422 for wrong model type, 500 for unexpected)
- Use float32 dtype for `cpu_only` TextLLM models instead of global fp16
- Add 16 tests for `TextLLMPipeline` and API request validation
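The `max_tokens` bound can be sketched with a Pydantic field constraint. Only the 1-2048 range comes from the commit message; the `prompt` and `model_key` fields and the default of 256 are assumptions for illustration, not the actual request schema:

```python
from pydantic import BaseModel, Field


class ExpandPromptRequest(BaseModel):
    # Field names other than max_tokens are illustrative assumptions.
    prompt: str
    model_key: str
    # ge/le reject out-of-range values at validation time (HTTP 422 in
    # FastAPI), preventing absurdly large generation requests (OOM risk).
    max_tokens: int = Field(default=256, ge=1, le=2048)
```

With this in place, FastAPI rejects `max_tokens=0` or `max_tokens=100000` before any model is loaded.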
Saves the previous prompt before LLM overwrites it (Expand Prompt and Image to Prompt). Pressing Ctrl+Z in the prompt textarea restores the original prompt. Undo state auto-expires after 30 seconds and is cleared when the user types manually.
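The undo behavior described above (actually implemented in the frontend) can be sketched language-agnostically as a small state holder. The `PromptUndo` class and its method names are hypothetical; only the 30-second expiry and the clear-on-manual-typing behavior come from the PR:

```python
import time
from typing import Optional

UNDO_TTL_SECONDS = 30.0


class PromptUndo:
    """Holds the prompt that the LLM overwrote so Ctrl+Z can restore it."""

    def __init__(self) -> None:
        self._saved: Optional[str] = None
        self._saved_at = 0.0

    def save(self, prompt: str) -> None:
        # Called just before the LLM result replaces the prompt.
        self._saved = prompt
        self._saved_at = time.monotonic()

    def clear(self) -> None:
        # Called when the user types manually.
        self._saved = None

    def restore(self) -> Optional[str]:
        # Called on Ctrl+Z; returns None if nothing is saved or the
        # saved state has expired.
        if self._saved is None:
            return None
        if time.monotonic() - self._saved_at > UNDO_TTL_SECONDS:
            self._saved = None
            return None
        prompt, self._saved = self._saved, None
        return prompt
```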
- Add docs/features/prompt-tools.md covering Expand Prompt, Image to Prompt, compatible models, Ctrl+Z undo, and the workflow node
- Register new doc page in mkdocs.yml under Features
- Add What's New item in en.json for the LLM Prompt Tools feature
You have merge conflicts 😞
Summary

Adds two new buttons to the positive prompt area:

- "Expand Prompt" uses a local TextLLM model (`AutoModelForCausalLM`) to expand brief prompts into detailed image generation prompts
- "Image to Prompt" uses an existing LLaVA OneVision model to generate descriptive prompts from uploaded images

Backend: New `TextLLM` model type with config, loader, pipeline wrapper, workflow node, and two new API endpoints (`expand-prompt`, `image-to-prompt`). Also fixes a HuggingFace metadata fetch assertion error when the file size is `None`.

Frontend: `ExpandPromptButton` and `ImageToPromptButton` components with model picker popovers, RTK Query mutations, and model type hooks. Buttons only appear when compatible models are installed.

Why
#8430
QA Instructions
- Install a compatible model (e.g. `Qwen/Qwen2.5-1.5B-Instruct`) via Model Manager — it should be recognized as `text_llm` type
- Fetching HuggingFace metadata with a `None` file size should no longer crash

Merge Plan
No special merge considerations. Standard merge to main.
Checklist
- *What's New* copy (if doing a release after this PR)