Feat(UI): Add LLM-powered prompt expansion and image-to-prompt features#8899

Draft
Pfannkuchensack wants to merge 10 commits into invoke-ai:main from Pfannkuchensack:feature/llm-prompt-tools

Conversation

Collaborator

@Pfannkuchensack Pfannkuchensack commented Feb 23, 2026

Summary

Adds two new buttons to the positive prompt area:

  • "Expand Prompt" uses a local TextLLM model (AutoModelForCausalLM) to expand brief prompts into detailed image generation prompts
  • "Image to Prompt" uses an existing LLaVA OneVision model to generate descriptive prompts from uploaded images

Backend: New TextLLM model type with config, loader, pipeline wrapper, workflow node, and two new API endpoints (expand-prompt, image-to-prompt). Also fixes HuggingFace metadata fetch assertion error when file size is None.

Frontend: ExpandPromptButton and ImageToPromptButton components with model picker popovers, RTK Query mutations, and model type hooks. Buttons only appear when compatible models are installed.
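The HuggingFace metadata fix mentioned above can be sketched as follows. This is a minimal illustration of the guard, not the actual InvokeAI code: the `RemoteFile` and `total_size` names are hypothetical stand-ins for whatever the metadata fetch uses internally.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RemoteFile:
    path: str
    size: Optional[int]  # HuggingFace can report None for some files

def total_size(files: list[RemoteFile]) -> int:
    # Treat unknown sizes as 0 instead of asserting size is not None,
    # which is the kind of assertion that previously crashed the fetch.
    return sum(f.size or 0 for f in files)
```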

Why

#8430

QA Instructions

  1. Model install: Install a causal LM model (e.g. Qwen/Qwen2.5-1.5B-Instruct) via Model Manager — it should be recognized as text_llm type
  2. Expand Prompt: Type a short prompt, click the sparkle button, select the TextLLM model, click "Expand" — the prompt should be replaced with an expanded version
  3. Image to Prompt: Click the image button, select a LLaVA model, upload an image, click "Generate Prompt" — a descriptive prompt should appear
  4. Conditional rendering: Buttons should only appear when compatible models are installed
  5. HF metadata fix: Installing models from HuggingFace repos where some files have None size should no longer crash

Merge Plan

No special merge considerations. Standard merge to main.

Checklist

  • The PR has a short but descriptive title, suitable for a changelog
  • Tests added / updated (if applicable)
  • ❗Changes to a redux slice have a corresponding migration
  • Documentation added / updated (if applicable)
  • Updated What's New copy (if doing a release after this PR)

@github-actions github-actions bot added the api, python, invocations, backend, and frontend labels Feb 23, 2026
Collaborator

@JPPhoto JPPhoto left a comment


  • Use model execution device for text LLM inference - /mnt/AI/InvokeAI3/src/invokeai/app/invocations/text_llm.py:56-61

When a text LLM is configured with cpu_only=True (or otherwise cached on CPU), model_on_device() keeps the model on CPU, but the pipeline inputs are moved to TorchDevice.choose_torch_device() which selects GPU when available. That device mismatch triggers runtime errors like "Expected all tensors to be on the same device" during generation. Consider deriving the device from the loaded model (e.g., next(model.parameters()).device) or the cache's execution device instead of a global device chooser.

  • Utilities LLM endpoints ignore cpu_only device selection - /mnt/AI/InvokeAI3/src/invokeai/app/api/routers/utilities.py:93-99

Both _run_expand_prompt and _run_image_to_prompt move inputs to TorchDevice.choose_torch_device() regardless of the model's configured execution device. If the model is set to cpu_only=True (or the cache selects CPU), this causes GPU-bound inputs with a CPU model and breaks inference. Use the actual device of the loaded model or cache record instead of the global chooser to avoid device-mismatch failures when CPU-only models are used.
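The fix suggested for both issues above can be sketched as a one-liner. This is a hedged illustration of the reviewer's recommendation, not the merged code; `model` stands for a loaded `AutoModelForCausalLM`:

```python
def model_device(model):
    # The first parameter's device reflects where the weights actually
    # live, so a cpu_only model keeps its inputs on CPU instead of
    # following a global "best available device" chooser.
    return next(model.parameters()).device
```

Inputs would then be moved with `inputs.to(model_device(model))` rather than `TorchDevice.choose_torch_device()`.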

JPPhoto and others added 2 commits February 23, 2026 09:16
…t LLM models

Derive the execution device from the loaded model parameters instead of
the global TorchDevice chooser so that cpu_only models no longer receive
GPU-bound inputs. Also expose the existing cpu_only setting in the
frontend Model Manager for Text LLM models.
@JPPhoto
Collaborator

JPPhoto commented Feb 23, 2026

@Pfannkuchensack A few more things that you might want to address:

  1. Unbounded max_tokens in new API can cause excessive generation/OOM
  • utilities.py:65 (invokeai/app/api/routers/utilities.py:65) defines max_tokens without bounds.
  • text_llm invocation caps this at <=2048, but /v1/utilities/expand-prompt does not, so direct API calls can request arbitrarily large generations.
  • Recommendation: make ExpandPromptRequest.max_tokens a pydantic Field, add limits (e.g. max_tokens: int = Field(default=300, ge=1, le=2048)).
  2. Endpoint uses assert for request/model validation and maps all failures to HTTP 500 (typical for Invoke but we could do better!)
  • asserts validate user-controlled model selection (utilities.py:155 (invokeai/app/api/routers/utilities.py:155), utilities.py:159 (invokeai/app/api/routers/utilities.py:159)),
  • then a blanket catch maps those failures to 500 (utilities.py:125 (invokeai/app/api/routers/utilities.py:125), utilities.py:191 (invokeai/app/api/routers/utilities.py:191)).
  • Invalid model_key/wrong model type should be a client error (4xx), not internal server error; asserts are also not robust validation behavior.
  • Recommendation: replace asserts with explicit checks, log errors, and raise HTTPException(status_code=400/404/422, ...); keep 500 only for unexpected failures.
  3. New Text LLM loader uses global dtype instead of execution-device-aware dtype
  • text_llm.py:25 (invokeai/backend/model_manager/load/model_loaders/text_llm.py:25) loads with torch_dtype=self._torch_dtype.
  • self._torch_dtype is chosen once from global preferred device, not from the model's eventual execution device (which may be CPU when cpu_only=True).
  • This can load CPU-only models in an unsuitable dtype (commonly fp16 on CUDA hosts), causing inference failures unrelated to placement.
  • Recommendation: choose dtype from the model's effective execution device (or force safe CPU dtype when cpu_only=True).
  4. Testing

There are no new tests despite new functionality. Are there any that you think could be added?
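The validation recommendations in items 1 and 2 can be sketched without pulling in pydantic or FastAPI. In the real endpoint this would be `max_tokens: int = Field(default=300, ge=1, le=2048)` on ExpandPromptRequest plus explicit `HTTPException` raises; the helper names below are illustrative:

```python
MAX_TOKENS_LIMIT = 2048

def validate_max_tokens(value: int = 300) -> int:
    # Reject out-of-range values up front so direct API calls cannot
    # request arbitrarily large generations.
    if not 1 <= value <= MAX_TOKENS_LIMIT:
        raise ValueError(f"max_tokens must be between 1 and {MAX_TOKENS_LIMIT}")
    return value

def resolve_model(models: dict, model_key: str, expected_type: str) -> dict:
    # Map user errors to client-style failures instead of a blanket 500.
    if model_key not in models:
        raise LookupError(f"model '{model_key}' not found")            # -> HTTP 404
    if models[model_key].get("type") != expected_type:
        raise TypeError(f"model '{model_key}' is not {expected_type}")  # -> HTTP 422
    return models[model_key]
```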

- Bound max_tokens to 1-2048 on ExpandPromptRequest to prevent OOM
- Replace asserts with explicit type checks and proper HTTP status codes
  (404 for unknown models, 422 for wrong model type, 500 for unexpected)
- Use float32 dtype for cpu_only TextLLM models instead of global fp16
- Add 16 tests for TextLLMPipeline and API request validation
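The dtype change in that commit amounts to selecting a precision from the effective execution device rather than the global preference. A minimal sketch, assuming the `cpu_only` flag described in this PR (names are illustrative, not the exact loader API):

```python
def choose_dtype(device_type: str, cpu_only: bool) -> str:
    # fp16 weights on CPU commonly fail or run very slowly, so force a
    # CPU-safe dtype whenever the model will execute on CPU.
    if cpu_only or device_type == "cpu":
        return "float32"
    return "float16"
```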
@github-actions github-actions bot added the python-tests PRs that change python tests label Feb 23, 2026
Saves the previous prompt before LLM overwrites it (Expand Prompt and
Image to Prompt). Pressing Ctrl+Z in the prompt textarea restores
the original prompt. Undo state auto-expires after 30 seconds and
is cleared when the user types manually.
- Add docs/features/prompt-tools.md covering Expand Prompt, Image to
  Prompt, compatible models, Ctrl+Z undo, and the workflow node
- Register new doc page in mkdocs.yml under Features
- Add What's New item in en.json for the LLM Prompt Tools feature
@Pfannkuchensack Pfannkuchensack marked this pull request as ready for review February 24, 2026 00:49
@github-actions github-actions bot added Root docs PRs that change docs labels Feb 24, 2026
@Pfannkuchensack Pfannkuchensack marked this pull request as draft February 24, 2026 00:49
@lstein lstein added the v6.13.x label Feb 27, 2026
@joshistoast
Contributor

You have merge conflicts 😞


Labels

api, backend (PRs that change backend files), docs (PRs that change docs), frontend (PRs that change frontend files), invocations (PRs that change invocations), python (PRs that change python files), python-tests (PRs that change python tests), Root, v6.13.x

Projects

Status: 6.13.x

Development

Successfully merging this pull request may close these issues.

4 participants