Feat(UI): Add LLM-powered prompt expansion and image-to-prompt features#8899

Draft
Pfannkuchensack wants to merge 10 commits into invoke-ai:main from Pfannkuchensack:feature/llm-prompt-tools

Conversation

Collaborator

@Pfannkuchensack Pfannkuchensack commented Feb 23, 2026

Summary

Adds two new buttons to the positive prompt area:

  • "Expand Prompt" uses a local TextLLM model (AutoModelForCausalLM) to expand brief prompts into detailed image generation prompts
  • "Image to Prompt" uses an existing LLaVA OneVision model to generate descriptive prompts from uploaded images

Backend: New TextLLM model type with config, loader, pipeline wrapper, workflow node, and two new API endpoints (expand-prompt, image-to-prompt). Also fixes HuggingFace metadata fetch assertion error when file size is None.

Frontend: ExpandPromptButton and ImageToPromptButton components with model picker popovers, RTK Query mutations, and model type hooks. Buttons only appear when compatible models are installed.
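The HuggingFace metadata fix mentioned above can be sketched as follows. This is a minimal illustration of the guard, not the actual InvokeAI code: the `RemoteFile` and `total_size` names are hypothetical stand-ins for whatever the metadata fetch uses internally.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RemoteFile:
    path: str
    size: Optional[int]  # HuggingFace can report None for some files

def total_size(files: list[RemoteFile]) -> int:
    # Treat unknown sizes as 0 instead of asserting size is not None,
    # which is the kind of assertion that previously crashed the fetch.
    return sum(f.size or 0 for f in files)
```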

Why

#8430

QA Instructions

  1. Model install: Install a causal LM model (e.g. Qwen/Qwen2.5-1.5B-Instruct) via Model Manager — it should be recognized as text_llm type
  2. Expand Prompt: Type a short prompt, click the sparkle button, select the TextLLM model, click "Expand" — the prompt should be replaced with an expanded version
  3. Image to Prompt: Click the image button, select a LLaVA model, upload an image, click "Generate Prompt" — a descriptive prompt should appear
  4. Conditional rendering: Buttons should only appear when compatible models are installed
  5. HF metadata fix: Installing models from HuggingFace repos where some files have None size should no longer crash

Merge Plan

No special merge considerations. Standard merge to main.

Checklist

  • The PR has a short but descriptive title, suitable for a changelog
  • Tests added / updated (if applicable)
  • ❗Changes to a redux slice have a corresponding migration
  • Documentation added / updated (if applicable)
  • Updated What's New copy (if doing a release after this PR)

@github-actions github-actions bot added the api, python, invocations, backend, and frontend labels Feb 23, 2026
Collaborator

@JPPhoto JPPhoto left a comment


  • Use model execution device for text LLM inference - /mnt/AI/InvokeAI3/src/invokeai/app/invocations/text_llm.py:56-61

When a text LLM is configured with cpu_only=True (or otherwise cached on CPU), model_on_device() keeps the model on CPU, but the pipeline inputs are moved to TorchDevice.choose_torch_device() which selects GPU when available. That device mismatch triggers runtime errors like "Expected all tensors to be on the same device" during generation. Consider deriving the device from the loaded model (e.g., next(model.parameters()).device) or the cache's execution device instead of a global device chooser.

  • Utilities LLM endpoints ignore cpu_only device selection - /mnt/AI/InvokeAI3/src/invokeai/app/api/routers/utilities.py:93-99

Both _run_expand_prompt and _run_image_to_prompt move inputs to TorchDevice.choose_torch_device() regardless of the model's configured execution device. If the model is set to cpu_only=True (or the cache selects CPU), this causes GPU-bound inputs with a CPU model and breaks inference. Use the actual device of the loaded model or cache record instead of the global chooser to avoid device-mismatch failures when CPU-only models are used.
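The fix suggested for both issues above can be sketched as a one-liner. This is a hedged illustration of the reviewer's recommendation, not the merged code; `model` stands for a loaded `AutoModelForCausalLM`:

```python
def model_device(model):
    # The first parameter's device reflects where the weights actually
    # live, so a cpu_only model keeps its inputs on CPU instead of
    # following a global "best available device" chooser.
    return next(model.parameters()).device
```

Inputs would then be moved with `inputs.to(model_device(model))` rather than `TorchDevice.choose_torch_device()`.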

JPPhoto and others added 2 commits February 23, 2026 09:16
…t LLM models

Derive the execution device from the loaded model parameters instead of
the global TorchDevice chooser so that cpu_only models no longer receive
GPU-bound inputs. Also expose the existing cpu_only setting in the
frontend Model Manager for Text LLM models.
@JPPhoto
Collaborator

JPPhoto commented Feb 23, 2026

@Pfannkuchensack A few more things that you might want to address:

  1. Unbounded max_tokens in new API can cause excessive generation/OOM
  • utilities.py:65 (invokeai/app/api/routers/utilities.py:65) defines max_tokens without bounds.
  • text_llm invocation caps this at <=2048, but /v1/utilities/expand-prompt does not, so direct API calls can request arbitrarily large generations.
  • Recommendation: make ExpandPromptRequest.max_tokens a pydantic Field, add limits (e.g. max_tokens: int = Field(default=300, ge=1, le=2048)).
  2. Endpoint uses assert for request/model validation and maps all failures to HTTP 500 (typical for Invoke but we could do better!)
  • asserts validate user-controlled model selection (utilities.py:155 (invokeai/app/api/routers/utilities.py:155), utilities.py:159 (invokeai/app/api/routers/utilities.py:159)),
  • then a blanket catch maps those failures to 500 (utilities.py:125 (invokeai/app/api/routers/utilities.py:125), utilities.py:191 (invokeai/app/api/routers/utilities.py:191)).
  • Invalid model_key/wrong model type should be a client error (4xx), not internal server error; asserts are also not robust validation behavior.
  • Recommendation: replace asserts with explicit checks, log errors, and raise HTTPException(status_code=400/404/422, ...); keep 500 only for unexpected failures.
  3. New Text LLM loader uses global dtype instead of execution-device-aware dtype
  • text_llm.py:25 (invokeai/backend/model_manager/load/model_loaders/text_llm.py:25) loads with torch_dtype=self._torch_dtype.
  • self._torch_dtype is chosen once from global preferred device, not from the model's eventual execution device (which may be CPU when cpu_only=True).
  • This can load CPU-only models in an unsuitable dtype (commonly fp16 on CUDA hosts), causing inference failures unrelated to placement.
  • Recommendation: choose dtype from the model's effective execution device (or force safe CPU dtype when cpu_only=True).
  4. Testing

There are no new tests despite new functionality. Are there any that you think could be added?
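The validation recommendations in items 1 and 2 can be sketched without pulling in pydantic or FastAPI. In the real endpoint this would be `max_tokens: int = Field(default=300, ge=1, le=2048)` on ExpandPromptRequest plus explicit `HTTPException` raises; the helper names below are illustrative:

```python
MAX_TOKENS_LIMIT = 2048

def validate_max_tokens(value: int = 300) -> int:
    # Reject out-of-range values up front so direct API calls cannot
    # request arbitrarily large generations.
    if not 1 <= value <= MAX_TOKENS_LIMIT:
        raise ValueError(f"max_tokens must be between 1 and {MAX_TOKENS_LIMIT}")
    return value

def resolve_model(models: dict, model_key: str, expected_type: str) -> dict:
    # Map user errors to client-style failures instead of a blanket 500.
    if model_key not in models:
        raise LookupError(f"model '{model_key}' not found")            # -> HTTP 404
    if models[model_key].get("type") != expected_type:
        raise TypeError(f"model '{model_key}' is not {expected_type}")  # -> HTTP 422
    return models[model_key]
```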

- Bound max_tokens to 1-2048 on ExpandPromptRequest to prevent OOM
- Replace asserts with explicit type checks and proper HTTP status codes
  (404 for unknown models, 422 for wrong model type, 500 for unexpected)
- Use float32 dtype for cpu_only TextLLM models instead of global fp16
- Add 16 tests for TextLLMPipeline and API request validation
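The dtype change in that commit amounts to selecting a precision from the effective execution device rather than the global preference. A minimal sketch, assuming the `cpu_only` flag described in this PR (names are illustrative, not the exact loader API):

```python
def choose_dtype(device_type: str, cpu_only: bool) -> str:
    # fp16 weights on CPU commonly fail or run very slowly, so force a
    # CPU-safe dtype whenever the model will execute on CPU.
    if cpu_only or device_type == "cpu":
        return "float32"
    return "float16"
```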
@github-actions github-actions bot added the python-tests PRs that change python tests label Feb 23, 2026
Saves the previous prompt before LLM overwrites it (Expand Prompt and
Image to Prompt). Pressing Ctrl+Z in the prompt textarea restores
the original prompt. Undo state auto-expires after 30 seconds and
is cleared when the user types manually.
- Add docs/features/prompt-tools.md covering Expand Prompt, Image to
  Prompt, compatible models, Ctrl+Z undo, and the workflow node
- Register new doc page in mkdocs.yml under Features
- Add What's New item in en.json for the LLM Prompt Tools feature
@Pfannkuchensack Pfannkuchensack marked this pull request as ready for review February 24, 2026 00:49
@github-actions github-actions bot added Root docs PRs that change docs labels Feb 24, 2026
@Pfannkuchensack Pfannkuchensack marked this pull request as draft February 24, 2026 00:49
@lstein lstein added the v6.13.x label Feb 27, 2026
@joshistoast
Contributor

You have merge conflicts 😞


Labels

api, backend (PRs that change backend files), docs (PRs that change docs), frontend (PRs that change frontend files), invocations (PRs that change invocations), python (PRs that change python files), python-tests (PRs that change python tests), Root, v6.13.x

Projects

Status: 6.13.x

Development

Successfully merging this pull request may close these issues.

4 participants