v1.11.2 VL cannot be used and cannot recognize and extract images #30504

shifengkuan · 2026-01-04T08:17:19Z


Self Checks

I have searched for existing issues, including closed ones.
I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
[FOR CHINESE USERS] Please be sure to submit the issue in English, otherwise it will be closed. Thank you! :)
Please do not modify this template :) and fill in all the required fields.

1. Is this request related to a challenge you're experiencing? Tell me about your story.

In v1.11.2, the VL multimodal model does not work with images: it cannot recognize or extract image content, although it still answers text prompts. The same setup works normally in v1.11.1. The relevant configuration is identical in both versions, and the vision feature is enabled in both. What is causing this, and how can it be fixed?

2. Additional context or comments

No response

shifengkuan · 2026-01-04T08:23:31Z

dosubot[bot]
bot Jan 4, 2026

This regression in v1.11.2 is likely due to changes in how Dify handles multimodal image input, model feature flags, and environment variables. Here’s a checklist to help you troubleshoot:

Make sure your VL model advertises the VISION capability in its schema—Dify will not enable image input unless this feature is present, even if the visual function is enabled in the UI. There is no manual override for this flag. Double-check your model metadata if you’re using a local or Alibaba model.
For Docker deployments, confirm that the INTERNAL_FILES_URL environment variable is set to a Docker-network-accessible URL. If this is missing or misconfigured, image uploads may silently fail.
Check the MULTIMODAL_SEND_FORMAT (or MULTIMODAL_SEND_IMAGE_FORMAT) environment variable. Some models require images as URLs, others as base64. If your model expects a URL, set this variable to url; if it expects base64, set it to base64 and ensure the image string starts with the correct prefix (e.g., data:image/png;base64,).
The file upload option must be enabled in your app’s features, and the LLM node must use the Vision toggle with sys.files as the input. Republishing the app may restore the file upload icon if it’s missing.
If you’re using knowledge base recall, the dataset must be marked as multimodal (is_multimodal flag), which is set automatically based on the model’s features. If your model doesn’t expose VISION, image input won’t be available.
For image input failures, check for network or Docker connectivity issues, image size limits (default 2MB), and download timeouts (default 60s), both configurable via ATTACHMENT_IMAGE_FILE_SIZE_LIMIT and ATTACHMENT_IMAGE_DOWNLOAD_TIMEOUT.
If you still get no output, try capturing error logs or API responses when submitting images—these can help pinpoint where the failure occurs. If you see serialization errors or warnings about image content types, make sure your image data is correctly formatted and that any custom plugins or integrations convert image objects to JSON-serializable formats before sending.
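
Parts of the checklist above can be sketched in Python. This is a minimal illustration, not Dify code: `check_env` and `to_data_uri` are hypothetical helpers, the 2 MB limit is simply the default quoted above, and the only variable names used are the ones mentioned in the checklist. It checks that `INTERNAL_FILES_URL` and `MULTIMODAL_SEND_FORMAT` look sane and shows what a base64 image string with the expected `data:image/...;base64,` prefix looks like.

```python
import base64
import mimetypes

# Values quoted in the checklist above; treated as assumptions, not verified defaults.
ALLOWED_SEND_FORMATS = {"url", "base64"}
DEFAULT_SIZE_LIMIT_MB = 2


def check_env(env):
    """Return a list of problems found in a mapping of environment variables."""
    problems = []

    internal_url = env.get("INTERNAL_FILES_URL", "").strip()
    if not internal_url:
        problems.append("INTERNAL_FILES_URL is not set")
    elif not internal_url.startswith(("http://", "https://")):
        problems.append("INTERNAL_FILES_URL is not an http(s) URL")

    send_format = env.get("MULTIMODAL_SEND_FORMAT", "").strip().lower()
    if send_format not in ALLOWED_SEND_FORMATS:
        problems.append("MULTIMODAL_SEND_FORMAT should be 'url' or 'base64'")

    return problems


def to_data_uri(image_bytes, filename, size_limit_mb=DEFAULT_SIZE_LIMIT_MB):
    """Encode raw image bytes as a data URI, rejecting oversized or non-image files."""
    if len(image_bytes) > size_limit_mb * 1024 * 1024:
        raise ValueError(f"image is larger than {size_limit_mb} MB")

    mime, _ = mimetypes.guess_type(filename)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"{filename!r} does not look like an image file")

    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

Calling `check_env(dict(os.environ))` inside the API container (after `import os`) would list any of these settings that are missing or malformed; an empty list means only that the values are well-formed, not that the URL is actually reachable from the other containers.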

Recent changes in v1.11.2 may have altered how model features are detected or how the vision flag is checked, causing previously working multimodal models to lose image input capability if their schema/features are not set correctly.
For more details and step-by-step troubleshooting, see this issue and this checklist.

To reply, just mention @dosu.


2 replies

shifengkuan Jan 5, 2026
Author

I have checked everything listed above and found no problems. Images upload normally, but they are not recognized or extracted. In v1.11.1, images are recognized as soon as vision is enabled; in v1.11.2, they are not.

dosubot[bot] bot Jan 5, 2026

In v1.11.2, Dify enforces that the model schema must explicitly advertise the VISION feature for image input to work—even if the visual toggle is enabled and images upload successfully. This is stricter than v1.11.1, where vision could be enabled without this schema requirement.

To restore image recognition for the 'qwen3-v1-plus' model, you need to update the model/provider schema so the features list includes VISION (e.g., "features": ["VISION", ...]). After editing the schema, restart the provider/plugin and Dify API services so the new schema is loaded. Once done, create or update your knowledge base in Dify and select the modified model—image input should now be enabled and recognized in the UI and API. There is no manual override for this; the schema must advertise VISION for image input to work in v1.11.2 [details and step-by-step guide].
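
As a rough sketch of the schema check described above: the snippet below assumes the model schema has already been loaded into a plain dictionary with a `features` list, which is an assumption about the layout rather than a confirmed Dify structure. `advertises_vision` and `add_vision` are hypothetical helper names. The comparison is case-insensitive because the exact spelling of the flag in a given provider schema is not confirmed here.

```python
def advertises_vision(schema):
    """Return True if the schema's feature list contains a vision flag."""
    features = schema.get("features") or []
    return any(str(feature).lower() == "vision" for feature in features)


def add_vision(schema):
    """Return a copy of the schema whose feature list includes VISION."""
    updated = dict(schema)
    features = list(updated.get("features") or [])
    if not advertises_vision(updated):
        features.append("VISION")
    updated["features"] = features
    return updated


# Hypothetical schema for illustration; the other fields are placeholders.
example = {"model": "qwen3-v1-plus", "features": ["tool-call"]}
patched = add_vision(example)
```

The helper returns a copy instead of editing in place, so the original schema can be compared against the patched one before anything is written back and the services are restarted.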

If you cannot modify the model schema, manually setting the is_multimodal flag in the database is possible but unsupported and may cause errors if the model does not truly support image embedding. For best results, use a model/provider that is confirmed to work with Dify's multimodal pipeline.

Also, double-check that your environment variables (like INTERNAL_FILES_URL and MULTIMODAL_SEND_FORMAT) match your model's requirements, and that the file variable mapping in your workflow is correct. If you still have issues after these changes, capturing error logs or API responses when submitting images can help pinpoint the problem.
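
When capturing API requests, one quick check is whether the outgoing payload contains any image parts at all. The sketch below assumes the provider is called with an OpenAI-compatible chat payload, where message content is either a string or a list of parts typed `text` or `image_url`; that format is an assumption, and `summarize_message_parts` is a hypothetical helper, not part of Dify.

```python
def summarize_message_parts(payload):
    """Count text and image parts in an OpenAI-style chat payload."""
    counts = {"text": 0, "image_url": 0, "other": 0}

    for message in payload.get("messages", []):
        content = message.get("content")

        # Plain string content counts as a single text part.
        if isinstance(content, str):
            counts["text"] += 1
            continue

        # Otherwise content is expected to be a list of typed parts.
        for part in content or []:
            kind = part.get("type") if isinstance(part, dict) else None
            if kind in ("text", "image_url"):
                counts[kind] += 1
            else:
                counts["other"] += 1

    return counts
```

If a captured request shows `image_url` at zero even though an image was uploaded, the image is being dropped before the model call, which points at feature detection or file mapping rather than at the model itself.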
