v1.11.2 VL cannot be used and cannot recognize and extract images #30504
Unanswered
shifengkuan asked this question in Help
This regression in v1.11.2 is likely due to changes in how Dify handles multimodal image input, model feature flags, and environment variables. Here’s a checklist to help you troubleshoot:
Recent changes in v1.11.2 may have altered how model features are detected or how the vision flag is checked, causing previously working multimodal models to lose image input capability if their schema/features are not set correctly.
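As a rough illustration of the kind of check described above, here is a minimal Python sketch. It assumes a model schema is available as a plain dictionary with a `features` list and that vision support is declared with the string `"vision"`. The dictionary layout, the model names, and the `supports_vision` helper are all hypothetical and are not taken from Dify's source code.

```python
# Hypothetical sketch: the schema layout, the "vision" feature name, and the
# model names below are assumptions for illustration, not Dify internals.
from typing import Any, Mapping


def supports_vision(schema: Mapping[str, Any]) -> bool:
    """Return True if the schema's feature list declares vision support."""
    # A missing or null "features" entry is treated as "no features declared".
    features = schema.get("features") or []
    return any(str(f).strip().lower() == "vision" for f in features)


text_only = {"model": "example-text-model", "features": ["tool-call"]}
vl_model = {"model": "example-vl-model", "features": ["vision", "tool-call"]}
missing = {"model": "example-vl-model"}  # no "features" key at all

for schema in (text_only, vl_model, missing):
    print(schema["model"], supports_vision(schema))
```

If a stricter check of this kind were introduced in a release, a VL model whose schema omits the feature entry (like `missing` above) would be treated as text-only even though the model itself can accept images. That would match the symptom reported here, but it remains a guess until confirmed against the actual model configuration.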

Self Checks
1. Is this request related to a challenge you're experiencing? Tell me about your story.
In v1.11.2, the VL (vision-language) multimodal model cannot recognize or extract content from images, although it still answers text prompts. The same model works normally in v1.11.1. The configuration is identical in both versions, and the vision feature is enabled in both. What causes this, and how can it be fixed?
2. Additional context or comments
No response