Description
I would like to run a local Phi-3.5-mini-instruct model, ultimately fine-tuned, on PCs with an NPU (Snapdragon Hexagon).
I downloaded the model from Hugging Face and converted it with the AI Toolkit in VS Code; the conversion appears to have succeeded. (I converted a Qwen 2.5 1.5B model the same way, and it loads successfully in Foundry Local.)
But when I try to load the converted Phi-3.5-mini-instruct model, it fails. Please help me resolve this issue. Thanks!
Details of the issue are below:
Environment
OS: Win11 25H2 26200.6901
NPU: Snapdragon X Elite - X1E78100 - Qualcomm Hexagon NPU
Driver version: 30.0.143.0
Foundry Local: 0.7.120+3b92ed4014
Steps to reproduce
- Set the Foundry Local cache directory to my local model path
- Run `foundry model run Customized-Phi3.5-Mini-3.8B-qnn-npu`
The output is as below:
🕒 Loading model... [15:21:18 ERR] Failed loading model:Customized-Phi3.5-Mini-3.8B-qnn-npu
Exception: Failed: Loading model Customized-Phi3.5-Mini-3.8B-qnn-npu from http://127.0.0.1:57698/openai/load/Customized-Phi3.5-Mini-3.8B-qnn-npu?ttl=600
Internal Server Error
Failed loading model Customized-Phi3.5-Mini-3.8B-qnn-npu
Failed to load from EpContext model. qnn_backend_manager.cc:1138 onnxruntime::qnn::QnnBackendManager::LoadCachedQnnContextFromBuffer Failed to create context from binary. Error code: 1002