Description
I would like to run a local Phi-3.5-mini-instruct model, ultimately fine-tuned, on PCs with an NPU (Snapdragon Hexagon).
I downloaded the model from Hugging Face and converted it with the AI Toolkit in VS Code; the conversion appears to have succeeded. (I converted a Qwen 2.5 1.5B model the same way, and it loads successfully in Foundry Local.)
But when I try to load the converted Phi-3.5-mini-instruct model, it fails. Please help me resolve this issue. Thanks!
Details of the issue are below:
Environment
OS: Win11 25H2 26200.6901
NPU: Snapdragon X Elite - X1E78100 - Qualcomm Hexagon NPU
Driver version: 30.0.143.0
Foundry Local: 0.7.120+3b92ed4014
Steps to reproduce
- Set the Foundry Local cache directory to my local model path
- Run `foundry model run Customized-Phi3.5-Mini-3.8B-qnn-npu`
The output is as below:
🕒 Loading model... [15:21:18 ERR] Failed loading model:Customized-Phi3.5-Mini-3.8B-qnn-npu
Exception: Failed: Loading model Customized-Phi3.5-Mini-3.8B-qnn-npu from http://127.0.0.1:57698/openai/load/Customized-Phi3.5-Mini-3.8B-qnn-npu?ttl=600
Internal Server Error
Failed loading model Customized-Phi3.5-Mini-3.8B-qnn-npu
Failed to load from EpContext model. qnn_backend_manager.cc:1138 onnxruntime::qnn::QnnBackendManager::LoadCachedQnnContextFromBuffer Failed to create context from binary. Error code: 1002