Converted Phi-3.5-Mini-Instruct model fails to run #290

Description

@jarodxiangliu

I would like to run a locally fine-tuned Phi-3.5-Mini-Instruct model on PCs with an NPU (Snapdragon Hexagon).
I downloaded the model from Hugging Face and converted it with the AI Toolkit in VS Code; the conversion appears to succeed. (I converted a Qwen 2.5 1.5B model the same way, and it loads successfully in Foundry Local.)
But when I try to run the converted Phi-3.5-Mini-Instruct model, it fails to load. Please help me resolve this issue. Thanks!
Details of the issue are below:
Environment
OS: Win11 25H2 26200.6901
NPU: Snapdragon X Elite - X1E78100 - Qualcomm Hexagon NPU
Driver version: 30.0.143.0
Foundry Local: 0.7.120+3b92ed4014

Steps to reproduce

  1. Set the Foundry Local cache directory to my local model path.
  2. Run "foundry model run Customized-Phi3.5-Mini-3.8B-qnn-npu".
     The output is as follows (a script to reproduce the load directly is sketched after it):

     🕒 Loading model... [15:21:18 ERR] Failed loading model:Customized-Phi3.5-Mini-3.8B-qnn-npu
     Exception: Failed: Loading model Customized-Phi3.5-Mini-3.8B-qnn-npu from http://127.0.0.1:57698/openai/load/Customized-Phi3.5-Mini-3.8B-qnn-npu?ttl=600
     Internal Server Error
     Failed loading model Customized-Phi3.5-Mini-3.8B-qnn-npu
     Failed to load from EpContext model. qnn_backend_manager.cc:1138 onnxruntime::qnn::QnnBackendManager::LoadCachedQnnContextFromBuffer Failed to create context from binary. Error code: 1002
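
For triage: the error is raised when onnxruntime's QNN execution provider tries to deserialize the precompiled QNN context binary stored in the model's EPContext node, which commonly indicates a mismatch between the QNN SDK that produced the binary during conversion and the QNN runtime Foundry Local ships, or a context compiled for a different HTP generation. Below is a minimal diagnostic sketch, assuming Python with the onnxruntime-qnn package installed; "model.onnx" is a hypothetical stand-in for the converted model's path, and the EPContext attribute names checked are taken from onnxruntime's EPContext design, so treat them as assumptions:

    # Minimal sketch: reproduce the model load outside Foundry Local and dump
    # EPContext metadata. Assumes onnxruntime-qnn is installed and MODEL_PATH
    # points at the converted model (hypothetical path).
    import onnx
    import onnxruntime as ort

    MODEL_PATH = "model.onnx"  # assumption: path to the converted EPContext model

    # 1) Dump EPContext node attributes: which QNN SDK produced the cached
    #    context, and is the binary embedded or stored alongside the model?
    model = onnx.load(MODEL_PATH, load_external_data=False)
    for node in model.graph.node:
        if node.op_type == "EPContext":
            for attr in node.attribute:
                if attr.name in ("source", "ep_sdk_version", "embed_mode", "partition_name"):
                    print(node.name, attr.name, onnx.helper.get_attribute_value(attr))

    # 2) Attempt the same load Foundry Local performs: create a session on the
    #    QNN HTP backend and surface the underlying error.
    try:
        sess = ort.InferenceSession(
            MODEL_PATH,
            providers=["QNNExecutionProvider"],
            provider_options=[{"backend_path": "QnnHtp.dll"}],
        )
        print("Context binary loaded successfully on the QNN HTP backend.")
    except Exception as exc:
        print("Load failed:", exc)  # expect the same 'Error code: 1002' on mismatch

If the direct load fails with the same error 1002, comparing the printed ep_sdk_version against the QNN libraries bundled with Foundry Local 0.7.120 would confirm or rule out a version mismatch between conversion and runtime.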
