Intel GPU is not used by qwen-asr #8850

tannisroot · 2026-03-08T05:34:49Z

Hi, I'm running LocalAI on TrueNAS 25.10 with the following kernel:
6.12.33-production+truenas
My GPU is an Intel Arc A380 (DG2).
When I use the qwen3-asr-0.6b STT model with the intel-qwen-asr backend, the CPU is used instead of the GPU.
I determined this by watching GPU usage with intel_gpu_top (it doesn't spike while STT is processing) and CPU usage with htop (usage spikes on multiple cores while STT is processing).
During startup, the image detects no GPU vendor and no VRAM:

Mar 08 08:12:04 DEBUG GPU vendor gpuVendor="" caller={caller.file="/build/pkg/system/state.go"  caller.L=54 } 
Mar 08 08:12:04 DEBUG Total available VRAM vram=0 caller={caller.file="/build/pkg/system/state.go"  caller.L=56 }

However, sycl-ls does identify the GPU:

root@d9b579c8993c:/# sycl-ls
[level_zero:gpu][level_zero:0] Intel(R) oneAPI Unified Runtime over Level-Zero, Intel(R) Arc(TM) A380 Graphics 12.56.5 [1.6.33578+15]
[opencl:cpu][opencl:0] Intel(R) OpenCL, AMD Ryzen 7 3800X 8-Core Processor              OpenCL 3.0 (Build 0) [2025.20.10.0.10_160000]
[opencl:gpu][opencl:1] Intel(R) OpenCL Graphics, Intel(R) Arc(TM) A380 Graphics OpenCL 3.0 NEO  [25.18.33578]
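
For anyone who wants to script this check, here is a minimal sketch that parses `sycl-ls` output and reports whether a Level Zero GPU is listed. The line format is inferred only from the output above, and the names `parse_sycl_ls` and `has_level_zero_gpu` are hypothetical helpers, not part of LocalAI or oneAPI.

```python
import re

# Pattern inferred from the sycl-ls lines shown above (an assumption,
# not a documented format): [backend:kind][backend:index] description
LINE_RE = re.compile(
    r"^\[(?P<backend>\w+):(?P<kind>\w+)\]\[\w+:(?P<index>\d+)\]\s+(?P<desc>.+)$"
)


def parse_sycl_ls(text):
    """Return one dict per device line; lines that do not match are skipped."""
    devices = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            devices.append(
                {
                    "backend": match.group("backend"),
                    "kind": match.group("kind"),
                    "index": int(match.group("index")),
                    "description": match.group("desc").strip(),
                }
            )
    return devices


def has_level_zero_gpu(devices):
    """True if any parsed device is a GPU exposed through Level Zero."""
    return any(
        d["backend"] == "level_zero" and d["kind"] == "gpu" for d in devices
    )


# One line copied from the output above, used as sample input.
SAMPLE = (
    "[level_zero:gpu][level_zero:0] Intel(R) oneAPI Unified Runtime over "
    "Level-Zero, Intel(R) Arc(TM) A380 Graphics 12.56.5 [1.6.33578+15]"
)
print(has_level_zero_gpu(parse_sycl_ls(SAMPLE)))
```

To run it against a live container, the text could come from `subprocess.run(["sycl-ls"], capture_output=True, text=True).stdout` instead of `SAMPLE`. A positive result only shows that the SYCL runtime sees the card. It says nothing about whether the backend process selects it.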

I would appreciate any input on why the GPU is seemingly not used and what troubleshooting steps I can take to find out.
For reference, I'm using the LocalAI Intel GPU image with the following compose file:

  localai:
    container_name: wyoming-localai
    image: localai/localai:latest-gpu-intel
    restart: unless-stopped
    ports:
      - 48080:8080
    networks:
      wyioming-openai: {}
    environment:
      TZ: ${timezone}
      LOCALAI_LOG_LEVEL: debug
      LOCALAI_MODELS_PATH: /models
      LOCALAI_PRELOAD_MODELS: '[{"id": "localai@${STT_MODEL}"}]'
      LOCALAI_LOAD_TO_MEMORY: ${STT_MODEL}
    volumes:
      - type: bind
        source: ${apps_storage}/localai/backends
        target: /backends
      - type: bind
        source: ${apps_storage}/localai/models
        target: /models
      - type: bind
        source: ${apps_storage}/localai/config
        target: /config
    devices:
      - /dev/dri
    healthcheck:
      test:
        - CMD
        - curl
        - -f
        - http://localhost:8080/readyz
      interval: 30s
      timeout: 10s
      retries: 5
      start_period: 600s
    deploy:
      resources:
        limits:
          cpus: 8
          memory: 16G
networks:
  wyioming-openai: {}

Full log:

CPU info:
model name	: AMD Ryzen 7 3800X 8-Core Processor
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sev sev_es
CPU:    AVX    found OK
CPU:    AVX2   found OK
CPU: no AVX512 found
Mar 08 08:31:38 DEBUG GPU vendor gpuVendor="" caller={caller.file="/build/pkg/system/state.go"  caller.L=54 } 
Mar 08 08:31:38 DEBUG Total available VRAM vram=0 caller={caller.file="/build/pkg/system/state.go"  caller.L=56 } 
Mar 08 08:31:38 INFO  Using forced capability run file capabilityRunFile="/run/localai/capability" capability="intel\n" env="" caller={caller.file="/build/pkg/system/capabilities.go"  caller.L=98 } 
Mar 08 08:31:38 INFO  Starting LocalAI threads=8 modelsPath="/models" caller={caller.file="/build/core/application/startup.go"  caller.L=31 } 
Mar 08 08:31:38 INFO  LocalAI version version="v3.12.1 (fcecc12e57be39bad2ebf50cf729408b64409553)" caller={caller.file="/build/core/application/startup.go"  caller.L=32 } 
Mar 08 08:31:38 DEBUG agent_tasks.json not found, starting with empty tasks caller={caller.file="/build/core/services/agent_jobs.go"  caller.L=129 } 
Mar 08 08:31:38 DEBUG agent_jobs.json not found, starting with empty jobs caller={caller.file="/build/core/services/agent_jobs.go"  caller.L=193 } 
Mar 08 08:31:38 INFO  AgentJobService started retention_days=30 caller={caller.file="/build/core/services/agent_jobs.go"  caller.L=1347 } 
Mar 08 08:31:38 DEBUG CPU capabilities capabilities=[3dnowprefetch abm adx aes aperfmperf apic arat avic avx avx2 bmi1 bmi2 bpext cat_l3 cdp_l3 clflush clflushopt clwb clzero cmov cmp_legacy constant_tsc cpb cpuid cqm cqm_llc cqm_mbm_local cqm_mbm_total cqm_occup_llc cr8_legacy cx16 cx8 de decodeassists extapic extd_apicid f16c flushbyasid fma fpu fsgsbase fxsr fxsr_opt ht hw_pstate ibpb ibs irperf lahf_lm lbrv lm mba mca mce misalignsse mmx mmxext monitor movbe msr mtrr mwaitx nonstop_tsc nopl npt nrip_save nx osvw overflow_recov pae pat pausefilter pclmulqdq pdpe1gb perfctr_core perfctr_llc perfctr_nb pfthreshold pge pni popcnt pse pse36 rapl rdpid rdpru rdrand rdseed rdt_a rdtscp rep_good sep sev sev_es sha_ni skinit smap smca smep ssbd sse sse2 sse4_1 sse4_2 sse4a ssse3 stibp succor svm svm_lock syscall tce topoext tsc tsc_scale umip v_spec_ctrl v_vmsave_vmload vgif vmcb_clean vme vmmcall wbnoinvd wdt xgetbv1 xsave xsavec xsaveerptr xsaveopt xtopology] caller={caller.file="/build/core/application/startup.go"  caller.L=40 } 
Mar 08 08:31:38 DEBUG No system backends found caller={caller.file="/build/core/gallery/backends.go"  caller.L=335 } 
Mar 08 08:31:38 DEBUG Registering backend name="intel-qwen-asr" runFile="/backends/intel-qwen-asr/run.sh" caller={caller.file="/build/core/gallery/backends.go"  caller.L=445 } 
Mar 08 08:31:38 DEBUG Registering backend name="qwen-asr" runFile="/backends/intel-qwen-asr/run.sh" caller={caller.file="/build/core/gallery/backends.go"  caller.L=445 } 
Mar 08 08:31:38 INFO  Preloading models path="/models" caller={caller.file="/build/core/config/model_config_loader.go"  caller.L=269 } 
  Model name: qwen3-asr-0.6b                                                  
Mar 08 08:31:38 DEBUG Config overrides overrides=map[backend:qwen-asr known_usecases:[transcript] parameters:map[model:Qwen/Qwen3-ASR-0.6B]] caller={caller.file="/build/core/gallery/models.go"  caller.L=170 } 
Mar 08 08:31:38 DEBUG Written config file file="/models/qwen3-asr-0.6b.yaml" caller={caller.file="/build/core/gallery/models.go"  caller.L=276 } 
Mar 08 08:31:38 DEBUG Written gallery file file="/models/._gallery_qwen3-asr-0.6b.yaml" caller={caller.file="/build/core/gallery/models.go"  caller.L=286 } 
Mar 08 08:31:38 DEBUG Installed model model="qwen3-asr-0.6b" caller={caller.file="/build/core/gallery/models.go"  caller.L=136 } 
Mar 08 08:31:38 DEBUG Installing backend backend="qwen-asr" caller={caller.file="/build/core/gallery/models.go"  caller.L=138 } 
Mar 08 08:31:38 DEBUG No system backends found caller={caller.file="/build/core/gallery/backends.go"  caller.L=335 } 
Mar 08 08:31:38 DEBUG Model name="qwen3-asr-0.6b" config={/models/qwen3-asr-0.6b.yaml  {{Qwen/Qwen3-ASR-0.6B}  false 0 0xc00011ab70 0xc00011ab78 0xc00011ab80 0xc00011abb0 false 0 false 0 0 0 0 0 0xc00011aba8 0xc00011aba0 0xc00011ab48 {false} <nil> map[]  0 0 0 0 } qwen3-asr-0.6b 0xc00011ab68 0xc00011ab60 0xc00011abb8 map[] 0xc00011abb9 qwen-asr {     false <nil>  } [FLAG_TRANSCRIPT] 0xc00011abd0 {   } [] [] []    map[] {false {false false false false false  false   []}   [] [] []   [] [] []    <nil>} {<nil> <nil> <nil> [] []} map[] {   0 0  false false 0xc00011ab98 0xc00011ab90 0xc00011ab88 <nil> 0xc00011abb8 0xc00011abb9 0xc00011abb9 0xc00011abb9  [] [] [] [] [] 0xc00011abc0 false   [] [] 0 false  0   0 false false 0 0 0 false  {0 0 0}  <nil> false     0 0 0 0 0} {false    false 0   } 0 {0 0} { } false []   [] [] { } {0 0 false false false false false 0 0 false}} caller={caller.file="/build/core/application/startup.go"  caller.L=117 } 
Mar 08 08:31:38 DEBUG runtime_settings.json not found, using defaults caller={caller.file="/build/core/application/startup.go"  caller.L=214 } 
Mar 08 08:31:38 DEBUG Auto loading model into memory from file model="qwen3-asr-0.6b" file="Qwen/Qwen3-ASR-0.6B" caller={caller.file="/build/core/application/startup.go"  caller.L=148 } 
Mar 08 08:31:38 INFO  BackendLoader starting modelID="qwen3-asr-0.6b" backend="qwen-asr" model="Qwen/Qwen3-ASR-0.6B" caller={caller.file="/build/pkg/model/initializers.go"  caller.L=159 } 
Mar 08 08:31:38 DEBUG Loading model in memory from file file="/models/Qwen/Qwen3-ASR-0.6B" caller={caller.file="/build/pkg/model/loader.go"  caller.L=218 } 
Mar 08 08:31:38 DEBUG Loading Model with gRPC modelID="qwen3-asr-0.6b" file="/models/Qwen/Qwen3-ASR-0.6B" backend="qwen-asr" options={qwen-asr Qwen/Qwen3-ASR-0.6B qwen3-asr-0.6b {{}} 0xc0005e6c08 map[] 20 2 false} caller={caller.file="/build/pkg/model/initializers.go"  caller.L=53 } 
Mar 08 08:31:38 DEBUG Loading external backend uri="/backends/intel-qwen-asr/run.sh" caller={caller.file="/build/pkg/model/initializers.go"  caller.L=77 } 
Mar 08 08:31:38 DEBUG external backend is file file=&{run.sh 192 448 {0 63907278621 0x4f5f7e0} {131 24703 1 33216 0 0 0 0 192 512 9 {1771681821 0} {1771681821 0} {1772944781 583702502} [0 0 0]}} caller={caller.file="/build/pkg/model/initializers.go"  caller.L=80 } 
Mar 08 08:31:38 DEBUG Loading GRPC Process process="/backends/intel-qwen-asr/run.sh" caller={caller.file="/build/pkg/model/process.go"  caller.L=112 } 
Mar 08 08:31:38 DEBUG GRPC Service will be running id="qwen3-asr-0.6b" address="127.0.0.1:35089" caller={caller.file="/build/pkg/model/process.go"  caller.L=114 } 
Mar 08 08:31:38 DEBUG GRPC Service state dir dir="/tmp/go-processmanager2152813416" caller={caller.file="/build/pkg/model/process.go"  caller.L=138 } 
Mar 08 08:31:38 DEBUG GRPC Service Started caller={caller.file="/build/pkg/model/initializers.go"  caller.L=92 } 
Mar 08 08:31:38 DEBUG Wait for the service to start up caller={caller.file="/build/pkg/model/initializers.go"  caller.L=105 } 
Mar 08 08:31:38 DEBUG Options options=ContextSize:1024 Seed:774469262 NBatch:512 MMap:true NGPULayers:9999999 Threads:8 FlashAttention:"auto" caller={caller.file="/build/pkg/model/initializers.go"  caller.L=106 } 
Mar 08 08:31:38 DEBUG GRPC stdout id="qwen3-asr-0.6b-127.0.0.1:35089" line="Initializing libbackend for intel-qwen-asr" caller={caller.file="/build/pkg/model/process.go"  caller.L=162 } 
Mar 08 08:31:38 DEBUG GRPC stdout id="qwen3-asr-0.6b-127.0.0.1:35089" line="Using portable Python" caller={caller.file="/build/pkg/model/process.go"  caller.L=162 } 
Mar 08 08:31:38 DEBUG GRPC stdout id="qwen3-asr-0.6b-127.0.0.1:35089" line="Added /backends/intel-qwen-asr/lib to LD_LIBRARY_PATH for GPU libraries" caller={caller.file="/build/pkg/model/process.go"  caller.L=162 } 
Mar 08 08:31:40 DEBUG GRPC stderr id="qwen3-asr-0.6b-127.0.0.1:35089" line="/backends/intel-qwen-asr/venv/lib/python3.12/site-packages/transformers/utils/hub.py:110: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead." caller={caller.file="/build/pkg/model/process.go"  caller.L=153 } 
Mar 08 08:31:40 DEBUG GRPC stderr id="qwen3-asr-0.6b-127.0.0.1:35089" line="  warnings.warn(" caller={caller.file="/build/pkg/model/process.go"  caller.L=153 } 
Mar 08 08:31:44 DEBUG GRPC stderr id="qwen3-asr-0.6b-127.0.0.1:35089" line="Server started. Listening on: 127.0.0.1:35089" caller={caller.file="/build/pkg/model/process.go"  caller.L=153 } 
Mar 08 08:31:44 DEBUG GRPC Service Ready caller={caller.file="/build/pkg/model/initializers.go"  caller.L=113 } 
Mar 08 08:31:44 DEBUG GRPC: Loading model with options options={{{} [] [] 0xc00084b958} 0 [] Qwen/Qwen3-ASR-0.6B 1024 774469262 512 false false true false false false false 9999999   8 0 0 0 0 /models/Qwen/Qwen3-ASR-0.6B   false 0 false   0     0 false    0 false false 0 0 0  false  0 0 0   0 0 0 0  auto false /models [] [] []   [] false []} caller={caller.file="/build/pkg/model/initializers.go"  caller.L=136 } 
Mar 08 08:31:44 DEBUG GRPC stderr id="qwen3-asr-0.6b-127.0.0.1:35089" line="Loading Qwen3-ASR from Qwen/Qwen3-ASR-0.6B" caller={caller.file="/build/pkg/model/process.go"  caller.L=153 } 
Mar 08 08:31:45 DEBUG GRPC stderr id="qwen3-asr-0.6b-127.0.0.1:35089" line="The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details." caller={caller.file="/build/pkg/model/process.go"  caller.L=153 } 
Mar 08 08:31:49 DEBUG GRPC stderr id="qwen3-asr-0.6b-127.0.0.1:35089" line="Qwen3-ASR model loaded successfully" caller={caller.file="/build/pkg/model/process.go"  caller.L=153 } 
Mar 08 08:31:49 DEBUG reading file for dynamic config update filename="/configuration/api_keys.json" caller={caller.file="/build/core/application/config_file_watcher.go"  caller.L=65 } 
Mar 08 08:31:49 DEBUG processing api keys runtime update numKeys=0 caller={caller.file="/build/core/application/config_file_watcher.go"  caller.L=138 } 
Mar 08 08:31:49 DEBUG no API keys discovered from dynamic config file caller={caller.file="/build/core/application/config_file_watcher.go"  caller.L=152 } 
Mar 08 08:31:49 DEBUG total api keys after processing numKeys=0 caller={caller.file="/build/core/application/config_file_watcher.go"  caller.L=155 } 
Mar 08 08:31:49 DEBUG reading file for dynamic config update filename="/configuration/external_backends.json" caller={caller.file="/build/core/application/config_file_watcher.go"  caller.L=65 } 
Mar 08 08:31:49 DEBUG processing external_backends.json caller={caller.file="/build/core/application/config_file_watcher.go"  caller.L=164 } 
Mar 08 08:31:49 DEBUG external backends loaded from external_backends.json caller={caller.file="/build/core/application/config_file_watcher.go"  caller.L=181 } 
Mar 08 08:31:49 DEBUG reading file for dynamic config update filename="/configuration/runtime_settings.json" caller={caller.file="/build/core/application/config_file_watcher.go"  caller.L=65 } 
Mar 08 08:31:49 DEBUG processing runtime_settings.json caller={caller.file="/build/core/application/config_file_watcher.go"  caller.L=189 } 
Mar 08 08:31:49 DEBUG runtime settings loaded from runtime_settings.json caller={caller.file="/build/core/application/config_file_watcher.go"  caller.L=359 } 
Mar 08 08:31:49 INFO  core/startup process completed! caller={caller.file="/build/core/application/startup.go"  caller.L=163 } 
Mar 08 08:31:49 INFO  LocalAI is started and running address=":8080" caller={caller.file="/build/core/cli/run.go"  caller.L=293 } 
Mar 08 08:31:53 INFO  HTTP request method="GET" path="/readyz" status=200 caller={caller.file="/build/core/http/app.go"  caller.L=118 } 
Mar 08 08:32:04 DEBUG overriding empty model name in request body with value found earlier in middleware chain context localModelName="qwen3-asr-0.6b" caller={caller.file="/build/core/http/middleware/request.go"  caller.L=138 } 
Mar 08 08:32:04 DEBUG input.Input input="<nil>" caller={caller.file="/build/core/http/middleware/request.go"  caller.L=412 } 
Mar 08 08:32:04 DEBUG Audio file copied dst="/tmp/whisper2055781948/recording.wav" caller={caller.file="/build/core/http/endpoints/openai/transcription.go"  caller.L=74 } 
Mar 08 08:32:04 DEBUG Model already loaded in memory model="qwen3-asr-0.6b" caller={caller.file="/build/pkg/model/loader.go"  caller.L=256 } 
Mar 08 08:32:04 DEBUG Checking model availability model="qwen3-asr-0.6b" caller={caller.file="/build/pkg/model/loader.go"  caller.L=259 } 
Mar 08 08:32:04 DEBUG Model already loaded model="qwen3-asr-0.6b" caller={caller.file="/build/pkg/model/initializers.go"  caller.L=246 } 
Mar 08 08:32:05 DEBUG GRPC stderr id="qwen3-asr-0.6b-127.0.0.1:35089" line="Setting `pad_token_id` to `eos_token_id`:151645 for open-end generation." caller={caller.file="/build/pkg/model/process.go"  caller.L=153 } 
Mar 08 08:32:06 DEBUG Transcribed transcription=&{[{0 0s 0s Turn off kitchen lights. [] }] Turn off kitchen lights.} caller={caller.file="/build/core/http/endpoints/openai/transcription.go"  caller.L=81 } 
Mar 08 08:32:06 INFO  HTTP request method="POST" path="/v1/audio/transcriptions" status=200 caller={caller.file="/build/core/http/app.go"  caller.L=118 }

aniruddhaadak80 · 2026-03-10T05:46:28Z

The log line where GPU vendor and VRAM are both empty stands out, because it suggests the problem may begin before qwen-asr inference and before model-specific code decides where to run. If the runtime cannot identify the Intel device at its own capability layer, backend selection may fall back in ways that still look superficially correct from the outside.

Given that sycl-ls sees the card, I would inspect the container path that LocalAI itself uses for detection, especially device visibility, permissions on /dev/dri, and whether the required oneAPI or Level Zero libraries are available to the process that does the actual probe. The gap seems to be detection consistency more than raw hardware absence.
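
The /dev/dri permission check mentioned above can be sketched in a few lines of Python. This is a minimal illustration, assuming the GPU runtime needs read/write access to the `card*` and `renderD*` nodes. `check_dri_nodes` is a hypothetical helper, not something LocalAI ships.

```python
import os
import stat
from pathlib import Path


def check_dri_nodes(dri_dir="/dev/dri"):
    """Return (path, is_char_device, read_write_ok) for each DRM node.

    Assumption: only card* and renderD* entries matter for GPU compute.
    """
    results = []
    root = Path(dri_dir)
    if not root.is_dir():
        # The directory was not passed into the container at all.
        return results
    for node in sorted(root.iterdir()):
        if not node.name.startswith(("card", "renderD")):
            continue
        mode = node.stat().st_mode
        results.append(
            (
                str(node),
                stat.S_ISCHR(mode),  # real device nodes are character devices
                os.access(node, os.R_OK | os.W_OK),  # access for current user
            )
        )
    return results


if __name__ == "__main__":
    nodes = check_dri_nodes()
    if not nodes:
        print("no card*/renderD* nodes found under /dev/dri")
    for path, is_chr, rw in nodes:
        print(f"{path}: char_device={is_chr} read_write={rw}")
```

Run it inside the container as the same user the backend process runs as. A node that exists but reports `read_write=False` would point at a permissions or group-membership problem rather than a missing device.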

1 reply

tannisroot Mar 10, 2026
Author

Did you ask an LLM to write this? Why?

tannisroot · 2026-03-10T08:32:16Z

I did some testing with an LLM (qwen3 running on the llama-cpp backend), and the Intel GPU is actually used there.
I've also verified that the rocm-qwen-asr backend on an AMD GPU system uses the GPU for transcription, so the issue is limited to qwen-asr on Intel GPUs.
This seems like a bug, so I'm going to go ahead and open an issue.
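
One way to narrow this down further is to ask the backend's own Python environment whether PyTorch can see the Intel device. The sketch below assumes the backend is PyTorch-based (the log shows `transformers` in its venv) and that the installed PyTorch build exposes the `torch.xpu` module. Whether intel-qwen-asr actually selects its device this way is not confirmed by anything in this thread, and `probe_torch_xpu` is a hypothetical helper.

```python
import importlib.util


def probe_torch_xpu():
    """Report whether PyTorch in this interpreter can see an Intel XPU."""
    report = {"torch_installed": False, "xpu_available": False, "devices": []}
    if importlib.util.find_spec("torch") is None:
        return report  # PyTorch is not installed in this environment
    import torch

    report["torch_installed"] = True
    # Assumption: older PyTorch builds may lack torch.xpu, so look it up safely.
    xpu = getattr(torch, "xpu", None)
    if xpu is not None and xpu.is_available():
        report["xpu_available"] = True
        report["devices"] = [
            xpu.get_device_name(i) for i in range(xpu.device_count())
        ]
    return report


if __name__ == "__main__":
    print(probe_torch_xpu())
```

This only means something when run with the interpreter the backend uses (the log says "Using portable Python", but its exact location is not shown). If `xpu_available` comes back `False` there while sycl-ls lists the card, the gap is likely between the SYCL runtime and the PyTorch build. If it comes back `True`, the device selection inside the backend is the more likely suspect.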

0 replies
