GPU users: onnxruntime (CPU) overwrites onnxruntime-gpu binaries when both are installed by pip/uv #608
Description
Summary
When fastembed is used in a GPU environment alongside onnxruntime-gpu, the CUDAExecutionProvider silently disappears at runtime because pip/uv end up installing both onnxruntime (CPU) and onnxruntime-gpu simultaneously. Since both packages install into the same site-packages/onnxruntime/ directory, whichever is installed last wins — and in practice the CPU build's onnxruntime_pybind11_state.so overwrites the GPU build's, stripping CUDAExecutionProvider from the available providers list.
Root cause
fastembed declares a hard dependency on onnxruntime > 1.20.0 (by package name). Through onnxruntime-gpu 1.19.x, the GPU wheel declared Provides-Dist: onnxruntime in its metadata, which told pip/uv that onnxruntime-gpu satisfies any onnxruntime requirement. This metadata is absent from onnxruntime-gpu >= 1.20.0 (confirmed in 1.24.2). As a result:
- uv resolves `onnxruntime > 1.20.0` and installs `onnxruntime==1.24.2` (CPU build)
- The user also has `onnxruntime-gpu==1.24.2` in their project requirements
- Both install to `site-packages/onnxruntime/`; the CPU `onnxruntime_pybind11_state.so` overwrites the GPU one
- `ort.get_available_providers()` returns `['CPUExecutionProvider']` instead of `['CUDAExecutionProvider', 'CPUExecutionProvider']`
The failure mode is silent — no import error, no warning, just no GPU acceleration.
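Because the failure is silent, a preflight check at startup can surface it early. The helper below is a hypothetical sketch, not part of fastembed; it only inspects installed distribution metadata, so it runs even without onnxruntime present:

```python
import sys
from importlib.metadata import distributions

def conflicting_ort_installs(names):
    """True if both the CPU and GPU onnxruntime wheels are installed."""
    installed = {n.lower() for n in names}
    return {"onnxruntime", "onnxruntime-gpu"} <= installed

installed_names = [
    d.metadata["Name"] for d in distributions() if d.metadata["Name"]
]
if conflicting_ort_installs(installed_names):
    print(
        "WARNING: onnxruntime and onnxruntime-gpu are both installed; "
        "whichever was installed last owns the shared binaries.",
        file=sys.stderr,
    )
```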
Minimal repro (Docker)
```dockerfile
FROM ghcr.io/astral-sh/uv:python3.13-bookworm-slim

# Project that depends on fastembed + onnxruntime-gpu
# (--system installs into the image interpreter; no venv in this minimal repro)
RUN uv pip install --system fastembed onnxruntime-gpu
RUN python -c "import onnxruntime as ort; print(ort.get_available_providers())"
# Output: ['CPUExecutionProvider'] <-- CUDAExecutionProvider is gone
```

The `libonnxruntime_providers_cuda.so` is present and all of its `.so` dependencies resolve correctly, but it links against `Provider_GetHost` from `libonnxruntime_providers_shared.so`, a symbol exported only by the GPU pybind11 build, not the CPU one. So the `dlopen` of the CUDA provider fails silently.
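The overwrite can also be confirmed from wheel metadata: both wheels list the same binary in their RECORD, so two distributions claim ownership of one file. A small diagnostic sketch (the helper name is mine):

```python
from importlib.metadata import distributions

def wheels_shipping(target, dists):
    """Names/versions of distributions whose file list contains `target`."""
    hits = []
    for dist in dists:
        name = dist.metadata["Name"]
        if name and any(target in str(f) for f in (dist.files or [])):
            hits.append(f"{name} {dist.version}")
    return hits

# In a broken environment this prints BOTH wheels for the same .so,
# even though only one build's binary actually survives on disk.
print(wheels_shipping("onnxruntime_pybind11_state", distributions()))
```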
Workaround
Reinstall onnxruntime-gpu after uv sync to restore the GPU binaries:
```dockerfile
RUN uv sync --frozen --no-dev
RUN uv pip install --python .venv/bin/python --reinstall "onnxruntime-gpu[cuda,cudnn]==1.24.2"
```

This is fragile (order-dependent, easy to get wrong) and can't be expressed cleanly in `pyproject.toml`.
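After the reinstall, a quick sanity check confirms whether the CUDA provider survived. This is a sketch that degrades gracefully when onnxruntime is not installed:

```python
def has_cuda_provider(providers):
    """True if onnxruntime reports the CUDA execution provider."""
    return "CUDAExecutionProvider" in providers

try:
    import onnxruntime as ort
    providers = ort.get_available_providers()
    print(providers)
    if not has_cuda_provider(providers):
        print("CPU-only build detected; re-run the onnxruntime-gpu reinstall")
except ImportError:
    pass  # onnxruntime not installed in this environment
```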
Suggested fixes
Option A — fastembed side (preferred): Add a gpu extra that replaces the onnxruntime dep with onnxruntime-gpu:
```toml
[project]
dependencies = [
    # remove the direct onnxruntime pin, or make it conditional
]

[project.optional-dependencies]
gpu = ["onnxruntime-gpu"]
```

Guard the import with try/except so either package works. This lets GPU users run `pip install "fastembed[gpu]"` and get a coherent environment.
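A minimal sketch of the guarded import: both wheels expose the same `onnxruntime` import name, so one try/except covers either; the error text below is illustrative, not fastembed's actual message:

```python
def load_onnxruntime():
    """Import onnxruntime from whichever wheel is installed (CPU or GPU)."""
    try:
        import onnxruntime as ort
    except ImportError as exc:
        raise ImportError(
            "fastembed needs onnxruntime or onnxruntime-gpu; "
            'GPU users can run: pip install "fastembed[gpu]"'
        ) from exc
    return ort
```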
Option B — onnxruntime side: Restore Provides-Dist: onnxruntime in onnxruntime-gpu's wheel metadata so package managers treat them as interchangeable. This was present in onnxruntime-gpu <= 1.19.x. A related issue is tracked at microsoft/onnxruntime#22107.
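Whether a given installed wheel declares this metadata can be checked with `importlib.metadata`. A sketch (returns None when the package is not installed):

```python
from importlib.metadata import metadata, PackageNotFoundError

def provides_dist(dist_name):
    """Return the wheel's Provides-Dist entries, or None if not installed."""
    try:
        md = metadata(dist_name)
    except PackageNotFoundError:
        return None
    return md.get_all("Provides-Dist") or []

# On onnxruntime-gpu <= 1.19.x this includes 'onnxruntime';
# on >= 1.20.0 it comes back empty.
print(provides_dist("onnxruntime-gpu"))
```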
Environment
- `fastembed==0.7.4`, `onnxruntime==1.24.2`, `onnxruntime-gpu==1.24.2`
- Python 3.13, uv 0.6.x
- Docker image: `ghcr.io/astral-sh/uv:python3.13-bookworm-slim`
- Host: NVIDIA RTX 4090 + RTX 5090, driver 570.x, nvidia-container-toolkit 1.17.6