GPU users: onnxruntime (CPU) overwrites onnxruntime-gpu binaries when both are installed by pip/uv #608

@michelkluger

Description

Summary

When fastembed is used in a GPU environment alongside onnxruntime-gpu, the CUDAExecutionProvider silently disappears at runtime because pip/uv end up installing both onnxruntime (CPU) and onnxruntime-gpu simultaneously. Since both packages install into the same site-packages/onnxruntime/ directory, whichever is installed last wins — and in practice the CPU build's onnxruntime_pybind11_state.so overwrites the GPU build's, stripping CUDAExecutionProvider from the available providers list.

Root cause

fastembed declares a hard dependency on onnxruntime > 1.20.0 (by name). Up to and including onnxruntime-gpu ~1.19.x, the GPU wheel declared Provides-Dist: onnxruntime in its metadata, which told pip/uv that onnxruntime-gpu satisfies any onnxruntime requirement. That metadata is absent from onnxruntime-gpu >= 1.20.0 (confirmed in 1.24.2). As a result:

  1. uv resolves onnxruntime > 1.20.0 → installs onnxruntime==1.24.2 (CPU build)
  2. User also has onnxruntime-gpu==1.24.2 in their project requirements
  3. Both install to site-packages/onnxruntime/; the CPU onnxruntime_pybind11_state.so overwrites the GPU one
  4. ort.get_available_providers() returns ['CPUExecutionProvider'] instead of ['CUDAExecutionProvider', 'CPUExecutionProvider']

The failure mode is silent — no import error, no warning, just no GPU acceleration.
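Because nothing fails loudly, one option is to detect the dual-install condition itself at startup. A minimal stdlib-only sketch (the function name is mine, not part of any library):

```python
from importlib import metadata


def conflicting_ort_wheels():
    """Return which onnxruntime wheel names are installed in this environment.

    If both appear, whichever was installed last owns the shared
    site-packages/onnxruntime/ directory and its binaries.
    """
    found = []
    for name in ("onnxruntime", "onnxruntime-gpu"):
        try:
            metadata.version(name)
            found.append(name)
        except metadata.PackageNotFoundError:
            pass
    return found


if len(conflicting_ort_wheels()) > 1:
    print(
        "WARNING: both CPU and GPU onnxruntime wheels are installed; "
        "the GPU binaries may have been overwritten"
    )
```

This checks package metadata rather than runtime providers, so it catches the broken state even before onnxruntime is imported.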

Minimal repro (Docker)

FROM ghcr.io/astral-sh/uv:python3.13-bookworm-slim
# Project that depends on fastembed + onnxruntime-gpu
RUN uv pip install --system fastembed onnxruntime-gpu
RUN python -c "import onnxruntime as ort; print(ort.get_available_providers())"
# Output: ['CPUExecutionProvider']  <-- CUDAExecutionProvider is gone

libonnxruntime_providers_cuda.so is present and all of its .so dependencies resolve, but it needs Provider_GetHost from libonnxruntime_providers_shared.so, a symbol only exported by the GPU pybind11 build, not the CPU one. The dlopen of the CUDA provider therefore fails, and onnxruntime swallows the error.
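Since onnxruntime swallows the dlopen error, loading the provider library directly with ctypes can surface the real message. A rough diagnostic sketch; the capi/ path and the load order are assumptions about the wheel layout:

```python
import ctypes
import glob
import os
import sysconfig


def probe_cuda_provider():
    """dlopen libonnxruntime_providers_cuda.so directly so the linker error
    onnxruntime normally swallows (e.g. undefined symbol Provider_GetHost)
    surfaces as a readable OSError."""
    site = sysconfig.get_paths()["purelib"]
    libs = glob.glob(
        os.path.join(site, "onnxruntime", "capi", "libonnxruntime_providers_*.so")
    )
    shared = [p for p in libs if "shared" in os.path.basename(p)]
    cuda = [p for p in libs if "cuda" in os.path.basename(p)]
    if not cuda:
        return "CUDA provider library not found"
    try:
        # The CUDA provider resolves symbols from the shared provider
        # library, so load that first with RTLD_GLOBAL if it exists.
        for path in shared:
            ctypes.CDLL(path, mode=ctypes.RTLD_GLOBAL)
        ctypes.CDLL(cuda[0], mode=ctypes.RTLD_GLOBAL)
        return "CUDA provider loads cleanly"
    except OSError as exc:
        return f"dlopen failed: {exc}"


print(probe_cuda_provider())
```

On an affected environment this should print the undefined-symbol error instead of silently falling back to CPU.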

Workaround

Reinstall onnxruntime-gpu after uv sync to restore the GPU binaries:

RUN uv sync --frozen --no-dev
RUN uv pip install --python .venv/bin/python --reinstall "onnxruntime-gpu[cuda,cudnn]==1.24.2"

This is fragile (order-dependent, easy to get wrong) and can't be expressed cleanly in pyproject.toml.
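To keep the workaround from regressing silently, the image can assert GPU support at build time. A sketch extending the Dockerfile above (assuming the GPU wheel reports CUDAExecutionProvider in get_available_providers even on a GPU-less build host):

```dockerfile
RUN uv sync --frozen --no-dev
RUN uv pip install --python .venv/bin/python --reinstall "onnxruntime-gpu[cuda,cudnn]==1.24.2"
# Fail the build, not the first inference, if the CPU wheel won again.
RUN .venv/bin/python -c "import onnxruntime as ort; \
    assert 'CUDAExecutionProvider' in ort.get_available_providers(), \
    ort.get_available_providers()"
```

A failed assertion prints the actual provider list, which makes the clobbering obvious in build logs.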

Suggested fixes

Option A — fastembed side (preferred): Add a gpu extra that replaces the onnxruntime dep with onnxruntime-gpu:

[project.optional-dependencies]
gpu = ["onnxruntime-gpu"]

# In the [project] table, remove the direct onnxruntime entry from
# dependencies = [...] (or make it conditional) so the gpu extra can
# supply the runtime instead.

And guard the import with try/except so either package works. This lets GPU users do pip install "fastembed[gpu]" and get a coherent environment.
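The guard is simple because both wheels install the same top-level module; a sketch of the pattern (not fastembed's actual code):

```python
import importlib


def require_onnxruntime():
    """Import onnxruntime regardless of which wheel (CPU or GPU) provided it.

    Both onnxruntime and onnxruntime-gpu install the same top-level
    module, so one import path covers either package.
    """
    try:
        return importlib.import_module("onnxruntime")
    except ImportError as exc:
        raise ImportError(
            "fastembed requires onnxruntime or onnxruntime-gpu; "
            "try: pip install 'fastembed[gpu]'"
        ) from exc
```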

Option B — onnxruntime side: Restore Provides-Dist: onnxruntime in onnxruntime-gpu's wheel metadata so package managers treat them as interchangeable. This was present in onnxruntime-gpu <= 1.19.x. A related issue is tracked at microsoft/onnxruntime#22107.
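The metadata difference is easy to verify from an installed environment with importlib.metadata; a small sketch:

```python
from importlib import metadata


def provides_dist(name):
    """Return a wheel's Provides-Dist metadata entries, or None if the
    distribution isn't installed."""
    try:
        dist = metadata.distribution(name)
    except metadata.PackageNotFoundError:
        return None
    return dist.metadata.get_all("Provides-Dist") or []


# Per this report: ["onnxruntime"] for onnxruntime-gpu <= 1.19.x,
# [] for >= 1.20.0.
print(provides_dist("onnxruntime-gpu"))
```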

Environment

  • fastembed==0.7.4, onnxruntime==1.24.2, onnxruntime-gpu==1.24.2
  • Python 3.13, uv 0.6.x
  • Docker image: ghcr.io/astral-sh/uv:python3.13-bookworm-slim
  • Host: NVIDIA RTX 4090 + RTX 5090, driver 570.x, nvidia-container-toolkit 1.17.6
