Description
System Info
GPU being used:
Fri Oct 24 20:39:42 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.95.02              Driver Version: 581.42         CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1650        On  |   00000000:0A:00.0  On |                  N/A |
|  0%   45C    P8            N/A  /   45W |     842MiB /   4096MiB |      2%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A              30      G   /Xwayland                             N/A      |
|    0   N/A  N/A              33      G   /Xwayland                             N/A      |
+-----------------------------------------------------------------------------------------+
And my OS:
$ neofetch
nemo@DESKTOP-ICS0GDD
--------------------
OS: Ubuntu 22.04.5 LTS on Windows 10 x86_64
Kernel: 6.6.87.2-microsoft-standard-WSL2
Uptime: 1 hour, 8 mins
Packages: 704 (dpkg), 6 (snap)
Shell: bash 5.1.16
Theme: Adwaita [GTK3]
Icons: Adwaita [GTK3]
Terminal: vscode
CPU: AMD Ryzen 5 3600 (12) @ 3.593GHz
GPU: d653:00:00.0 Microsoft Corporation Device 008e
Memory: 1239MiB / 7913MiB
And the Docker version:
$ docker version
Client:
 Version:           28.3.2
 API version:       1.51
 Go version:        go1.24.5
 Git commit:        578ccf6
 Built:             Wed Jul  9 16:12:50 2025
 OS/Arch:           linux/amd64
 Context:           default
Server: Docker Desktop 4.44.2 (202017)
 Engine:
  Version:          28.3.2
  API version:      1.51 (minimum version 1.24)
  Go version:       go1.24.5
  Git commit:       e77ff99
  Built:            Wed Jul  9 16:13:55 2025
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.7.27
  GitCommit:        05044ec0a9a75232cad458027ca83437aae3f4da
 runc:
  Version:          1.2.5
  GitCommit:        v1.2.5-0-g59923ef
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
Information
- Docker
- The CLI directly
Tasks
- An officially supported command
- My own modifications
Reproduction
I'm trying to run this locally inside WSL2. I use this command to launch the server, and this is the output:
$ make start-embed-server 
docker run \
        --gpus all \
        -p 8080:80 --name embed \
        --env RUST_BACKTRACE=full \
        -v "./embeddings/data:/data" --pull always --rm \
        ghcr.io/huggingface/text-embeddings-inference:turing-1.8.2 \
        --model-id Qwen/Qwen3-Embedding-0.6B \
        --max-batch-tokens 1024 \
        --max-concurrent-requests 1 \
        --max-client-batch-size 1
turing-1.8.2: Pulling from huggingface/text-embeddings-inference
Digest: sha256:600c06ef2ea5ee804a6cd656fe357aa8bf0977cdff7271756b6536e98912c589
Status: Image is up to date for ghcr.io/huggingface/text-embeddings-inference:turing-1.8.2
2025-10-24T18:40:32.279522Z  INFO text_embeddings_router: router/src/main.rs:203: Args { model_id: "Qwe*/*****-*********-0.6B", revision: None, tokenization_workers: None, dtype: None, pooling: None, max_concurrent_requests: 1, max_batch_tokens: 1024, max_batch_requests: None, max_client_batch_size: 1, auto_truncate: false, default_prompt_name: None, default_prompt: None, dense_path: None, hf_api_token: None, hf_token: None, hostname: "827e7e97024c", port: 80, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/data"), payload_limit: 2000000, api_key: None, json_output: false, disable_spans: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", prometheus_port: 9000, cors_allow_origin: None }
2025-10-24T18:40:32.363020Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:42: Starting download
2025-10-24T18:40:32.363055Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `1_Pooling/config.json`
2025-10-24T18:40:32.363115Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `sentence_bert_config.json`
2025-10-24T18:40:32.540194Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `sentence_roberta_config.json`
2025-10-24T18:40:32.694410Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `sentence_distilbert_config.json`
2025-10-24T18:40:32.821505Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `sentence_camembert_config.json`
2025-10-24T18:40:32.952155Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `sentence_albert_config.json`
2025-10-24T18:40:33.079634Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `sentence_xlm-roberta_config.json`
2025-10-24T18:40:33.207285Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `sentence_xlnet_config.json`
2025-10-24T18:40:33.331654Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `config_sentence_transformers.json`
2025-10-24T18:40:33.331741Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `config.json`
2025-10-24T18:40:33.331760Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `tokenizer.json`
2025-10-24T18:40:33.331785Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:72: Model artifacts downloaded in 968.769707ms
2025-10-24T18:40:33.848108Z  WARN text_embeddings_router: router/src/lib.rs:190: Could not find a Sentence Transformers config
2025-10-24T18:40:33.848147Z  INFO text_embeddings_router: router/src/lib.rs:194: Maximum number of tokens per request: 32768
2025-10-24T18:40:33.848406Z  INFO text_embeddings_core::tokenization: core/src/tokenization.rs:38: Starting 12 tokenization workers
2025-10-24T18:40:34.616653Z  INFO text_embeddings_router: router/src/lib.rs:242: Starting model backend
2025-10-24T18:40:34.618113Z  INFO text_embeddings_backend: backends/src/lib.rs:553: Downloading `model.safetensors`
2025-10-24T18:40:34.618574Z  INFO text_embeddings_backend: backends/src/lib.rs:421: Model weights downloaded in 464.065µs
2025-10-24T18:40:34.618629Z  INFO download_dense_modules: text_embeddings_backend: backends/src/lib.rs:652: Downloading `modules.json`
2025-10-24T18:40:34.618755Z  INFO text_embeddings_backend: backends/src/lib.rs:433: Dense modules downloaded in 145.581µs
2025-10-24T18:40:35.601738Z  INFO text_embeddings_backend_candle: backends/candle/src/lib.rs:503: Starting Qwen3 model on Cuda(CudaDevice(DeviceId(1)))
2025-10-24T18:40:44.488836Z  INFO text_embeddings_router: router/src/lib.rs:260: Warming up model
2025-10-24T18:40:48.369640Z  WARN text_embeddings_router: router/src/lib.rs:319: Invalid hostname, defaulting to 0.0.0.0
2025-10-24T18:40:48.371041Z  INFO text_embeddings_router::http::server: router/src/http/server.rs:1852: Starting HTTP server: 0.0.0.0:80
2025-10-24T18:40:48.371065Z  INFO text_embeddings_router::http::server: router/src/http/server.rs:1853: Ready
From another console:
$ curl http://localhost:8080/info
{"model_id":"Qwen/Qwen3-Embedding-0.6B","model_sha":null,"model_dtype":"float16","model_type":{"embedding":{"pooling":"last_token"}},"max_concurrent_requests":1,"max_input_length":32768,"max_batch_tokens":1024,"max_batch_requests":null,"max_client_batch_size":1,"auto_truncate":false,"tokenization_workers":12,"version":"1.8.2","sha":"d7af1fcc509902d8cc66cebf5a61c5e8e000e442","docker_label":"sha-d7af1fc"}
The issue happens when I do:
$ curl http://localhost:8080/embed -X POST -H 'Content-Type: application/json' -d '{"inputs": ["hello world"]}'
curl: (52) Empty reply from server
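The same failing request can be reproduced from Python with only the standard library. This is a minimal sketch: the URL assumes the `-p 8080:80` mapping from the docker run command above, and `build_payload`/`embed` are hypothetical helper names, not part of the project.

```python
import json
import urllib.request

def build_payload(texts):
    # Same JSON body as the curl call above: {"inputs": [...]}
    return json.dumps({"inputs": texts}).encode("utf-8")

def embed(texts, url="http://localhost:8080/embed"):
    # POST the texts to the text-embeddings-inference /embed route.
    req = urllib.request.Request(
        url,
        data=build_payload(texts),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

if __name__ == "__main__":
    # With the crash described in this issue, the server drops the
    # connection instead of returning the embedding vectors.
    print(embed(["hello world"]))
```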
And the console from the container shows:
thread 'tokio-runtime-worker' panicked at core/src/queue.rs:87:14:
Queue background task dropped the receiver or the receiver is too behind. This is a bug.: "Full(..)"
stack backtrace:
   0:     0x623ba0dc2eff - <unknown>
   1:     0x623ba09afe33 - <unknown>
   2:     0x623ba0dc2652 - <unknown>
   3:     0x623ba0dc2d63 - <unknown>
   4:     0x623ba0dc2427 - <unknown>
   5:     0x623ba0dfef18 - <unknown>
   6:     0x623ba0dfee79 - <unknown>
   7:     0x623ba0dff31c - <unknown>
   8:     0x623ba0210bef - <unknown>
   9:     0x623ba0210f95 - <unknown>
  10:     0x623ba0f56d5d - <unknown>
  11:     0x623ba0f560f4 - <unknown>
  12:     0x623ba0f557fd - <unknown>
  13:     0x623ba0f54afa - <unknown>
  14:     0x623ba103c06e - <unknown>
  15:     0x623ba103f230 - <unknown>
  16:     0x623ba102fbf4 - <unknown>
  17:     0x623ba1033b0b - <unknown>
  18:     0x623ba0e0002b - <unknown>
  19:     0x74a23727dac3 - <unknown>
  20:     0x74a23730ebf4 - clone
  21:                0x0 - <unknown>
make: *** [Makefile:5: start-embed-server] Error 139
Expected behavior
The server should return the embedding instead of crashing. If I run a Python 3.13 container (also with --gpus all) and install sentence-transformers, I'm able to use the model:
root@190c68a05b95:/# python3
Python 3.13.9 (main, Oct 21 2025, 11:49:28) [GCC 14.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from sentence_transformers import SentenceTransformer
>>> model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B")
>>> print(model.encode(["Hello World"]))
[[ 0.00581264 -0.00302087 -0.01198636 ...  0.00724208 -0.00435534
   0.00336421]]