Bug Report: Cannot embed text Queue background task dropped the receiver or the receiver is too behind. This is a bug.: "Full(..)" #744

@MrNemo64

Description

System Info

GPU being used:

Fri Oct 24 20:39:42 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.95.02              Driver Version: 581.42         CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1650        On  |   00000000:0A:00.0  On |                  N/A |
|  0%   45C    P8            N/A  /   45W |     842MiB /   4096MiB |      2%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A              30      G   /Xwayland                             N/A      |
|    0   N/A  N/A              33      G   /Xwayland                             N/A      |
+-----------------------------------------------------------------------------------------+

And my OS:

$ neofetch
            .-/+oossssoo+/-.               nemo@DESKTOP-ICS0GDD 
        `:+ssssssssssssssssss+:`           -------------------- 
      -+ssssssssssssssssssyyssss+-         OS: Ubuntu 22.04.5 LTS on Windows 10 x86_64 
    .ossssssssssssssssssdMMMNysssso.       Kernel: 6.6.87.2-microsoft-standard-WSL2 
   /ssssssssssshdmmNNmmyNMMMMhssssss/      Uptime: 1 hour, 8 mins 
  +ssssssssshmydMMMMMMMNddddyssssssss+     Packages: 704 (dpkg), 6 (snap) 
 /sssssssshNMMMyhhyyyyhmNMMMNhssssssss/    Shell: bash 5.1.16 
.ssssssssdMMMNhsssssssssshNMMMdssssssss.   Theme: Adwaita [GTK3] 
+sssshhhyNMMNyssssssssssssyNMMMysssssss+   Icons: Adwaita [GTK3] 
ossyNMMMNyMMhsssssssssssssshmmmhssssssso   Terminal: vscode 
ossyNMMMNyMMhsssssssssssssshmmmhssssssso   CPU: AMD Ryzen 5 3600 (12) @ 3.593GHz 
+sssshhhyNMMNyssssssssssssyNMMMysssssss+   GPU: d653:00:00.0 Microsoft Corporation Device 008e 
.ssssssssdMMMNhsssssssssshNMMMdssssssss.   Memory: 1239MiB / 7913MiB 
 /sssssssshNMMMyhhyyyyhdNMMMNhssssssss/
  +sssssssssdmydMMMMMMMMddddyssssssss+                             
   /ssssssssssshdmNNNNmyNMMMMhssssss/                              
    .ossssssssssssssssssdMMMNysssso.
      -+sssssssssssssssssyyyssss+-
        `:+ssssssssssssssssss+:`
            .-/+oossssoo+/-.

And my Docker version:

$ docker version
Client:
 Version:           28.3.2
 API version:       1.51
 Go version:        go1.24.5
 Git commit:        578ccf6
 Built:             Wed Jul  9 16:12:50 2025
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Desktop 4.44.2 (202017)
 Engine:
  Version:          28.3.2
  API version:      1.51 (minimum version 1.24)
  Go version:       go1.24.5
  Git commit:       e77ff99
  Built:            Wed Jul  9 16:13:55 2025
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.7.27
  GitCommit:        05044ec0a9a75232cad458027ca83437aae3f4da
 runc:
  Version:          1.2.5
  GitCommit:        v1.2.5-0-g59923ef
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

Information

  • [x] Docker
  • [ ] The CLI directly

Tasks

  • [x] An officially supported command
  • [ ] My own modifications

Reproduction

I'm trying to run this locally inside WSL2. I use this command to launch the server, and this is the output:

$ make start-embed-server 
docker run \
        --gpus all \
        -p 8080:80 --name embed \
        --env RUST_BACKTRACE=full \
        -v "./embeddings/data:/data" --pull always --rm \
        ghcr.io/huggingface/text-embeddings-inference:turing-1.8.2 \
        --model-id Qwen/Qwen3-Embedding-0.6B \
        --max-batch-tokens 1024 \
        --max-concurrent-requests 1 \
        --max-client-batch-size 1
turing-1.8.2: Pulling from huggingface/text-embeddings-inference
Digest: sha256:600c06ef2ea5ee804a6cd656fe357aa8bf0977cdff7271756b6536e98912c589
Status: Image is up to date for ghcr.io/huggingface/text-embeddings-inference:turing-1.8.2
2025-10-24T18:40:32.279522Z  INFO text_embeddings_router: router/src/main.rs:203: Args { model_id: "Qwe*/*****-*********-0.6B", revision: None, tokenization_workers: None, dtype: None, pooling: None, max_concurrent_requests: 1, max_batch_tokens: 1024, max_batch_requests: None, max_client_batch_size: 1, auto_truncate: false, default_prompt_name: None, default_prompt: None, dense_path: None, hf_api_token: None, hf_token: None, hostname: "827e7e97024c", port: 80, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/data"), payload_limit: 2000000, api_key: None, json_output: false, disable_spans: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", prometheus_port: 9000, cors_allow_origin: None }
2025-10-24T18:40:32.363020Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:42: Starting download
2025-10-24T18:40:32.363055Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `1_Pooling/config.json`
2025-10-24T18:40:32.363115Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `sentence_bert_config.json`
2025-10-24T18:40:32.540194Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `sentence_roberta_config.json`
2025-10-24T18:40:32.694410Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `sentence_distilbert_config.json`
2025-10-24T18:40:32.821505Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `sentence_camembert_config.json`
2025-10-24T18:40:32.952155Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `sentence_albert_config.json`
2025-10-24T18:40:33.079634Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `sentence_xlm-roberta_config.json`
2025-10-24T18:40:33.207285Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `sentence_xlnet_config.json`
2025-10-24T18:40:33.331654Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `config_sentence_transformers.json`
2025-10-24T18:40:33.331741Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `config.json`
2025-10-24T18:40:33.331760Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `tokenizer.json`
2025-10-24T18:40:33.331785Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:72: Model artifacts downloaded in 968.769707ms
2025-10-24T18:40:33.848108Z  WARN text_embeddings_router: router/src/lib.rs:190: Could not find a Sentence Transformers config
2025-10-24T18:40:33.848147Z  INFO text_embeddings_router: router/src/lib.rs:194: Maximum number of tokens per request: 32768
2025-10-24T18:40:33.848406Z  INFO text_embeddings_core::tokenization: core/src/tokenization.rs:38: Starting 12 tokenization workers
2025-10-24T18:40:34.616653Z  INFO text_embeddings_router: router/src/lib.rs:242: Starting model backend
2025-10-24T18:40:34.618113Z  INFO text_embeddings_backend: backends/src/lib.rs:553: Downloading `model.safetensors`
2025-10-24T18:40:34.618574Z  INFO text_embeddings_backend: backends/src/lib.rs:421: Model weights downloaded in 464.065µs
2025-10-24T18:40:34.618629Z  INFO download_dense_modules: text_embeddings_backend: backends/src/lib.rs:652: Downloading `modules.json`
2025-10-24T18:40:34.618755Z  INFO text_embeddings_backend: backends/src/lib.rs:433: Dense modules downloaded in 145.581µs
2025-10-24T18:40:35.601738Z  INFO text_embeddings_backend_candle: backends/candle/src/lib.rs:503: Starting Qwen3 model on Cuda(CudaDevice(DeviceId(1)))
2025-10-24T18:40:44.488836Z  INFO text_embeddings_router: router/src/lib.rs:260: Warming up model
2025-10-24T18:40:48.369640Z  WARN text_embeddings_router: router/src/lib.rs:319: Invalid hostname, defaulting to 0.0.0.0
2025-10-24T18:40:48.371041Z  INFO text_embeddings_router::http::server: router/src/http/server.rs:1852: Starting HTTP server: 0.0.0.0:80
2025-10-24T18:40:48.371065Z  INFO text_embeddings_router::http::server: router/src/http/server.rs:1853: Ready

From another console:

$ curl http://localhost:8080/info
{"model_id":"Qwen/Qwen3-Embedding-0.6B","model_sha":null,"model_dtype":"float16","model_type":{"embedding":{"pooling":"last_token"}},"max_concurrent_requests":1,"max_input_length":32768,"max_batch_tokens":1024,"max_batch_requests":null,"max_client_batch_size":1,"auto_truncate":false,"tokenization_workers":12,"version":"1.8.2","sha":"d7af1fcc509902d8cc66cebf5a61c5e8e000e442","docker_label":"sha-d7af1fc"}

The issue happens when I do:

$ curl http://localhost:8080/embed -X POST -H 'Content-Type: application/json' -d '{"inputs": ["hello world"]}'
curl: (52) Empty reply from server

And the console from the container shows:

thread 'tokio-runtime-worker' panicked at core/src/queue.rs:87:14:
Queue background task dropped the receiver or the receiver is too behind. This is a bug.: "Full(..)"
stack backtrace:
   0:     0x623ba0dc2eff - <unknown>
   1:     0x623ba09afe33 - <unknown>
   2:     0x623ba0dc2652 - <unknown>
   3:     0x623ba0dc2d63 - <unknown>
   4:     0x623ba0dc2427 - <unknown>
   5:     0x623ba0dfef18 - <unknown>
   6:     0x623ba0dfee79 - <unknown>
   7:     0x623ba0dff31c - <unknown>
   8:     0x623ba0210bef - <unknown>
   9:     0x623ba0210f95 - <unknown>
  10:     0x623ba0f56d5d - <unknown>
  11:     0x623ba0f560f4 - <unknown>
  12:     0x623ba0f557fd - <unknown>
  13:     0x623ba0f54afa - <unknown>
  14:     0x623ba103c06e - <unknown>
  15:     0x623ba103f230 - <unknown>
  16:     0x623ba102fbf4 - <unknown>
  17:     0x623ba1033b0b - <unknown>
  18:     0x623ba0e0002b - <unknown>
  19:     0x74a23727dac3 - <unknown>
  20:     0x74a23730ebf4 - clone
  21:                0x0 - <unknown>
make: *** [Makefile:5: start-embed-server] Error 139
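
Reading the panic, my guess (not verified against the actual implementation) is that the request path does a try_send into a bounded channel feeding the queue's background task, and panics via expect when that channel is already full instead of applying backpressure. Below is a minimal sketch of that failure mode, assuming a bounded tokio mpsc channel; the real queue in core/src/queue.rs may be structured differently:

// Hypothetical sketch of the panic mechanism, not the actual TEI code.
// Requires the `tokio` crate with the `sync` feature enabled.
use tokio::sync::mpsc;

fn main() {
    // A bounded channel toward the background task, here with capacity 1.
    let (tx, _rx) = mpsc::channel::<&'static str>(1);

    // The first send fills the channel; the receiver has not drained it yet.
    tx.try_send("first request").unwrap();

    // The second send fails with TrySendError::Full, whose Debug output is
    // `Full(..)` -- the same string that appears in the panic message above.
    let err = tx.try_send("second request").unwrap_err();
    println!("{err:?}"); // prints: Full(..)
}

If that is roughly what happens, a slow GPU like my GTX 1650 falling behind would be enough to trigger the panic, and it seems the sender should await an async send() or return an error response rather than crash the worker thread.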

Expected behavior

The request should succeed and return an embedding instead of crashing the server. If I run a Python 3.13 container, also with --gpus all, and install sentence-transformers, I'm able to use the same model:

root@190c68a05b95:/# python3
Python 3.13.9 (main, Oct 21 2025, 11:49:28) [GCC 14.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from sentence_transformers import SentenceTransformer
>>> model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B")
>>> print(model.encode(["Hello World"]))
[[ 0.00581264 -0.00302087 -0.01198636 ...  0.00724208 -0.00435534
   0.00336421]]
