Hello Weaviate team!
I wanted to share that Sentence Transformers now supports ONNX and OpenVINO backends. The integration is fairly robust in my opinion: it still allows every kind of pooling (i.e. not just mean pooling like the current ONNX Vectorizer), it can load a specific ONNX file if a repository contains several (e.g. optimized or quantized variants), and it even exports the model automatically if no ONNX (or OpenVINO) file exists yet.
Beyond that, if you're running on CPU, you may also benefit from the excellent OpenVINO int8 quantization options (there's a small sketch of that after the examples below).
There's documentation on these different backends here: https://sbert.net/docs/sentence_transformer/usage/efficiency.html, including benchmarks at the bottom.
But don't be alarmed, it's extremely simple:
# pip install sentence-transformers[onnx]
from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "all-MiniLM-L6-v2",
    backend="onnx",
)

sentences = ["This is an example sentence", "Each sentence is converted"]
embeddings = model.encode(sentences)
or
# pip install sentence-transformers[openvino]
from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "all-MiniLM-L6-v2",
    backend="openvino",
)

sentences = ["This is an example sentence", "Each sentence is converted"]
embeddings = model.encode(sentences)
Or, if a repository contains multiple model files, you can specify which one to load with model_kwargs:
# pip install sentence-transformers[onnx], or
# pip install sentence-transformers[onnx-gpu]
from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "all-MiniLM-L6-v2",
    backend="onnx",
    model_kwargs={"file_name": "onnx/model_qint8_avx512_vnni.onnx"},
)

sentences = ["This is an example sentence", "Each sentence is converted"]
embeddings = model.encode(sentences)
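And in case you want to produce such quantized files yourself: here's a minimal sketch using the export helpers described in the efficiency docs linked above (export_dynamic_quantized_onnx_model and export_static_quantized_openvino_model). The output path and the exact file suffixes here are illustrative placeholders, so double-check the docs for the details:

# pip install sentence-transformers[onnx] sentence-transformers[openvino]
from sentence_transformers import (
    SentenceTransformer,
    export_dynamic_quantized_onnx_model,
    export_static_quantized_openvino_model,
)
from optimum.intel import OVQuantizationConfig

# Dynamic int8 quantization of the ONNX model; this is how a file like
# onnx/model_qint8_avx512_vnni.onnx is produced. The last argument is where
# the quantized file is saved (a local directory or a Hub repository id).
onnx_model = SentenceTransformer("all-MiniLM-L6-v2", backend="onnx")
export_dynamic_quantized_onnx_model(onnx_model, "avx512_vnni", "path/to/local/model")

# Static int8 quantization of the OpenVINO model (needs nncf via optimum-intel).
ov_model = SentenceTransformer("all-MiniLM-L6-v2", backend="openvino")
export_static_quantized_openvino_model(ov_model, OVQuantizationConfig(), "path/to/local/model")

The exported files can then be loaded with the same model_kwargs={"file_name": ...} pattern as above.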
All good if you're not interested in integrating this feature, but I figured it might have gone unnoticed.
In a future version, these ONNX and OpenVINO backends will also be introduced for CrossEncoder/Reranker models.
Full disclosure: I'm the maintainer of Sentence Transformers.