
ONNX (and OpenVINO) backends via Sentence Transformers #109

@tomaarsen


Hello Weaviate team!

I wanted to share that Sentence Transformers now supports ONNX and OpenVINO backends. In my opinion the integration is fairly robust: it still supports every kind of pooling (not just mean pooling, which is all the current ONNX Vectorizer uses), lets you choose between multiple ONNX files when a model provides several (e.g. optimized or quantized variants), and even exports the model automatically if it doesn't already have an ONNX (or OpenVINO) file.
Beyond that, if you're running on CPU, you may also benefit from the excellent OpenVINO int8 quantization options.
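Pooling matters because a transformer outputs one vector per token, and each model is trained with a specific strategy for collapsing those into a single sentence vector. A vectorizer that always applies mean pooling will therefore produce different embeddings than intended for a model trained with, say, CLS pooling. The toy sketch below contrasts mean, CLS, and max pooling over one padded row in plain Python. It illustrates the idea only and is not the Sentence Transformers Pooling module; the function names are hypothetical.

```python
from typing import List

# One row of a batch: seq_len token vectors, each of size hidden_dim
TokenEmbeddings = List[List[float]]


def mean_pooling(tokens: TokenEmbeddings, attention_mask: List[int]) -> List[float]:
    """Average the vectors of real tokens, skipping padding (mask == 0)."""
    summed = [0.0] * len(tokens[0])
    count = 0
    for vector, keep in zip(tokens, attention_mask):
        if keep:
            summed = [s + v for s, v in zip(summed, vector)]
            count += 1
    count = max(count, 1)  # avoid division by zero on an all-padding row
    return [s / count for s in summed]


def cls_pooling(tokens: TokenEmbeddings, attention_mask: List[int]) -> List[float]:
    """Take the first token's vector (e.g. [CLS]); the mask is not needed."""
    return list(tokens[0])


def max_pooling(tokens: TokenEmbeddings, attention_mask: List[int]) -> List[float]:
    """Element-wise maximum over real tokens, ignoring padding."""
    result = [float("-inf")] * len(tokens[0])
    for vector, keep in zip(tokens, attention_mask):
        if keep:
            result = [max(r, v) for r, v in zip(result, vector)]
    return result


# Toy row: two real tokens followed by one padding token with large values
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]

print(mean_pooling(tokens, mask))  # [2.0, 3.0]
print(cls_pooling(tokens, mask))   # [1.0, 2.0]
print(max_pooling(tokens, mask))   # [3.0, 4.0]
```

The same token embeddings yield three different sentence vectors, which is why the pooling configured for a given model should be respected rather than hard-coded.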

There's documentation on these different backends here: https://sbert.net/docs/sentence_transformer/usage/efficiency.html, including benchmarks at the bottom.


Don't be alarmed, though: using it is extremely simple:

# pip install sentence-transformers[onnx]
from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "all-MiniLM-L6-v2",
    backend="onnx",  # the only change compared to the default PyTorch backend
)

sentences = ["This is an example sentence", "Each sentence is converted"]
embeddings = model.encode(sentences)

or

# pip install sentence-transformers[openvino]
from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "all-MiniLM-L6-v2",
    backend="openvino",  # the only change compared to the default PyTorch backend
)

sentences = ["This is an example sentence", "Each sentence is converted"]
embeddings = model.encode(sentences)
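When switching a deployed vectorizer from PyTorch to ONNX or OpenVINO, it may be worth confirming that the new backend produces (nearly) the same vectors before re-indexing. Below is a minimal sketch in plain Python that compares two backends' embeddings row by row with cosine similarity and reports the worst row. The helper names are hypothetical, and the 0.99 tolerance in the commented usage is an assumption rather than a documented guarantee; quantized files can be expected to drift further from the original than unquantized exports.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def min_rowwise_cosine(
    reference: Sequence[Sequence[float]],
    candidate: Sequence[Sequence[float]],
) -> float:
    """Worst-case agreement: the lowest cosine similarity between matching rows."""
    if len(reference) != len(candidate):
        raise ValueError("both inputs must contain one embedding per sentence")
    return min(cosine_similarity(r, c) for r, c in zip(reference, candidate))


# Hypothetical usage with real models (requires sentence-transformers[onnx]):
#   from sentence_transformers import SentenceTransformer
#   sentences = ["This is an example sentence", "Each sentence is converted"]
#   torch_model = SentenceTransformer("all-MiniLM-L6-v2")
#   onnx_model = SentenceTransformer("all-MiniLM-L6-v2", backend="onnx")
#   score = min_rowwise_cosine(
#       torch_model.encode(sentences).tolist(),
#       onnx_model.encode(sentences).tolist(),
#   )
#   assert score > 0.99  # assumed tolerance; loosen it for quantized files
```

Taking the minimum rather than the mean means a single badly converted sentence is not hidden by many good ones.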

If a model repository contains several ONNX files, you can pick one with model_kwargs:

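# pip install sentence-transformers[onnx], or
# pip install sentence-transformers[onnx-gpu]
from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "all-MiniLM-L6-v2",
    backend="onnx",
    # load a specific file from the repository: an int8-quantized export for AVX-512 VNNI CPUs
    model_kwargs={"file_name": "onnx/model_qint8_avx512_vnni.onnx"},
)

sentences = ["This is an example sentence", "Each sentence is converted"]
embeddings = model.encode(sentences)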
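If this were wired into a vectorizer, the backend and file choice would most likely come from deployment configuration. Below is a minimal sketch that turns such settings into keyword arguments for SentenceTransformer. Only backend and model_kwargs={"file_name": ...} come from the examples above; the environment variable names ST_BACKEND and ST_MODEL_FILE and both helper functions are hypothetical, not existing Weaviate settings.

```python
import os
from typing import Any, Dict, Mapping, Optional

SUPPORTED_BACKENDS = ("torch", "onnx", "openvino")


def build_sentence_transformer_kwargs(
    backend: str = "torch", file_name: Optional[str] = None
) -> Dict[str, Any]:
    """Validate the settings and build kwargs for SentenceTransformer(...)."""
    backend = backend.lower()
    if backend not in SUPPORTED_BACKENDS:
        raise ValueError(f"unsupported backend {backend!r}; expected one of {SUPPORTED_BACKENDS}")
    kwargs: Dict[str, Any] = {"backend": backend}
    if file_name:
        if backend == "torch":
            raise ValueError("file_name only applies to the onnx and openvino backends")
        # selects e.g. a quantized or optimized export stored in the model repository
        kwargs["model_kwargs"] = {"file_name": file_name}
    return kwargs


def kwargs_from_env(env: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Read the hypothetical ST_BACKEND / ST_MODEL_FILE variables."""
    return build_sentence_transformer_kwargs(
        env.get("ST_BACKEND", "torch"), env.get("ST_MODEL_FILE") or None
    )


# Hypothetical usage:
#   from sentence_transformers import SentenceTransformer
#   model = SentenceTransformer("all-MiniLM-L6-v2", **kwargs_from_env())
```

Validating up front means a typo in the configuration fails at startup instead of surfacing later as a less obvious loading error.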

All good if you're not interested in integrating this feature, but I figured it might have gone unnoticed.
In a future version, these ONNX and OpenVINO backends will also be introduced for CrossEncoder/Reranker models.

Full disclosure: I'm the maintainer of Sentence Transformers.

  • Tom Aarsen
