feat: OpenAI-compatible API Endpoints for Embedding Models #104

yinggeh · 2025-10-30T01:08:11Z

Enable vLLM to load an embedding model and execute embedding requests.


whoisj left a few comments.

whoisj · 2025-10-30T21:21:36Z

src/utils/request.py

+            pooling_params = PoolingParams(dimensions=dims, task="embed")
+        return pooling_params
+
+    def create_response(self, request_output):


would be nice to have a type hint on request_output.

whoisj · 2025-10-30T21:22:15Z

src/utils/request.py

+        async for response in response_iterator:
+            yield response
+
+    def create_response(self, request_output_state, request_output, prepend_input):


would be nice to have type hints on request_output_state, request_output, and prepend_input.

whoisj · 2025-10-30T21:22:58Z

src/utils/request.py

+
+
+class RequestBase:
+    def __init__(self, request, executor_callback, output_dtype):


would be nice to have type hints on request, executor_callback, and output_dtype.
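A minimal sketch of what the hinted constructor could look like. The concrete types are not visible in this diff, so `ExecutorCallback` and `OutputDType` are hypothetical aliases (assumptions, not names from the PR) and `request` is left as `Any`; the real module would presumably substitute the actual request class and a numpy dtype.

```python
from typing import Any, Callable

# Hypothetical aliases: the diff does not show the real types, so these
# are placeholders to be narrowed by the author.
ExecutorCallback = Callable[..., Any]  # assumption: some callable hook
OutputDType = Any  # assumption: likely a numpy dtype in the real module


class RequestBase:
    def __init__(
        self,
        request: Any,  # assumption: the backend's request object
        executor_callback: ExecutorCallback,
        output_dtype: OutputDType,
    ) -> None:
        # Store the arguments unchanged; only the annotations are new.
        self.request = request
        self.executor_callback = executor_callback
        self.output_dtype = output_dtype
```

The same pattern would apply to the `create_response` methods flagged above: annotate each parameter and the return type, starting with `Any` where the exact type is still undecided and tightening it later.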

whoisj · 2025-10-30T21:24:18Z

src/utils/request.py

+from abc import abstractmethod
+from io import BytesIO
+
+import numpy as np


is using numpy (CPU) good enough?

do we want to leverage cupy (GPU)?

My understanding is that the vLLM engine takes care of it.

yinggeh requested review from oandreeva-nv, pskiran1 and whoisj October 30, 2025 01:08

yinggeh self-assigned this Oct 30, 2025

yinggeh added the enhancement New feature or request label Oct 30, 2025

yinggeh mentioned this pull request Oct 30, 2025

feat: OpenAI-compatible API Endpoints for Embedding Models triton-inference-server/server#8483

Open

11 tasks

github-advanced-security bot found potential problems Oct 30, 2025


yinggeh force-pushed the yinggeh/tri-49-request-for-openai-compatible-api-endpoints-for-triton branch from 0acb76f to 7d06043 Compare October 30, 2025 01:12

Support embedding endpoint in OpenAI API frontend

2c3e148

yinggeh force-pushed the yinggeh/tri-49-request-for-openai-compatible-api-endpoints-for-triton branch from 7d06043 to 2c3e148 Compare October 30, 2025 01:14

Merge branch 'r25.10' of github.com:triton-inference-server/vllm_back…

961a7c3

…end into yinggeh/tri-49-request-for-openai-compatible-api-endpoints-for-triton

whoisj reviewed Oct 30, 2025


Address comment and rebase to r25.10 (V1 API)

943ee5f

yinggeh changed the base branch from main to r25.10 October 30, 2025 22:46

Add warning

6bd56c4


3 participants


