Contributor

@yinggeh yinggeh commented Oct 30, 2025

Enable vLLM to load embedding models and execute embedding requests.

@yinggeh yinggeh force-pushed the yinggeh/tri-49-request-for-openai-compatible-api-endpoints-for-triton branch from 0acb76f to 7d06043 Compare October 30, 2025 01:12
@yinggeh yinggeh force-pushed the yinggeh/tri-49-request-for-openai-compatible-api-endpoints-for-triton branch from 7d06043 to 2c3e148 Compare October 30, 2025 01:14
…end into yinggeh/tri-49-request-for-openai-compatible-api-endpoints-for-triton

@whoisj whoisj left a comment

left a few comments.

pooling_params = PoolingParams(dimensions=dims, task="embed")
return pooling_params

def create_response(self, request_output):

would be nice to have a type hint on request_output.

Contributor Author

fixed
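
The thread does not show the resulting signature. As a rough sketch only, a type-hinted `create_response` could look something like the following. `EmbedOutput` and `EmbedRequest` are hypothetical stand-ins so the snippet runs without vLLM installed; the real code would presumably annotate with whatever output type the vLLM engine returns for embedding requests, and the return type here is an assumption as well.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EmbedOutput:
    """Hypothetical stand-in for the engine's embedding result type."""

    request_id: str
    embedding: List[float]


class EmbedRequest:
    """Hypothetical request class; only the hinted method is sketched."""

    def create_response(self, request_output: EmbedOutput) -> List[float]:
        # The hint documents what the method expects and lets a type checker
        # flag mismatched call sites. What the real method returns is not
        # shown in this thread, so a plain list of floats is assumed here.
        return list(request_output.embedding)
```

With a hint in place, a checker such as mypy can catch a caller that passes the wrong output object before the code ever runs.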

async for response in response_iterator:
yield response

def create_response(self, request_output_state, request_output, prepend_input):

would be nice to have type hints on request_output_state, request_output, and prepend_input.

Contributor Author

fixed



class RequestBase:
def __init__(self, request, executor_callback, output_dtype):

would be nice to have type hints on request, executor_callback, and output_dtype.

Contributor Author

fixed
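
For illustration, one way the constructor could be annotated is sketched below. The concrete types are assumptions, since the thread does not show the final code: `request` is left as `Any`, `executor_callback` is treated as an arbitrary callable, and `output_dtype` is guessed to be a NumPy dtype because the same file imports `numpy`.

```python
from typing import TYPE_CHECKING, Any, Callable

if TYPE_CHECKING:
    # Imported only for type checkers, so the sketch runs without NumPy.
    import numpy as np


class RequestBase:
    def __init__(
        self,
        request: Any,  # assumption: the concrete request type is not shown
        executor_callback: Callable[..., Any],  # assumption: any callable
        output_dtype: "np.dtype",  # assumption: a NumPy dtype
    ) -> None:
        # Store the arguments unchanged; the real initializer may do more.
        self.request = request
        self.executor_callback = executor_callback
        self.output_dtype = output_dtype
```

The quoted annotation keeps `np.dtype` from being evaluated at runtime, which is a common pattern when an import is needed only for hints.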

from abc import abstractmethod
from io import BytesIO

import numpy as np

is using numpy (CPU) good enough?

do we want to leverage cupy (GPU)?

Contributor Author

My understanding is that the vLLM engine takes care of it.

@yinggeh yinggeh changed the base branch from main to r25.10 October 30, 2025 22:46
Labels

enhancement New feature or request
