Conversation

@qinxuye
Contributor

@qinxuye qinxuye commented Nov 2, 2025

Before this PR, embedding was sequential, even though users could create multiple embeddings at the same time.

After this PR, a model class can inherit BatchMixin, which provides batched versions of its methods (using the xoscar batch API). Incoming requests are put into a queue, and a background coroutine collects as many pending items as possible and sends them to the API in a single call.

This is an initial version of auto batching; in principle it could be applied to all models, not only auto-regressive ones (basically LLMs).

Fixes #4123
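
The queue-plus-background-coroutine idea described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the class and function names here (`AutoBatcher`, `submit`, `embed_batch`) are hypothetical, and the real implementation is a `BatchMixin` built on the xoscar batch API rather than a plain asyncio queue.

```python
import asyncio


class AutoBatcher:
    """Hypothetical sketch of auto batching: callers submit single items,
    a background coroutine drains the queue and makes one batched call."""

    def __init__(self, batch_fn):
        # batch_fn takes a list of inputs and returns a list of outputs
        self._batch_fn = batch_fn
        self._queue = asyncio.Queue()
        self._worker = None

    async def submit(self, item):
        # Lazily start the background worker on first use.
        if self._worker is None:
            self._worker = asyncio.create_task(self._run())
        fut = asyncio.get_running_loop().create_future()
        await self._queue.put((item, fut))
        # The caller awaits its own result; batching is transparent.
        return await fut

    async def _run(self):
        while True:
            # Block for at least one request, then grab everything
            # else that is already waiting in the queue.
            batch = [await self._queue.get()]
            while not self._queue.empty():
                batch.append(self._queue.get_nowait())
            inputs = [item for item, _ in batch]
            # One call for the whole batch instead of one per request.
            outputs = self._batch_fn(inputs)
            for (_, fut), out in zip(batch, outputs):
                fut.set_result(out)


async def demo():
    calls = []

    def embed_batch(texts):
        # Stand-in for a real embedding model's batch inference.
        calls.append(len(texts))
        return [f"vec({t})" for t in texts]

    batcher = AutoBatcher(embed_batch)
    # Three concurrent requests; the worker should coalesce them.
    results = await asyncio.gather(*(batcher.submit(t) for t in ["a", "b", "c"]))
    return results, calls
```

Concurrent `submit` calls that arrive while the worker is busy pile up in the queue, so the next drain picks them all up in one call, which is where the throughput win over purely sequential embedding comes from.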

@XprobeBot XprobeBot added this to the v1.x milestone Nov 2, 2025
@qinxuye
Contributor Author

qinxuye commented Nov 2, 2025

@llyycchhee please help me review this PR and see if there's anything that can be improved.

Benchmarks welcome.

@qinxuye
Contributor Author

qinxuye commented Nov 10, 2025

Can you paste some benchmark results to illustrate the improvement? @llyycchhee

@qinxuye
Copy link
Contributor Author

qinxuye commented Nov 12, 2025

[image: benchmark chart]

Benchmark with 256 queries.

The improvements are huge.

Collaborator

@llyycchhee llyycchhee left a comment

LGTM

@qinxuye qinxuye merged commit c550277 into xorbitsai:main Nov 12, 2025
11 of 14 checks passed
@qinxuye qinxuye deleted the feat/batch branch November 12, 2025 11:09
Development

Successfully merging this pull request may close these issues.

Embedding model concurrency performance issue (嵌入模型并发性能问题)

3 participants