Skip to content

a simple llm load balancer. 利用sglang_router的cache-aware 的负载均衡,实现多模型,多节点的负载均衡

Notifications You must be signed in to change notification settings

OneThingAI/simple-loadbalancer

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 

Repository files navigation

simple-loadbalancer

Usage

  1. Install dependencies
  2. Run python load_balancer.py
  3. Make sure the endpoints in endpoints_config.yaml start with http://
  4. Make sure the endpoints are running and accessible

Example

start a vllm server on platform onthingai.com then get the endpoint url

then edit the endpoints_config.yaml

Qwen/Qwen2.5-7B-Instruct: 
  - http://your-endpoint-url-here

then run the load_balancer.py

About

a simple llm load balancer. 利用sglang_router的cache-aware 的负载均衡,实现多模型,多节点的负载均衡

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages