- Install dependencies
- Run `python load_balancer.py`
- Make sure the endpoints in `endpoints_config.yaml` start with `http://`
- Make sure the endpoints are running and accessible
Start a vLLM server on the onthingai.com platform and copy its endpoint URL, then edit `endpoints_config.yaml`:
Qwen/Qwen2.5-7B-Instruct:
- http://your-endpoint-url-here
Then run `load_balancer.py`.
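
The two endpoint checks listed above (the `http://` prefix and reachability) can be sketched in Python before starting the load balancer. This is a minimal illustration, not part of `load_balancer.py`: the function names `find_bad_prefixes` and `is_reachable` are hypothetical, and the config is assumed to be a mapping from model name to a list of URLs, as in the snippet above. The config is passed in as an already-parsed dictionary; in practice it could be loaded with a YAML parser such as PyYAML's `yaml.safe_load`.

```python
import urllib.error
import urllib.request


def find_bad_prefixes(config):
    """Return (model, url) pairs whose URL does not start with http://.

    Assumes `config` maps each model name to a list of endpoint URLs.
    Only the http:// prefix named in the checklist is accepted here.
    """
    bad = []
    for model, urls in config.items():
        for url in urls or []:
            if not str(url).startswith("http://"):
                bad.append((model, url))
    return bad


def is_reachable(url, timeout=3.0):
    """Return True if the server at `url` answers with any HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied with an error status, so it is up and reachable.
        return True
    except (OSError, ValueError):
        # Connection refused, DNS failure, timeout, or a malformed URL.
        return False
```

A typical use would be to call `find_bad_prefixes(config)` first, fix any entries it reports, and then call `is_reachable(url)` on each remaining endpoint. Note that `is_reachable` only confirms that something answers HTTP at that address, not that it is a working vLLM server.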