[Feature] [PD] add simple router and refine splitwise deployment #4709
Merged
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -5,4 +5,4 @@ metadata: | |
| max_tokens: 32768 | ||
| repetition_penalty: 1.05 | ||
| frequency_penalty: 0 | ||
| presence_penalty: 0 | ||
| presence_penalty: 0 | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,71 @@ | ||
| #!/bin/bash | ||
| set -e | ||
|
|
||
| wait_for_health() { | ||
| local server_port=$1 | ||
| while true; do | ||
| status_code=$(curl -s -o /dev/null -w "%{http_code}" "http://0.0.0.0:${server_port}/health" || echo "000") | ||
| if [ "$status_code" -eq 200 ]; then | ||
| break | ||
| else | ||
| echo "Service not ready. Retrying in 2s..." | ||
| sleep 2 | ||
| fi | ||
| done | ||
| } | ||
|
|
||
| # prepare environment | ||
| MODEL_NAME="PaddlePaddle/ERNIE-4.5-0.3B-Paddle" | ||
| # MODEL_NAME="baidu/ERNIE-4.5-21B-A3B-Paddle" | ||
|
|
||
| export FD_DEBUG=1 | ||
| export ENABLE_V1_KVCACHE_SCHEDULER=0 | ||
| export KVCACHE_GDRCOPY_FLUSH_ENABLE=1 | ||
|
|
||
| unset http_proxy && unset https_proxy | ||
| rm -rf log_* | ||
|
|
||
| # start router | ||
| export FD_LOG_DIR="log_router" | ||
| mkdir -p ${FD_LOG_DIR} | ||
|
|
||
| router_port=9000 | ||
| nohup python -m fastdeploy.router.launch \ | ||
| --port ${router_port} \ | ||
| 2>&1 >${FD_LOG_DIR}/nohup & | ||
| sleep 1 | ||
|
|
||
| # start modelserver 0 | ||
| export CUDA_VISIBLE_DEVICES=0 | ||
| export FD_LOG_DIR="log_server_0" | ||
| mkdir -p ${FD_LOG_DIR} | ||
|
|
||
| nohup python -m fastdeploy.entrypoints.openai.api_server \ | ||
| --model ${MODEL_NAME} \ | ||
| --port 8100 \ | ||
| --metrics-port 8101 \ | ||
| --engine-worker-queue-port 8102 \ | ||
| --cache-queue-port 8103 \ | ||
| --max-model-len 32768 \ | ||
| --router "0.0.0.0:${router_port}" \ | ||
| 2>&1 >${FD_LOG_DIR}/nohup & | ||
| sleep 1 | ||
|
|
||
| wait_for_health 8100 | ||
|
|
||
| # start modelserver 1 | ||
| export CUDA_VISIBLE_DEVICES=1 | ||
| export FD_LOG_DIR="log_server_1" | ||
| mkdir -p ${FD_LOG_DIR} | ||
|
|
||
| nohup python -m fastdeploy.entrypoints.openai.api_server \ | ||
| --model ${MODEL_NAME} \ | ||
| --port 8200 \ | ||
| --metrics-port 8201 \ | ||
| --engine-worker-queue-port 8202 \ | ||
| --cache-queue-port 8203 \ | ||
| --max-model-len 32768 \ | ||
| --router "0.0.0.0:${router_port}" \ | ||
| 2>&1 >${FD_LOG_DIR}/nohup & | ||
|
|
||
| wait_for_health 8200 |
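The `wait_for_health` loop in the script above retries forever, so a server that never becomes healthy leaves the script hanging. The following is a minimal sketch of a bounded variant, not part of the PR: `probe_status`, `wait_for_health_bounded`, and the `max_wait` and `interval` parameters are hypothetical names, and the `/health` endpoint and 200 status are taken from the script as written.

```shell
#!/bin/bash

# Probe one port and print the HTTP status code ("000" if the request fails).
# The fallback assignment ensures exactly one code is printed on failure.
probe_status() {
    local server_port=$1
    local code
    code=$(curl -s -o /dev/null -w "%{http_code}" \
        "http://0.0.0.0:${server_port}/health" 2>/dev/null) || code="000"
    echo "${code}"
}

# Wait until /health returns 200, or give up after max_wait seconds.
# max_wait and interval are hypothetical parameters, not part of the PR.
wait_for_health_bounded() {
    local server_port=$1
    local max_wait=${2:-300}
    local interval=${3:-2}
    local waited=0
    while [ "${waited}" -lt "${max_wait}" ]; do
        if [ "$(probe_status "${server_port}")" = "200" ]; then
            return 0
        fi
        echo "Service on port ${server_port} not ready. Retrying in ${interval}s..."
        sleep "${interval}"
        waited=$((waited + interval))
    done
    echo "Service on port ${server_port} not healthy after ${max_wait}s" >&2
    return 1
}
```

A call such as `wait_for_health_bounded 8100 600 || exit 1` would then stop the script with an error instead of waiting indefinitely. The elapsed time is approximated by summing the sleep intervals, so `interval` should be a positive integer.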
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,66 @@ | ||
| #!/bin/bash | ||
| set -e | ||
|
|
||
| # Test splitwise deployment | ||
| # v0 requires prefill and decode in one node and it uses local scheduler | ||
| # v1 supports prefill and decode in multi node and it uses splitwise scheduler | ||
| # v2 supports prefill and decode in multi node and it uses router and local scheduler | ||
|
|
||
| wait_for_health() { | ||
| local server_port=$1 | ||
| while true; do | ||
| status_code=$(curl -s -o /dev/null -w "%{http_code}" "http://0.0.0.0:${server_port}/health" || echo "000") | ||
| if [ "$status_code" -eq 200 ]; then | ||
| break | ||
| else | ||
| echo "Service not ready. Retrying in 2s..." | ||
| sleep 2 | ||
| fi | ||
| done | ||
| } | ||
|
|
||
| MODEL_NAME="PaddlePaddle/ERNIE-4.5-0.3B-Paddle" | ||
| # MODEL_NAME="baidu/ERNIE-4.5-21B-A3B-Paddle" | ||
| aistudio download --model ${MODEL_NAME} | ||
|
|
||
| unset http_proxy && unset https_proxy | ||
| rm -rf log_* | ||
|
|
||
| # start prefill | ||
| export FD_LOG_DIR="log_prefill" | ||
| mkdir -p ${FD_LOG_DIR} | ||
|
|
||
| export CUDA_VISIBLE_DEVICES=0 | ||
| export FD_DEBUG=1 | ||
| export ENABLE_V1_KVCACHE_SCHEDULER=0 | ||
|
|
||
| nohup python -m fastdeploy.entrypoints.openai.api_server \ | ||
| --model ${MODEL_NAME} \ | ||
| --port 8100 \ | ||
| --metrics-port 8101 \ | ||
| --engine-worker-queue-port 8102 \ | ||
| --cache-queue-port 8103 \ | ||
| --max-model-len 32768 \ | ||
| --splitwise-role "prefill" \ | ||
| 2>&1 >${FD_LOG_DIR}/nohup & | ||
| wait_for_health 8100 | ||
|
|
||
| # start decode | ||
| export FD_LOG_DIR="log_decode" | ||
| mkdir -p ${FD_LOG_DIR} | ||
|
|
||
| export CUDA_VISIBLE_DEVICES=1 | ||
| export FD_DEBUG=1 | ||
| export ENABLE_V1_KVCACHE_SCHEDULER=0 | ||
|
|
||
| nohup python -m fastdeploy.entrypoints.openai.api_server \ | ||
| --model ${MODEL_NAME} \ | ||
| --port 9000 \ | ||
| --metrics-port 9001 \ | ||
| --engine-worker-queue-port 9002 \ | ||
| --cache-queue-port 9003 \ | ||
| --max-model-len 32768 \ | ||
| --splitwise-role "decode" \ | ||
| --innode-prefill-ports 8102 \ | ||
| 2>&1 >${FD_LOG_DIR}/nohup & | ||
| wait_for_health 9000 | ||
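Each launch command in these scripts ends with `2>&1 >${FD_LOG_DIR}/nohup`. Bash applies redirections left to right, so stderr is duplicated onto whatever stdout pointed to at that moment, and only afterwards is stdout sent to the file. `nohup` may re-point a terminal stderr on its own, but in CI or when output is piped, stderr lines may not reach the log file. If the intent is to capture both streams, the file redirection would come first. The sketch below illustrates the difference; `emit_both` is a hypothetical stand-in for the server process.

```shell
#!/bin/bash

# Hypothetical stand-in for a server process: one line on each stream.
emit_both() {
    echo "to stdout"
    echo "to stderr" >&2
}

log_dir=$(mktemp -d)

# Order used in the scripts above: stderr is duplicated onto the current
# stdout (here a pipe) first, and only then is stdout pointed at the file.
emit_both 2>&1 >"${log_dir}/as_written.log" | cat >"${log_dir}/leaked.log"

# File first, then duplicate stderr onto it: both streams reach the file.
emit_both >"${log_dir}/both.log" 2>&1

echo "as_written.log: $(wc -l <"${log_dir}/as_written.log") line(s)"
echo "leaked.log:     $(wc -l <"${log_dir}/leaked.log") line(s)"
echo "both.log:       $(wc -l <"${log_dir}/both.log") line(s)"
```

Applied to the launch commands, the second form would read `>${FD_LOG_DIR}/nohup 2>&1 &`.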
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,96 @@ | ||
| #!/bin/bash | ||
| set -e | ||
|
|
||
| # Test splitwise deployment | ||
| # v0 requires prefill and decode in one node and it uses local scheduler | ||
| # v1 supports prefill and decode in multi node and it uses splitwise scheduler | ||
| # v2 supports prefill and decode in multi node and it uses router and local scheduler | ||
|
|
||
| wait_for_health() { | ||
| local server_port=$1 | ||
| while true; do | ||
| status_code=$(curl -s -o /dev/null -w "%{http_code}" "http://0.0.0.0:${server_port}/health" || echo "000") | ||
| if [ "$status_code" -eq 200 ]; then | ||
| break | ||
| else | ||
| echo "Service not ready. Retrying in 2s..." | ||
| sleep 2 | ||
| fi | ||
| done | ||
| } | ||
|
|
||
| # prepare environment | ||
| MODEL_NAME="PaddlePaddle/ERNIE-4.5-0.3B-Paddle" | ||
| # MODEL_NAME="baidu/ERNIE-4.5-21B-A3B-Paddle" | ||
|
|
||
| export FD_DEBUG=1 | ||
| export ENABLE_V1_KVCACHE_SCHEDULER=0 | ||
| export KVCACHE_GDRCOPY_FLUSH_ENABLE=1 | ||
|
|
||
| SCRIPT_PATH=$(readlink -f "$0") | ||
| SCRIPT_DIR=$(dirname "$SCRIPT_PATH") | ||
| export $(bash ${SCRIPT_DIR}/../../scripts/get_rdma_nics.sh gpu) | ||
| echo "KVCACHE_RDMA_NICS:${KVCACHE_RDMA_NICS}" | ||
| if [ -z "${KVCACHE_RDMA_NICS}" ]; then | ||
| echo "KVCACHE_RDMA_NICS is empty, please check the output of get_rdma_nics.sh" | ||
| exit 1 | ||
| fi | ||
|
|
||
| unset http_proxy && unset https_proxy | ||
| rm -rf log_* | ||
|
|
||
| # start redis | ||
| if ! redis-cli ping &>/dev/null; then | ||
| echo "Redis is not running. Starting redis-server..." | ||
| redis-server --daemonize yes | ||
| sleep 1 | ||
| else | ||
| echo "Redis is already running." | ||
| fi | ||
| sleep 1 | ||
|
|
||
| # start prefill | ||
| export CUDA_VISIBLE_DEVICES=0 | ||
| export FD_LOG_DIR="log_prefill" | ||
| mkdir -p ${FD_LOG_DIR} | ||
|
|
||
| nohup python -m fastdeploy.entrypoints.openai.api_server \ | ||
| --model ${MODEL_NAME} \ | ||
| --port 8100 \ | ||
| --metrics-port 8101 \ | ||
| --engine-worker-queue-port 8102 \ | ||
| --cache-queue-port 8103 \ | ||
| --max-model-len 32768 \ | ||
| --splitwise-role "prefill" \ | ||
| --cache-transfer-protocol "rdma,ipc" \ | ||
| --rdma-comm-ports 8104 \ | ||
| --pd-comm-port 8105 \ | ||
| --scheduler-name "splitwise" \ | ||
| --scheduler-host "127.0.0.1" \ | ||
| --scheduler-port 6379 \ | ||
| --scheduler-ttl 9000 \ | ||
| 2>&1 >${FD_LOG_DIR}/nohup & | ||
| wait_for_health 8100 | ||
|
|
||
| # start decode | ||
| export CUDA_VISIBLE_DEVICES=1 | ||
| export FD_LOG_DIR="log_decode" | ||
| mkdir -p ${FD_LOG_DIR} | ||
|
|
||
| nohup python -m fastdeploy.entrypoints.openai.api_server \ | ||
| --model ${MODEL_NAME} \ | ||
| --port 9000 \ | ||
| --metrics-port 9001 \ | ||
| --engine-worker-queue-port 9002 \ | ||
| --cache-queue-port 9003 \ | ||
| --max-model-len 32768 \ | ||
| --splitwise-role "decode" \ | ||
| --cache-transfer-protocol "rdma,ipc" \ | ||
| --rdma-comm-ports 9004 \ | ||
| --pd-comm-port 9005 \ | ||
| --scheduler-name "splitwise" \ | ||
| --scheduler-host "127.0.0.1" \ | ||
| --scheduler-port 6379 \ | ||
| --scheduler-ttl 9000 \ | ||
| 2>&1 >${FD_LOG_DIR}/nohup & | ||
| wait_for_health 9000 |
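The prefill and decode instances above use consecutive ports starting from a base (8100 to 8105 and 9000 to 9005). As a sketch only, the flags could be derived from that base so the two launch commands cannot drift apart; `port_args` is a hypothetical helper, and the offsets simply mirror the values in the script.

```shell
#!/bin/bash

# Print the port flags for one instance, derived from a base port.
# Offsets mirror the values in the script above (8100..8105, 9000..9005);
# port_args itself is a hypothetical helper, not part of the PR.
port_args() {
    local base=$1
    echo "--port ${base}" \
        "--metrics-port $((base + 1))" \
        "--engine-worker-queue-port $((base + 2))" \
        "--cache-queue-port $((base + 3))" \
        "--rdma-comm-ports $((base + 4))" \
        "--pd-comm-port $((base + 5))"
}

port_args 8100   # prefill instance
port_args 9000   # decode instance
```

In a launch command the helper would be expanded unquoted, for example `$(port_args 8100)`, so that each flag and value becomes a separate argument.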
This switch should already be deprecated. Also, this turns on DEBUG logging.
Debug is enabled here to make debugging easier; it can be removed later. ENABLE_V1_KVCACHE_SCHEDULER is also still in effect, and v1 will need to be adapted later.
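One way to reconcile the two points in this review exchange, sketched here as an assumption rather than as what the PR does, is to make both variables overridable from the caller's environment. Debug logging then becomes opt-in instead of hard-coded. The sketch assumes that `FD_DEBUG=0` disables debug logging, which the scripts themselves do not state.

```shell
#!/bin/bash

# Honor a value from the caller's environment; otherwise default to 0
# (assumed here to mean "debug logging off").
export FD_DEBUG="${FD_DEBUG:-0}"

# Same pattern for the scheduler switch; the scripts' current value (0)
# is kept as the default until v1 is adapted.
export ENABLE_V1_KVCACHE_SCHEDULER="${ENABLE_V1_KVCACHE_SCHEDULER:-0}"

echo "FD_DEBUG=${FD_DEBUG} ENABLE_V1_KVCACHE_SCHEDULER=${ENABLE_V1_KVCACHE_SCHEDULER}"
```

With this pattern, running a script as `FD_DEBUG=1 bash <script>` would restore the current debugging behavior without editing the file.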