[docs] [P/D] add feature guide for disaggregated-prefill #3950
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request introduces a comprehensive design document for the disaggregated-prefill feature. The document is well-structured, covering the motivation, usage, implementation details, DFX analysis, and known limitations. It serves as a valuable resource for understanding the architecture and functionality of this feature. The changes are clear and I have no high or critical severity concerns to report.
Signed-off-by: wangxiaoteng <wangxiaoteng@huawei.com>
7a31b3c to 18c77c5
Signed-off-by: liziyu <liziyu16@huawei.com>
Under disaggregated prefill, a global proxy receives external requests and forwards the prefill phase to P nodes and the decode phase to D nodes; the KV cache (key–value cache) is exchanged between P and D nodes via peer-to-peer (P2P) communication.
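The flow described in this paragraph can be sketched as a toy, in-process model. This is only an illustration under stated assumptions, not the vLLM Ascend implementation: `Node`, `Proxy`, `handle`, the round-robin node selection, and the string-based "KV cache" are all hypothetical names and simplifications introduced here. The source only states that a proxy forwards prefill to P nodes and decode to D nodes and that the KV cache moves between them via P2P.

```python
from dataclasses import dataclass, field
from itertools import cycle


@dataclass
class Node:
    """Hypothetical stand-in for a prefill (P) or decode (D) instance."""
    name: str
    kv_store: dict = field(default_factory=dict)


class Proxy:
    """Toy global proxy. Round-robin selection is an assumption, not from the PR."""

    def __init__(self, p_nodes, d_nodes):
        self._p_nodes = cycle(p_nodes)
        self._d_nodes = cycle(d_nodes)

    def handle(self, request_id, prompt):
        p_node = next(self._p_nodes)
        d_node = next(self._d_nodes)

        # Prefill phase: the P node builds a (fake) KV cache entry per prompt token.
        p_node.kv_store[request_id] = [f"kv({tok})" for tok in prompt.split()]

        # P2P handoff: the KV cache moves directly from the P node to the D node
        # without passing through the proxy's own state.
        d_node.kv_store[request_id] = p_node.kv_store.pop(request_id)

        # Decode phase would now run on the D node using the transferred cache.
        return {
            "prefill": p_node.name,
            "decode": d_node.name,
            "kv_len": len(d_node.kv_store[request_id]),
        }


if __name__ == "__main__":
    proxy = Proxy([Node("P0"), Node("P1")], [Node("D0")])
    print(proxy.handle("req-1", "hello big world"))
```

In a real deployment the handoff is a network transfer between devices rather than a dictionary move; the sketch only shows which component owns which step.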
### 2. Implementation Design
Need framework picture here.
OK, we have added framework pictures for the mooncake connector and the mooncake layerwise connector.
f1b5148 to b625801
Signed-off-by: liziyu <liziyu16@huawei.com>
c58ad0c to 35bf4fb
This pull request has conflicts, please resolve those before we can evaluate the pull request.
35bf4fb to a141605
a141605 to dc95e58
Signed-off-by: wangxiaoteng888 <56506195+wangxiaoteng888@users.noreply.github.com>
…t#3950)

### What this PR does / why we need it?
add feature guide for disaggregated-prefill

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
by ci

- vLLM version: v0.11.0
- vLLM main: vllm-project/vllm@83f478b

---------

Signed-off-by: wangxiaoteng <wangxiaoteng@huawei.com>
Signed-off-by: liziyu <liziyu16@huawei.com>
Signed-off-by: wangxiaoteng888 <56506195+wangxiaoteng888@users.noreply.github.com>
Co-authored-by: liziyu <liziyu16@huawei.com>
Signed-off-by: luolun <luolun1995@cmbchina.com>
…t#3950)

### What this PR does / why we need it?
add feature guide for disaggregated-prefill

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
by ci

- vLLM version: v0.11.0
- vLLM main: vllm-project/vllm@83f478b

---------

Signed-off-by: wangxiaoteng <wangxiaoteng@huawei.com>
Signed-off-by: liziyu <liziyu16@huawei.com>
Signed-off-by: wangxiaoteng888 <56506195+wangxiaoteng888@users.noreply.github.com>
Co-authored-by: liziyu <liziyu16@huawei.com>
Signed-off-by: hwhaokun <haokun0405@163.com>
…t#3950)

### What this PR does / why we need it?
add feature guide for disaggregated-prefill

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
by ci

- vLLM version: v0.11.0
- vLLM main: vllm-project/vllm@83f478b

---------

Signed-off-by: wangxiaoteng <wangxiaoteng@huawei.com>
Signed-off-by: liziyu <liziyu16@huawei.com>
Signed-off-by: wangxiaoteng888 <56506195+wangxiaoteng888@users.noreply.github.com>
Co-authored-by: liziyu <liziyu16@huawei.com>
Signed-off-by: nsdie <yeyifan@huawei.com>
What this PR does / why we need it?
Adds a feature guide for disaggregated-prefill.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
By CI.