Skip to content

Conversation

@zzzzwwjj
Copy link
Collaborator

@zzzzwwjj zzzzwwjj commented Nov 3, 2025

What this PR does / why we need it?

Add aclgraph developer guide.

Does this PR introduce any user-facing change?

How was this patch tested?

@github-actions
Copy link

github-actions bot commented Nov 3, 2025

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:‌‌

  • A PR should do only one thing, smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests ‌to ensure it works and is not broken by other future PRs.
  • Write the commit message by fulfilling the PR description to help reviewer and future developers understand.

If CI fails, you can run linting and testing checks locally according Contributing and Testing.

@github-actions github-actions bot added the documentation Improvements or additions to documentation label Nov 3, 2025
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request adds comprehensive documentation for the ACLGraph feature. A new file, docs/source/developer_guide/feature_guide/ACLGraph.md, is introduced, covering the motivation, usage, implementation details, and limitations of ACLGraph. The feature guide index is also updated accordingly. The documentation is a valuable addition for developers working with this feature. While the content is informative, the document would benefit from a proofreading pass to address several grammatical errors and improve overall clarity.

@zzzzwwjj zzzzwwjj changed the title [0.11.0][doc] add aclgraph doc [0.11.0][doc] add aclgraph developer guide Nov 3, 2025

## How to use ACLGraph?

ACLGraph is enabled by default in V1 Engine, just set to use V1 Engine is enough.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Users need to check that enforce-eager is not set to True.


## How it works?

In short, graph mode works in two steps: **capture and replay**. When engine starts, we will capture all of the ops in model forward and save it as a graph, and when req come in, we just replay the graph on gpus, and waiting for result.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we just replay the graph on gpus, and
maybe npus? I'm not quite sure.


### Padding and Bucketing

Due to graph can only replay the ops captured before, without doing tiling and checking graph input, so we need to ensure the consistency of the graph input, but we know that model input's shape depends on the request scheduled by Scheduler, we can't ensure the consistency.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Due to graph can only replay the ops captured before, without doing tiling and checking graph input, so we need to ensure the consistency of the graph input

grammar issue here, "due to" and "so" should not appear together.


Due to graph can only replay the ops captured before, without doing tiling and checking graph input, so we need to ensure the consistency of the graph input, but we know that model input's shape depends on the request scheduled by Scheduler, we can't ensure the consistency.

Obviously, we can solve this problem by capturing the biggest shape and padding all of the model input to it. But it will bring a lot of redundant computing and make performance worse. So we can capture multiple graphs with different shape, and pad the model input to the nearest graph, it will greatly reduce redundant computing, but when `max_num_batched_tokens` is very large, the number of graphs that need to be captured will also become very large. But we know that when intensor's shape is large, the computing time will be very long, and graph mode is not necessary in this case. So all of things we need to do is:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use periods to break it into several sentences.

it will greatly reduce redundant computing

which will ....

@zzzzwwjj zzzzwwjj force-pushed the aclgraph_doc_0.11.0 branch 3 times, most recently from 0c0bde9 to 2aaa7ee Compare November 3, 2025 03:02
Signed-off-by: zzzzwwjj <1183291235@qq.com>
@zzzzwwjj zzzzwwjj force-pushed the aclgraph_doc_0.11.0 branch from 2aaa7ee to 55bf1a9 Compare November 3, 2025 13:42
@wangxiyuan wangxiyuan merged commit 3db53d1 into vllm-project:v0.11.0-dev Nov 6, 2025
12 checks passed
@zzzzwwjj zzzzwwjj deleted the aclgraph_doc_0.11.0 branch November 26, 2025 07:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants