Skip to content

Conversation

@leo-pony
Copy link
Collaborator

@leo-pony leo-pony commented Nov 2, 2025

What this PR does / why we need it?

Protect the scene where the first problem occurs. The execution should be interrupted when the video memory application fails, rather than waiting until an illegal address is accessed.
Test result:

  1. Exception throws normally:
d0ce7a04-4102-46cc-a981-f44421f388e6

Corresponding code:
2c58c5fa-1882-435e-8d53-7c536846945b
dc9b4dc0-76ae-4a6b-90b3-4fca9ad63ec0

  1. HBM Memory malloc test case successfully:
9766fb23-1830-43c5-b29f-fb6973a78000
  1. extern launcher test with sleep and weakup:
d1f31a03-fe07-455a-bf64-7420937d62b3

Does this PR introduce any user-facing change?

NA

How was this patch tested?

NA

@github-actions
Copy link

github-actions bot commented Nov 2, 2025

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:‌‌

  • A PR should do only one thing, smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests ‌to ensure it works and is not broken by other future PRs.
  • Write the commit message by fulfilling the PR description to help reviewer and future developers understand.

If CI fails, you can run linting and testing checks locally according Contributing and Testing.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request improves error handling in the NPU memory allocator by throwing exceptions on allocation failures instead of just logging to stderr. This is a good change for robustness. However, I've found a few critical issues. There are a couple of string concatenations for error messages that will cause compilation errors. Additionally, and most critically, the newly thrown exceptions are not caught at the C++/Python API boundary, which will cause the Python interpreter to crash. These issues must be addressed.

…Malloc fails

Signed-off-by: leo-pony <nengjunma@outlook.com>
@leo-pony leo-pony added ready-for-test start test by label for PR ready read for review labels Nov 2, 2025
@leo-pony
Copy link
Collaborator Author

leo-pony commented Nov 3, 2025

After discussing with @weijinqian0 it was determined that this PR only strengthens the quality under conditions of memory OOM (Out of Memory) and does not affect the normal operation of the path. It is recommended to merge it into version v0.11.0-dev.
@weijinqian0 Could you help review this PR?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready read for review ready-for-test start test by label for PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant