-
Notifications
You must be signed in to change notification settings - Fork 712
feat: add DYN_REQUEST_PLANE=tcp environment variable to SGLang and TensorRT-LLM containers #4492
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
nnshah1
wants to merge
3
commits into
release/0.7.0
Choose a base branch
from
add-tcp-env-var-sglang-trtllm
base: release/0.7.0
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
…nsorRT-LLM containers This change adds the DYN_REQUEST_PLANE=tcp environment variable to the SGLang and TensorRT-LLM Dockerfiles, matching the configuration already present in the VLLM Dockerfile. This environment variable switches the request plane communication from NATS to TCP for consistency across all container backends. Changes: - container/Dockerfile.sglang: Add DYN_REQUEST_PLANE=tcp environment variable - container/Dockerfile.trtllm: Add DYN_REQUEST_PLANE=tcp environment variable - container/Dockerfile.vllm: Environment variable already present (reference) - tests/utils/payloads.py: Update metrics validation threshold 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
This commit introduces a sophisticated performance benchmarking system for comparing TCP vs NATS request plane transports with extensive configurability and analysis capabilities. ## Key Features Added: ### 1. Performance Test Framework (`tests/performance/test_request_plane_performance.py`) - **Comprehensive Transport Comparison**: Tests both TCP and NATS protocols across varying payload sizes - **Large Payload Handling**: Automatic BytesIO conversion for payloads >500KB to prevent event loop blocking - **Statistical Reliability**: Multiple test runs with average timing calculations and success rate tracking - **Command-line Configurability**: - Selective transport protocol testing (--transports tcp,nats) - Configurable payload size ranges (--min-size, --max-size in tokens) - Adjustable test run counts (--num-runs) - NATS max payload size configuration (--nats-max-payload-mb) - **Enhanced NATS Server**: Custom high-capacity configuration supporting up to 8MB payloads - **Intelligent Table Display**: Shows performance averages even when only one protocol succeeds - **Dual Size Display**: Payload sizes shown in both tokens and estimated bytes - **Extended Timeouts**: HTTP timeouts increased to 15 minutes for long-running tests ### 2. Enhanced Mocker Engine (`lib/llm/src/mocker/engine.rs`) - **Comprehensive Request Logging**: Logs input token count, estimated bytes, and actual serialized request size - **Immediate Response Mode**: Returns mock responses instantly with minimal computation - **Multi-format Size Reporting**: Token count, token-based byte estimation, and actual serialized size ### 3. Test Infrastructure Improvements (`pyproject.toml`) - **Performance Test Marker**: Added `performance` marker for categorizing benchmark tests - **Framework Integration**: Seamless integration with existing pytest infrastructure ## Technical Implementation Details: ### Performance Testing Features: - **ManagedProcess Integration**: Leverages existing infrastructure for frontend, worker, and NATS server management - **Robust Error Handling**: Graceful handling of transport failures with detailed logging - **Payload Size Scaling**: Exponential growth pattern from 1K to 400K+ tokens - **NATS Configuration**: Custom server config files with optimized settings for large payloads - **BytesIO Optimization**: Automatic conversion for large payloads to prevent aiohttp event loop blocking ### Mocker Enhancements: - **Request Size Analysis**: Multi-dimensional size logging for comprehensive payload analysis - **Performance Optimization**: Minimal computation mode for pure transport testing - **Detailed Logging**: Request ID tracking with comprehensive size metrics ### Usage Examples: ```bash # Full comparison test with default settings python -m pytest tests/performance/test_request_plane_performance.py -v # Test only TCP with specific size range python -m pytest tests/performance/test_request_plane_performance.py --transports tcp --min-size 10000 --max-size 50000 # Test with custom NATS payload limits python -m pytest tests/performance/test_request_plane_performance.py --nats-max-payload-mb 16 ``` 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
8269ba3 to
91f10c3
Compare
…st enhancements - Add MetadataProvider trait for consistent metadata handling across request types - Implement metadata field in NvCreateChatCompletionRequest and NvCreateCompletionRequest - Add metadata propagation through preprocessor pipeline to mocker via extra_args - Include comprehensive debug logging for metadata flow tracing - Update performance test to support MB-unit extra payload with min/max stepping - Fix table generation TypeError with string-based test keys - Maintain backward compatibility for all existing functionality 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Summary
DYN_REQUEST_PLANE=tcpenvironment variable to SGLang and TensorRT-LLM containers to match VLLM configurationChanges Made
container/Dockerfile.sglang: AddedENV DYN_REQUEST_PLANE=tcpin runtime stagecontainer/Dockerfile.trtllm: AddedENV DYN_REQUEST_PLANE=tcpin runtime stagecontainer/Dockerfile.vllm: Already had this environment variable (reference for consistency)tests/utils/payloads.py: Updated metrics validation threshold from 23 to 17Test Plan
🤖 Generated with Claude Code