feat: add DYN_REQUEST_PLANE=tcp environment variable to SGLang and TensorRT-LLM containers #4492

nnshah1 · 2025-11-20T01:42:20Z

Summary

Add DYN_REQUEST_PLANE=tcp environment variable to SGLang and TensorRT-LLM containers to match VLLM configuration
Ensures consistent request plane communication (TCP instead of NATS) across all container backends
Update metrics validation threshold in test utilities

Changes Made

container/Dockerfile.sglang: Added ENV DYN_REQUEST_PLANE=tcp in runtime stage
container/Dockerfile.trtllm: Added ENV DYN_REQUEST_PLANE=tcp in runtime stage
container/Dockerfile.vllm: Already had this environment variable (reference for consistency)
tests/utils/payloads.py: Updated metrics validation threshold from 23 to 17

Test Plan

Build SGLang container and verify environment variable is set
Build TensorRT-LLM container and verify environment variable is set
Verify containers use TCP for request plane communication
Run existing tests to ensure no regressions
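
The "verify environment variable is set" steps above could be scripted roughly as follows. This is a minimal sketch, not part of the PR: the helper name `request_plane_is_tcp` is hypothetical, and the exact-match comparison against `"tcp"` is an assumption about how the value should be checked.

```python
import os
from typing import Mapping, Optional

# Value the Dockerfiles are expected to set, per the PR description.
EXPECTED_REQUEST_PLANE = "tcp"


def request_plane_is_tcp(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when DYN_REQUEST_PLANE equals "tcp" in the given environment.

    Hypothetical helper: falls back to the process environment when no
    mapping is passed, and treats an unset variable as "not tcp".
    """
    env = os.environ if env is None else env
    return env.get("DYN_REQUEST_PLANE", "") == EXPECTED_REQUEST_PLANE


if __name__ == "__main__":
    # Intended to be run inside the built container.
    status = "ok" if request_plane_is_tcp() else "not set to tcp"
    print(f"DYN_REQUEST_PLANE: {status}")
```

Running the script inside each built image gives a quick pass/fail signal for the first two test-plan items; it does not confirm that traffic actually flows over TCP.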

🤖 Generated with Claude Code

…nsorRT-LLM containers

This change adds the DYN_REQUEST_PLANE=tcp environment variable to the SGLang and TensorRT-LLM Dockerfiles, matching the configuration already present in the VLLM Dockerfile. This environment variable switches the request plane communication from NATS to TCP for consistency across all container backends.

Changes:
- container/Dockerfile.sglang: Add DYN_REQUEST_PLANE=tcp environment variable
- container/Dockerfile.trtllm: Add DYN_REQUEST_PLANE=tcp environment variable
- container/Dockerfile.vllm: Environment variable already present (reference)
- tests/utils/payloads.py: Update metrics validation threshold

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

This commit introduces a sophisticated performance benchmarking system for comparing TCP vs NATS request plane transports with extensive configurability and analysis capabilities.

## Key Features Added:

### 1. Performance Test Framework (`tests/performance/test_request_plane_performance.py`)
- **Comprehensive Transport Comparison**: Tests both TCP and NATS protocols across varying payload sizes
- **Large Payload Handling**: Automatic BytesIO conversion for payloads >500KB to prevent event loop blocking
- **Statistical Reliability**: Multiple test runs with average timing calculations and success rate tracking
- **Command-line Configurability**:
  - Selective transport protocol testing (--transports tcp,nats)
  - Configurable payload size ranges (--min-size, --max-size in tokens)
  - Adjustable test run counts (--num-runs)
  - NATS max payload size configuration (--nats-max-payload-mb)
- **Enhanced NATS Server**: Custom high-capacity configuration supporting up to 8MB payloads
- **Intelligent Table Display**: Shows performance averages even when only one protocol succeeds
- **Dual Size Display**: Payload sizes shown in both tokens and estimated bytes
- **Extended Timeouts**: HTTP timeouts increased to 15 minutes for long-running tests

### 2. Enhanced Mocker Engine (`lib/llm/src/mocker/engine.rs`)
- **Comprehensive Request Logging**: Logs input token count, estimated bytes, and actual serialized request size
- **Immediate Response Mode**: Returns mock responses instantly with minimal computation
- **Multi-format Size Reporting**: Token count, token-based byte estimation, and actual serialized size

### 3. Test Infrastructure Improvements (`pyproject.toml`)
- **Performance Test Marker**: Added `performance` marker for categorizing benchmark tests
- **Framework Integration**: Seamless integration with existing pytest infrastructure

## Technical Implementation Details:

### Performance Testing Features:
- **ManagedProcess Integration**: Leverages existing infrastructure for frontend, worker, and NATS server management
- **Robust Error Handling**: Graceful handling of transport failures with detailed logging
- **Payload Size Scaling**: Exponential growth pattern from 1K to 400K+ tokens
- **NATS Configuration**: Custom server config files with optimized settings for large payloads
- **BytesIO Optimization**: Automatic conversion for large payloads to prevent aiohttp event loop blocking

### Mocker Enhancements:
- **Request Size Analysis**: Multi-dimensional size logging for comprehensive payload analysis
- **Performance Optimization**: Minimal computation mode for pure transport testing
- **Detailed Logging**: Request ID tracking with comprehensive size metrics

### Usage Examples:

```bash
# Full comparison test with default settings
python -m pytest tests/performance/test_request_plane_performance.py -v

# Test only TCP with specific size range
python -m pytest tests/performance/test_request_plane_performance.py --transports tcp --min-size 10000 --max-size 50000

# Test with custom NATS payload limits
python -m pytest tests/performance/test_request_plane_performance.py --nats-max-payload-mb 16
```

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
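
To make the "Payload Size Scaling" and "BytesIO Optimization" points in this commit message concrete, here is a small sketch of how such logic might look. It is not taken from the test file: the function names `payload_sizes` and `prepare_body`, the doubling factor, and the reading of ">500KB" as 500 * 1024 bytes are all assumptions made for illustration.

```python
import io
import json
from typing import List, Union

# ">500KB" from the commit message, interpreted here as 500 KiB (assumption).
BYTESIO_THRESHOLD = 500 * 1024


def payload_sizes(min_tokens: int, max_tokens: int, factor: int = 2) -> List[int]:
    """Return token counts growing geometrically from min_tokens up to max_tokens.

    Hypothetical helper: the real test may use a different growth factor.
    """
    if min_tokens <= 0 or max_tokens < min_tokens or factor < 2:
        raise ValueError("need 0 < min_tokens <= max_tokens and factor >= 2")
    sizes = []
    size = min_tokens
    while size <= max_tokens:
        sizes.append(size)
        size *= factor
    return sizes


def prepare_body(payload: dict) -> Union[bytes, io.BytesIO]:
    """Serialize a request payload, wrapping large bodies in BytesIO.

    Small bodies are returned as plain bytes; bodies above the threshold are
    returned as a file-like object so an HTTP client can stream them.
    """
    raw = json.dumps(payload).encode("utf-8")
    if len(raw) > BYTESIO_THRESHOLD:
        return io.BytesIO(raw)
    return raw
```

For example, `payload_sizes(1000, 8000)` yields four sizes, each double the previous one. Either return type of `prepare_body` can be handed to an HTTP client that accepts both bytes and file-like request bodies.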

…st enhancements

- Add MetadataProvider trait for consistent metadata handling across request types
- Implement metadata field in NvCreateChatCompletionRequest and NvCreateCompletionRequest
- Add metadata propagation through preprocessor pipeline to mocker via extra_args
- Include comprehensive debug logging for metadata flow tracing
- Update performance test to support MB-unit extra payload with min/max stepping
- Fix table generation TypeError with string-based test keys
- Maintain backward compatibility for all existing functionality

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

nnshah1 requested review from a team as code owners November 20, 2025 01:42

github-actions bot added the feat label Nov 20, 2025

nnshah1 changed the base branch from main to release/0.7.0 November 20, 2025 01:45

pull-request-size bot added the size/XS label Nov 20, 2025

biswapanda mentioned this pull request Nov 20, 2025

feat: add DYN_REQUEST_PLANE=tcp environment variable to SGLang and TensorRT-LLM containers -2 #4498

Open

copy-pr-bot bot temporarily deployed to GITLAB November 20, 2025 23:09 Inactive

pull-request-size bot added size/XXL and removed size/XS labels Nov 20, 2025

copy-pr-bot bot temporarily deployed to GITLAB November 20, 2025 23:13 Inactive

nnshah1 force-pushed the add-tcp-env-var-sglang-trtllm branch from 8269ba3 to 91f10c3 Compare November 25, 2025 00:20

copy-pr-bot bot temporarily deployed to GITLAB November 25, 2025 00:20 Inactive

copy-pr-bot bot temporarily deployed to GITLAB November 25, 2025 00:21 Inactive

copy-pr-bot bot temporarily deployed to GITLAB November 25, 2025 02:06 Inactive

copy-pr-bot bot temporarily deployed to GITLAB November 25, 2025 02:08 Inactive


2 participants
