nnshah1 (Contributor) commented Nov 20, 2025

Summary

  • Add the DYN_REQUEST_PLANE=tcp environment variable to the SGLang and TensorRT-LLM containers to match the vLLM configuration
  • Ensure consistent request plane communication (TCP instead of NATS) across all container backends
  • Lower the metrics validation threshold in test utilities from 23 to 17

Changes Made

  • container/Dockerfile.sglang: Added ENV DYN_REQUEST_PLANE=tcp in runtime stage
  • container/Dockerfile.trtllm: Added ENV DYN_REQUEST_PLANE=tcp in runtime stage
  • container/Dockerfile.vllm: Already had this environment variable (reference for consistency)
  • tests/utils/payloads.py: Updated metrics validation threshold from 23 to 17

Test Plan

  • Build SGLang container and verify environment variable is set
  • Build TensorRT-LLM container and verify environment variable is set
  • Verify containers use TCP for request plane communication
  • Run existing tests to ensure no regressions
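
The first two test plan items can be partly checked before any image is built by scanning the Dockerfile text for the variable. The following is a minimal sketch, not part of the PR: `dockerfile_env` and the sample text are hypothetical, and the parser only handles single-line `ENV KEY=value` instructions (no line continuations, quoted values, or the legacy `ENV KEY value` form).

```python
import re


def dockerfile_env(dockerfile_text: str) -> dict[str, str]:
    """Collect KEY=value pairs from single-line ENV instructions.

    Later instructions override earlier ones, so for a multi-stage
    Dockerfile this reports the last value seen, not a per-stage view.
    """
    env: dict[str, str] = {}
    for line in dockerfile_text.splitlines():
        stripped = line.strip()
        if not stripped.upper().startswith("ENV "):
            continue
        for key, value in re.findall(r"(\w+)=(\S+)", stripped[4:]):
            env[key] = value
    return env


# Hypothetical excerpt for illustration; not the real Dockerfile contents.
SAMPLE = """\
FROM base AS runtime
ENV DYN_REQUEST_PLANE=tcp
"""

assert dockerfile_env(SAMPLE).get("DYN_REQUEST_PLANE") == "tcp"
```

A static check like this does not replace inspecting the built container, since a later stage or the runtime environment could still override the value.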

🤖 Generated with Claude Code

…nsorRT-LLM containers

This change adds the DYN_REQUEST_PLANE=tcp environment variable to the SGLang and TensorRT-LLM
Dockerfiles, matching the configuration already present in the vLLM Dockerfile. This environment
variable switches request plane communication from NATS to TCP so that all container backends
use the same transport.

Changes:
- container/Dockerfile.sglang: Add DYN_REQUEST_PLANE=tcp environment variable
- container/Dockerfile.trtllm: Add DYN_REQUEST_PLANE=tcp environment variable
- container/Dockerfile.vllm: Environment variable already present (reference)
- tests/utils/payloads.py: Update metrics validation threshold (23 to 17)

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

This commit adds a performance benchmark that compares the TCP and NATS request plane transports, with configurable transports, payload sizes, and run counts.

## Key Features Added:

### 1. Performance Test Framework (`tests/performance/test_request_plane_performance.py`)
- **Comprehensive Transport Comparison**: Tests both TCP and NATS protocols across varying payload sizes
- **Large Payload Handling**: Automatic BytesIO conversion for payloads >500KB to prevent event loop blocking
- **Statistical Reliability**: Multiple test runs with average timing calculations and success rate tracking
- **Command-line Configurability**:
  - Selective transport protocol testing (--transports tcp,nats)
  - Configurable payload size ranges (--min-size, --max-size in tokens)
  - Adjustable test run counts (--num-runs)
  - NATS max payload size configuration (--nats-max-payload-mb)
- **NATS Server Configuration**: Custom server configuration supporting payloads up to 8MB
- **Partial-Result Table Display**: Shows performance averages even when only one protocol succeeds
- **Dual Size Display**: Payload sizes shown in both tokens and estimated bytes
- **Extended Timeouts**: HTTP timeouts increased to 15 minutes for long-running tests
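
The averaging, success rate tracking, and partial-result table described above could be sketched as follows. This is an illustrative sketch only: `summarize`, `format_row`, and the convention of recording a failed run as `None` are assumptions, not names or behavior taken from the test file.

```python
from statistics import mean
from typing import Optional


def summarize(durations: list[Optional[float]]) -> dict:
    """Summarize one transport's runs; None marks a failed run."""
    ok = [d for d in durations if d is not None]
    return {
        "avg_s": mean(ok) if ok else None,
        "success_rate": len(ok) / len(durations) if durations else 0.0,
    }


def format_row(size_tokens: int, results: dict[str, list[Optional[float]]]) -> str:
    """Render one table row, printing N/A for a transport with no successful run."""
    cells = [f"{size_tokens:>8}"]
    for transport in ("tcp", "nats"):
        avg = summarize(results.get(transport, []))["avg_s"]
        cells.append(f"{avg:>8.3f}" if avg is not None else f"{'N/A':>8}")
    return " | ".join(cells)


# TCP succeeded twice, NATS has no results: the TCP average is still shown.
print(format_row(1000, {"tcp": [0.5, 0.5]}))
```

Averaging only the successful runs keeps one failed transport from hiding the other's numbers, at the cost of needing the success rate alongside the average to interpret it.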

### 2. Enhanced Mocker Engine (`lib/llm/src/mocker/engine.rs`)
- **Comprehensive Request Logging**: Logs input token count, estimated bytes, and actual serialized request size
- **Immediate Response Mode**: Returns mock responses instantly with minimal computation
- **Multi-format Size Reporting**: Token count, token-based byte estimation, and actual serialized size
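
The three size figures differ in what they measure, which a short sketch can make concrete. The mocker itself is written in Rust; the Python below is only an illustration, and the `token_ids` field name and the 4-bytes-per-token estimate are assumptions rather than values from the source.

```python
import json

BYTES_PER_TOKEN = 4  # assumed estimate for illustration; the real factor is not stated


def size_report(request: dict) -> dict[str, int]:
    """Report token count, token-based byte estimate, and serialized size."""
    token_ids = request.get("token_ids", [])  # hypothetical field name
    return {
        "input_tokens": len(token_ids),
        "estimated_bytes": len(token_ids) * BYTES_PER_TOKEN,
        "serialized_bytes": len(json.dumps(request).encode("utf-8")),
    }


report = size_report({"token_ids": [1, 2, 3]})
assert report["input_tokens"] == 3
assert report["estimated_bytes"] == 12
```

The serialized size is the one that matters for transport limits, because it includes field names, separators, and any extra payload that the token-based estimate ignores.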

### 3. Test Infrastructure Improvements (`pyproject.toml`)
- **Performance Test Marker**: Added `performance` marker for categorizing benchmark tests
- **Framework Integration**: Benchmarks run under the existing pytest infrastructure

## Technical Implementation Details:

### Performance Testing Features:
- **ManagedProcess Integration**: Leverages existing infrastructure for frontend, worker, and NATS server management
- **Robust Error Handling**: Graceful handling of transport failures with detailed logging
- **Payload Size Scaling**: Exponential growth pattern from 1K to 400K+ tokens
- **NATS Configuration**: Custom server config files with optimized settings for large payloads
- **BytesIO Optimization**: Automatic conversion for large payloads to prevent aiohttp event loop blocking
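
Payload size scaling and the BytesIO conversion can be sketched together. This is a minimal sketch under stated assumptions: the doubling factor, the interpretation of 500KB as 500 * 1024 bytes, and the helper names `payload_sizes` and `prepare_body` are all hypothetical; only the exponential pattern and the >500KB threshold come from the description.

```python
import io
import json
from typing import Iterator, Union

LARGE_BODY_BYTES = 500 * 1024  # ">500KB" from the description; exact unit is assumed


def payload_sizes(min_tokens: int, max_tokens: int, factor: int = 2) -> Iterator[int]:
    """Yield token counts growing geometrically from min_tokens up to max_tokens."""
    if min_tokens <= 0 or factor < 2:
        raise ValueError("min_tokens must be positive and factor at least 2")
    size = min_tokens
    while size <= max_tokens:
        yield size
        size *= factor


def prepare_body(payload: dict) -> Union[bytes, io.BytesIO]:
    """Serialize the payload, wrapping large bodies in a file-like object."""
    raw = json.dumps(payload).encode("utf-8")
    return io.BytesIO(raw) if len(raw) > LARGE_BODY_BYTES else raw


assert list(payload_sizes(1000, 8000)) == [1000, 2000, 4000, 8000]
assert isinstance(prepare_body({"prompt": "hi"}), bytes)
```

aiohttp accepts file-like objects as request bodies, so the value returned by `prepare_body` could be passed as the `data` argument in either case.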

### Mocker Enhancements:
- **Request Size Analysis**: Logs token count, estimated bytes, and actual serialized size for each request
- **Performance Optimization**: Minimal computation mode for pure transport testing
- **Detailed Logging**: Logs the request ID alongside the size metrics

### Usage Examples:
```bash
# Full comparison test with default settings
python -m pytest tests/performance/test_request_plane_performance.py -v

# Test only TCP with specific size range
python -m pytest tests/performance/test_request_plane_performance.py --transports tcp --min-size 10000 --max-size 50000

# Test with custom NATS payload limits
python -m pytest tests/performance/test_request_plane_performance.py --nats-max-payload-mb 16
```
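
For the last command, the `--nats-max-payload-mb` value has to end up in a NATS server configuration file. The sketch below shows one way that could look; `render_nats_config` is a hypothetical helper, not the function used in the test, and it relies only on the `port` and `max_payload` settings of the nats-server configuration format.

```python
def render_nats_config(max_payload_mb: int, port: int = 4222) -> str:
    """Render a minimal nats-server config with max_payload given in bytes."""
    if max_payload_mb <= 0:
        raise ValueError("max_payload_mb must be positive")
    max_payload_bytes = max_payload_mb * 1024 * 1024
    return f"port: {port}\nmax_payload: {max_payload_bytes}\n"


config_text = render_nats_config(16)
assert "max_payload: 16777216" in config_text
```

The rendered text would be written to a file and passed to the server with `nats-server -c <path>`.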

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

…st enhancements

- Add MetadataProvider trait for consistent metadata handling across request types
- Implement metadata field in NvCreateChatCompletionRequest and NvCreateCompletionRequest
- Add metadata propagation through preprocessor pipeline to mocker via extra_args
- Include comprehensive debug logging for metadata flow tracing
- Update performance test to support MB-unit extra payload with min/max stepping
- Fix table generation TypeError with string-based test keys
- Maintain backward compatibility for all existing functionality

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>