# torchscript-performance-bench

A comprehensive benchmarking suite for comparing PyTorch execution modes: eager execution, TorchScript JIT, and torch.compile.
## Features

- Multiple Execution Modes: Compare eager, TorchScript, and torch.compile performance (a quick standalone illustration follows this list)
- CPU vs GPU: Benchmark on both CPU and GPU devices
- Three Model Architectures: CNN, MLP, and Transformer models included
- Profiling & Tracing: Built-in profiling utilities for performance analysis
- Graph Analysis: Compare computation graphs between execution modes
- Visualizations: Automatic generation of performance charts and time series plots
- CLI Interface: Easy-to-use command-line tool for running benchmarks
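As a quick, standalone illustration of the three execution modes (plain PyTorch, independent of this suite's API; the toy model and iteration count below are arbitrary):

```python
import time
import torch
import torch.nn as nn

# Toy model standing in for the benchmarked architectures.
model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 10)).eval()
x = torch.randn(32, 64)

variants = {
    "eager": model,
    "torchscript": torch.jit.script(model),  # ahead-of-time graph capture
    "torch.compile": torch.compile(model),   # PyTorch 2.x compiler (compiles on first call)
}

with torch.no_grad():
    for name, m in variants.items():
        m(x)  # warmup: triggers scripting/compilation work up front
        start = time.perf_counter()
        for _ in range(100):
            m(x)
        print(f"{name}: {(time.perf_counter() - start) / 100 * 1000:.3f} ms/iter")
```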
## Installation

```bash
# Clone the repository
git clone git@github.com:iotaaxel/torchscript-performance-bench.git
cd torchscript-performance-bench

# Install dependencies
pip install -r requirements.txt
```

## Usage

```bash
# Benchmark all models
python scripts/run_benchmarks.py --all
# Benchmark a specific model
python scripts/run_benchmarks.py --model cnn
# Compare CPU vs GPU
python scripts/run_benchmarks.py --model transformer --compare-cpu-gpu
# Custom benchmark settings
python scripts/run_benchmarks.py --model mlp --num-runs 200 --warmup-runs 20
```

## Command-Line Options

```
python scripts/run_benchmarks.py [OPTIONS]

Options:
  --model {cnn,mlp,transformer}   Model type to benchmark
  --all                           Run benchmarks for all models
  --device {cpu,cuda}             Device to run on (default: auto-detect)
  --num-runs INT                  Number of benchmark iterations (default: 100)
  --warmup-runs INT               Number of warmup iterations (default: 10)
  --compare-cpu-gpu               Compare CPU and GPU performance
  --no-plots                      Skip generating plots
  --no-save                       Skip saving results to JSON
```

## Python API

```python
import torch

from models import create_cnn_model
from bench import BenchmarkRunner, ExecutionMode
# Create model
model = create_cnn_model()
# Create input function
def input_fn():
    return torch.randn(1, 3, 32, 32)
# Run benchmark
runner = BenchmarkRunner(warmup_runs=10, num_runs=100, device='cuda')
results = runner.benchmark(
    model,
    input_fn,
    modes=[ExecutionMode.EAGER, ExecutionMode.TORCHSCRIPT, ExecutionMode.COMPILE],
)
# Access results
for mode, result in results.items():
    print(f"{mode}: {result.mean_time_ms:.3f} ± {result.std_time_ms:.3f} ms")
```

## Project Structure

```
torchscript-performance-bench/
├── models/                  # Model definitions
│   ├── cnn.py               # Small CNN model
│   ├── mlp.py               # Multi-layer perceptron
│   └── transformer.py       # Tiny transformer block
├── bench/                   # Benchmarking infrastructure
│   ├── benchmark.py         # Core benchmark runner
│   └── profiler.py          # Profiling utilities
├── scripts/                 # Scripts and utilities
│   ├── run_benchmarks.py    # CLI runner
│   └── visualize.py         # Visualization tools
├── tests/                   # Unit tests
├── reports/                 # Generated reports and plots
└── requirements.txt         # Python dependencies
```

## Models

### CNN

A lightweight convolutional neural network (sketched below) with:
- 3 convolutional layers with batch normalization
- Adaptive average pooling
- Fully connected layers with dropout
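The real definition lives in `models/cnn.py`; purely as an illustration (channel widths, dropout rate, and class count below are assumptions, not the repository's actual values), a model matching this description could look like:

```python
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    """Illustrative stand-in: 3 conv+BN blocks, adaptive pooling, FC head with dropout."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.Conv2d(64, 128, 3, padding=1), nn.BatchNorm2d(128), nn.ReLU(),
        )
        self.pool = nn.AdaptiveAvgPool2d(1)          # adaptive average pooling
        self.classifier = nn.Sequential(
            nn.Flatten(), nn.Dropout(0.5), nn.Linear(128, num_classes)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.pool(self.features(x)))
```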
### MLP

A multi-layer perceptron (sketched below) with:
- Configurable hidden layer sizes
- Batch normalization and dropout
- Automatic input flattening
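The actual code is in `models/mlp.py`; as a rough sketch only (the input size, hidden sizes, and dropout rate here are assumptions), the described structure could be built like this:

```python
import torch
import torch.nn as nn

class MLP(nn.Module):
    """Illustrative stand-in: configurable hidden sizes, BN + dropout, flattens its input."""

    def __init__(self, in_features=784, hidden_sizes=(256, 128), num_classes=10, dropout=0.2):
        super().__init__()
        layers = [nn.Flatten()]                      # automatic input flattening
        prev = in_features
        for h in hidden_sizes:                       # configurable hidden layer sizes
            layers += [nn.Linear(prev, h), nn.BatchNorm1d(h), nn.ReLU(), nn.Dropout(dropout)]
            prev = h
        layers.append(nn.Linear(prev, num_classes))
        self.net = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)
```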
### Transformer

A minimal transformer block (sketched below) with:
- Multi-head self-attention
- Position-wise feed-forward network
- Layer normalization and residual connections
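See `models/transformer.py` for the real implementation; as an illustration only (the model width, head count, and post-norm layout are assumptions), such a block could be written as:

```python
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    """Illustrative stand-in: self-attention + feed-forward, each with LayerNorm and a residual."""

    def __init__(self, d_model=128, n_heads=4, d_ff=512, dropout=0.1):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Multi-head self-attention with residual connection and layer norm
        attn_out, _ = self.attn(x, x, x, need_weights=False)
        x = self.norm1(x + self.dropout(attn_out))
        # Position-wise feed-forward with residual connection and layer norm
        x = self.norm2(x + self.dropout(self.ff(x)))
        return x
```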
## Results

Results are automatically saved to:
- JSON: `reports/{model_name}_results.json` - Detailed numerical results (a loading sketch follows this list)
- Plots:
  - `reports/{model_name}_{device}_comparison.png` - Bar chart comparison
  - `reports/{model_name}_{device}_timeseries.png` - Time series plot
  - `reports/{model_name}_{device}_speedup.png` - Speedup relative to eager
  - `reports/{model_name}_cpu_vs_gpu.png` - CPU vs GPU comparison
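If you want to post-process a saved report programmatically, a minimal sketch is shown below; the path assumes the CNN benchmark was run, and since the exact field layout depends on the runner, it only walks the top-level entries:

```python
import json
from pathlib import Path

# Hypothetical example path; substitute the model you benchmarked.
report_path = Path("reports/cnn_results.json")

with report_path.open() as f:
    data = json.load(f)

# Print whatever top-level entries the runner saved (the exact schema may differ).
for key, value in data.items():
    print(key, value)
```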
## Profiling

Use the profiler to analyze model execution:

```python
from bench.profiler import Profiler
profiler = Profiler(device='cuda')
profile_data = profiler.profile_model(model, input_tensor, execution_mode='torchscript')
# Get graph representation
graph_str = profiler.get_graph_representation(model, input_tensor)
# Compare graphs
comparison = profiler.compare_graphs(graph1, graph2)
```
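For reference, profiling can also be done directly with PyTorch's built-in `torch.profiler`, independently of the suite's `Profiler` wrapper; a minimal CPU-only sketch with a made-up toy model:

```python
import torch
import torch.nn as nn
from torch.profiler import profile, ProfilerActivity

# Toy model and input, just for demonstration.
model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU(), nn.Flatten(), nn.LazyLinear(10))
x = torch.randn(1, 3, 32, 32)

# Profile CPU activity; add ProfilerActivity.CUDA when running on a GPU.
with profile(activities=[ProfilerActivity.CPU], record_shapes=True) as prof:
    model(x)

print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))
```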
## Testing

Run the test suite:

```bash
pytest tests/
```

## Requirements

- Python 3.8+
- PyTorch 2.0+
- matplotlib 3.7+
- numpy 1.24+
- tqdm 4.65+
## License

MIT License

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.