A benchmarking suite for LLM inference systems. Intelligence Per Watt sends workloads to your inference service and collects detailed telemetry—energy consumption, power usage, memory, temperature, and latency—to help you optimize performance and compare hardware configurations.
Prerequisites:

- Rust compiler (for building the energy monitor)
- Protocol Buffer compiler (`protoc`)
- Ollama or vLLM (inference client)
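If you want to sanity-check the toolchain before building, the standard version flags should all resolve (the `ollama` check assumes Ollama is your inference client; check your vLLM install instead if that is what you use):

```bash
rustc --version     # Rust compiler
protoc --version    # Protocol Buffer compiler
ollama --version    # inference client (skip if you use vLLM)
```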
```bash
git clone https://github.com/HazyResearch/intelligence-per-watt.git

# Create and activate virtual environment
uv venv
source .venv/bin/activate

# Build energy monitoring
uv run scripts/build_energy_monitor.py

# Install Intelligence Per Watt
uv pip install -e intelligence-per-watt
```

Optional inference clients ship as extras; install each one you need from the package directory, e.g. `uv pip install -e 'intelligence-per-watt[ollama]'` or `uv pip install -e 'intelligence-per-watt[vllm]'`.
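Both extras can also be installed in one step, since pip's bracket syntax accepts a comma-separated list:

```bash
uv pip install -e 'intelligence-per-watt[ollama,vllm]'
```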
```bash
# 1. List available inference clients
ipw list clients

# 2. Run a benchmark
ipw profile \
  --client ollama \
  --model llama3.2:1b \
  --client-base-url http://localhost:11434

# 3. Analyze the results
ipw analyze ./runs/profile_*

# 4. Generate plots
ipw plot ./runs/profile_*
```

What gets measured: For each query, Intelligence Per Watt captures energy consumption, power draw, GPU/CPU memory usage, temperature, time-to-first-token, throughput, and token counts.
`ipw profile` sends prompts to the device, profiles hardware usage, and calculates IPW and IPJ (intelligence per watt / per joule).

```bash
ipw profile --client <client> --model <model> [options]
```

Options:
- `--client` - Inference client (e.g., `ollama`, `vllm`)
- `--model` - Model name
- `--client-base-url` - Client base URL
- `--eval-client` - Judge client for scoring (default: `openai`)
- `--eval-base-url` - Judge service URL (default: `https://api.openai.com/v1`)
- `--eval-model` - Judge model (default: `gpt-5-nano-2025-08-07`)
- `--max-queries` - Limit queries for testing
- `--dataset` - Workload dataset (default: `ipw`)
- `--output-dir` - Where to save results
Example:
```bash
ipw profile \
  --client ollama \
  --model llama3.2:1b \
  --client-base-url http://localhost:11434 \
  --max-queries 100
```
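For comparison, a vLLM run might look like the sketch below. The model name and base URL are illustrative assumptions (vLLM's OpenAI-compatible server defaults to port 8000); match them to your deployment.

```bash
# hypothetical vLLM invocation; adjust the model name and URL to your server
ipw profile \
  --client vllm \
  --model meta-llama/Llama-3.2-1B-Instruct \
  --client-base-url http://localhost:8000/v1 \
  --max-queries 100
```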
By default, `ipw analyze` calculates IPW and IPJ summary statistics for a dataset. To see how energy, power, and latency scale with input/output length, use `--analysis regression`.

```bash
ipw analyze <results_dir>

# or explicitly choose a different analysis
# ipw analyze <results_dir> --analysis regression
```
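To analyze every run in one pass, a plain shell loop over the default `./runs` layout works:

```bash
# run the regression analysis for each profiling run directory
for run in ./runs/profile_*; do
  ipw analyze "$run" --analysis regression
done
```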
`ipw plot` visualizes profiling data (scatter plots, regression lines, distributions).

```bash
ipw plot <results_dir> [--output <dir>]
```
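For example, to write the figures to a separate directory (the run's own `plots/` folder, shown in the output layout below, is presumably the default):

```bash
ipw plot runs/profile_<hardware>_<model> --output ./figures
```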
`ipw list` shows the available clients, datasets, and analysis types.

```bash
ipw list <clients|datasets|analyses|visualizations|all>
```
Validate that your system can collect energy telemetry before running full workloads:

```bash
uv run scripts/test_energy_monitor.py [--interval 2.0]
```

Profiling runs save to `./runs/profile_<hardware>_<model>/`:
```
runs/profile_<hardware>_<model>/
├── data-*.arrow     # Per-query metrics (HuggingFace dataset format)
├── summary.json     # Run metadata and totals
├── analysis/        # Regression coefficients, statistics
└── plots/           # Graphs
```
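To spot-check a finished run from the shell, the per-run files can be inspected directly (paths follow the layout above):

```bash
# run totals and metadata
cat runs/profile_*/summary.json

# derived statistics written by `ipw analyze`
ls runs/profile_*/analysis/
```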
