A high-performance Python SDK for running Lattice QCD simulations on AWS Trainium and Inferentia instances.
This SDK provides optimized tensor operations and algorithms designed for Lattice QCD computations on AWS Neuron-powered EC2 instances. It builds on PyTorch/XLA and the AWS Neuron SDK, with the goal of competitive performance relative to traditional GPU-based solutions; the built-in benchmarking tools let you measure this on your own workloads.
- 🚀 High Performance: Optimized for AWS Trainium/Inferentia hardware
- 🔢 Core Components: Lattice geometry, gauge fields, fermion fields
- ⚡ Efficient Operators: Wilson and Staggered fermion implementations
- 🔧 Linear Solvers: Conjugate Gradient and BiCGStab iterative solvers
- 📊 Benchmarking: Built-in performance comparison tools
- 🔄 Fallback Support: CPU/NumPy backend for development
- 🧪 Well Tested: Comprehensive test suite
- Clone the repository:
git clone https://github.com/JGalego/lqcd-neuron-sdk
cd lqcd-neuron-sdk
- Install dependencies:
pip install -e .
- For AWS Neuron support (on Trainium/Inferentia instances):
# Install AWS Neuron SDK
pip install torch-neuronx neuronx-cc --extra-index-url https://pip.repos.neuron.amazonaws.com
from lqcd_neuron.core import Lattice, GaugeField
from lqcd_neuron.operators import WilsonOperator
# Create a 4D lattice
lattice = Lattice((8, 8, 8, 8), device="xla") # Use "cpu" for development
# Initialize gauge field
gauge_field = GaugeField(lattice)
# Compute average plaquette
avg_plaq = gauge_field.average_plaquette()
print(f"Average plaquette: {avg_plaq}")
# Create Wilson fermion operator
wilson_op = WilsonOperator(lattice, gauge_field, mass=0.1)

from lqcd_neuron.core import Lattice
# Create 4D spacetime lattice
lattice = Lattice(dimensions=(Nt, Nx, Ny, Nz), device="xla")

from lqcd_neuron.core import GaugeField
gauge = GaugeField(lattice)
plaquette = gauge.average_plaquette()
action = gauge.wilson_action(beta=6.0)

from lqcd_neuron.core import FermionField
fermion = FermionField(lattice)
fermion.random_initialize()
norm = fermion.norm()

from lqcd_neuron.operators import WilsonOperator, StaggeredOperator
# Wilson fermions
wilson_op = WilsonOperator(lattice, gauge_field, mass=0.1)
result = wilson_op.apply(fermion_field)
# Staggered fermions
staggered_op = StaggeredOperator(lattice, gauge_field, mass=0.05)

from lqcd_neuron.solvers import ConjugateGradient, BiCGStab
# CG solver for D†D * x = b
def ddag_d_operator(field):
    return wilson_op.dagger(wilson_op.apply(field))
cg = ConjugateGradient(ddag_d_operator, tolerance=1e-12)
solution = cg.solve(rhs_field)

The SDK includes comprehensive benchmarking tools:
from lqcd_neuron.benchmarks import PerformanceBenchmark
# Benchmark on different devices
benchmark = PerformanceBenchmark(device="xla") # or "cpu", "cuda"
# Run comprehensive benchmarks
results = benchmark.run_comprehensive_benchmark([
    (8, 8, 8, 8),
    (16, 16, 16, 16),
    (32, 32, 32, 32),
])
# Compare devices
comparison = benchmark.compare_devices(["cpu", "xla", "cuda"])
benchmark.print_summary()

To use AWS Trainium/Inferentia devices, specify device="xla" when creating lattices:
# Create lattice on Neuron device
lattice = Lattice((8, 8, 8, 8), device="xla")
gauge_field = GaugeField(lattice)
# XLA compilation happens on first operation (slower)
result1 = wilson_op.apply(fermion_field) # Compilation + execution
# Subsequent operations use compiled code (faster)
result2 = wilson_op.apply(fermion_field) # Fast execution

- Warm-up runs: First operations trigger XLA compilation
- Batch operations: Process multiple configurations together
- Larger lattices: Better utilization on bigger problems (16⁴+)
- Repeated operations: Amortize compilation cost over many runs
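To see how much of a run is one-time compilation, the first call can be timed separately from later calls. The following is a minimal sketch using only the standard library; `time_warmup_and_steady` and `fake_kernel` are hypothetical names for illustration and are not part of the SDK. Note that PyTorch/XLA queues work lazily, so on a real device the callable should also force the result to be computed (for example by copying it to the host), otherwise the timings only measure graph construction.

```python
import time
from typing import Callable, Tuple


def time_warmup_and_steady(fn: Callable[[], object], repeats: int = 5) -> Tuple[float, float]:
    """Return (first_call_seconds, mean_seconds_of_later_calls) for fn."""
    start = time.perf_counter()
    fn()  # first call: includes any one-time compilation cost
    first = time.perf_counter() - start

    later = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()  # later calls: reuse whatever was compiled or cached
        later.append(time.perf_counter() - start)
    return first, sum(later) / len(later)


# Stand-in for an operator call: sleeps once to mimic compilation (assumption).
_compiled = {}


def fake_kernel():
    if "kernel" not in _compiled:
        time.sleep(0.05)  # simulated one-time compilation
        _compiled["kernel"] = True
    return sum(range(1000))


first, steady = time_warmup_and_steady(fake_kernel)
print(f"first call: {first:.4f}s, later calls (mean): {steady:.6f}s")
```

On a Neuron instance, `fake_kernel` would be replaced by a callable wrapping something like `wilson_op.apply(fermion_field)`.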
# Check if running on Neuron-capable instance
import subprocess
try:
    result = subprocess.run(['neuron-ls'], capture_output=True)
    if result.returncode == 0:
        print("Neuron devices available!")
        # Use device="xla"
    else:
        print("No Neuron devices, using CPU fallback")
        # Use device="cpu"
except FileNotFoundError:
    print("Neuron SDK not installed")

See the examples/ directory for complete working examples:
- basic_plaquette.py - Computing gauge observables (CPU)
- wilson_fermion_demo.py - Fermion operators and linear solvers (CPU)
- benchmark_demo.py - Performance benchmarking (multi-device)
- simple_neuron_xla.py - Basic XLA operations on Neuron devices
- neuron_device_demo.py - Full Neuron device demo with performance comparison
Run examples:
# CPU-based examples (work on any system)
python examples/basic_plaquette.py
python examples/wilson_fermion_demo.py
python examples/benchmark_demo.py
# Neuron-specific examples (require Trainium/Inferentia instances)
python examples/simple_neuron_xla.py
python examples/neuron_device_demo.py

Run the test suite:
# Simple tests (no external dependencies)
python tests/simple_test.py
# Full test suite (requires pytest)
pytest tests/ -v

- Launch EC2 instance:
  - Use trn1 (Trainium) or inf2 (Inferentia) instance types
  - Recommended: trn1.2xlarge or larger
- Install Neuron SDK:
# Configure Neuron repository
. /etc/os-release
sudo tee /etc/apt/sources.list.d/neuron.list > /dev/null <<EOF
deb https://apt.repos.neuron.amazonaws.com ${VERSION_CODENAME} main
EOF
# Install Neuron packages
sudo apt-get update -y
sudo apt-get install aws-neuronx-dkms aws-neuronx-collectives aws-neuronx-runtime-lib aws-neuronx-tools -y
# Install PyTorch Neuron
pip install torch-neuronx neuronx-cc --extra-index-url https://pip.repos.neuron.amazonaws.com
- Verify installation:
neuron-ls # Should show available Neuron devices
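The same verification can be done from Python when a script needs to choose a device string at startup. This is a sketch under assumptions: `detect_device` is a hypothetical helper, not an SDK function, and it treats a zero exit code from `neuron-ls` as "Neuron devices are usable", which is the same criterion as the manual check.

```python
import shutil
import subprocess
from typing import Sequence


def detect_device(probe_cmd: Sequence[str] = ("neuron-ls",)) -> str:
    """Return "xla" if the probe command exists and exits with 0, else "cpu"."""
    # Command not on PATH: the Neuron tools are not installed.
    if shutil.which(probe_cmd[0]) is None:
        return "cpu"
    try:
        result = subprocess.run(list(probe_cmd), capture_output=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        # Could not run the probe or it hung: fall back to CPU.
        return "cpu"
    return "xla" if result.returncode == 0 else "cpu"


device = detect_device()
print(f"Selected device: {device}")
```

The returned string can then be passed as the `device` argument, for example `Lattice((8, 8, 8, 8), device=device)`.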
- Batch operations for maximum throughput
- Use XLA compilation for optimal performance
- Profile with Neuron tools for bottleneck identification
- Compare against GPU baselines using benchmarking suite
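As an illustration of the batching advice, the sketch below computes the average plaquette for several gauge configurations in one vectorized pass with NumPy. It is not the SDK's implementation: the function name and the array layout `(batch, 4, T, X, Y, Z, 3, 3)` are assumptions made for this example, and periodic boundary conditions are assumed.

```python
import numpy as np


def average_plaquette(links: np.ndarray) -> np.ndarray:
    """Average plaquette Re Tr P / 3 for each configuration in a batch.

    links: complex array of shape (batch, 4, T, X, Y, Z, 3, 3), one 3x3
    link matrix per direction and site (assumed layout, periodic lattice).
    """
    def dagger(u):
        # Hermitian conjugate of the trailing 3x3 matrices.
        return np.conj(np.swapaxes(u, -1, -2))

    total = np.zeros(links.shape[0])
    n_planes = 0
    for mu in range(4):
        for nu in range(mu + 1, 4):
            u_mu = links[:, mu]  # shape (batch, T, X, Y, Z, 3, 3)
            u_nu = links[:, nu]
            # Site axes are 1..4 after slicing, so direction d is axis 1 + d.
            u_nu_fwd = np.roll(u_nu, -1, axis=1 + mu)  # U_nu(x + mu)
            u_mu_fwd = np.roll(u_mu, -1, axis=1 + nu)  # U_mu(x + nu)
            # P = U_mu(x) U_nu(x+mu) U_mu(x+nu)^dagger U_nu(x)^dagger
            plaq = u_mu @ u_nu_fwd @ dagger(u_mu_fwd) @ dagger(u_nu)
            trace = np.trace(plaq, axis1=-2, axis2=-1).real / 3.0
            total += trace.mean(axis=(1, 2, 3, 4))  # average over sites
            n_planes += 1
    return total / n_planes


# Two "cold start" configurations: every link is the identity matrix,
# so every plaquette is the identity and the average is 1 per configuration.
cold = np.broadcast_to(np.eye(3, dtype=complex), (2, 4, 4, 4, 4, 4, 3, 3)).copy()
print(average_plaquette(cold))
```

Keeping the batch index as the leading axis means all configurations go through each matrix product together, instead of looping over them in Python.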
lqcd-neuron-sdk/
├── src/lqcd_neuron/ # Main package
│ ├── core/ # Core data structures
│ │ ├── lattice.py # 4D lattice geometry
│ │ ├── neuron_tensor.py # Optimized tensor operations
│ │ ├── gauge_field.py # SU(3) gauge fields
│ │ └── fermion_field.py # Fermion fields with spinors
│ ├── operators/ # QCD operators
│ │ ├── wilson.py # Wilson fermion operator
│ │ └── staggered.py # Staggered fermion operator
│ ├── solvers/ # Linear algebra solvers
│ │ ├── cg.py # Conjugate Gradient
│ │ └── bicgstab.py # BiCGStab
│ └── benchmarks/ # Performance tools
│ └── performance.py # Benchmarking suite
├── tests/ # Test suite
├── examples/ # Usage examples
└── docs/ # Documentation
- Python 3.8+
- NumPy (always required)
- PyTorch + torch-neuronx (for Neuron support)
- pytest (for testing)
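A quick way to check these requirements on a given machine is to ask Python which of the packages are importable. The helper below is a hypothetical sketch, not part of the SDK; it assumes the import names `numpy`, `torch`, `torch_neuronx`, and `pytest`.

```python
import importlib.util
import sys


def check_requirements(min_python=(3, 8)):
    """Report whether the interpreter and each listed package look usable."""
    def has(module: str) -> bool:
        # find_spec returns None when a top-level module is not installed.
        try:
            return importlib.util.find_spec(module) is not None
        except (ImportError, ValueError):
            return False

    return {
        "python": sys.version_info >= min_python,
        "numpy": has("numpy"),
        "torch": has("torch"),
        "torch_neuronx": has("torch_neuronx"),
        "pytest": has("pytest"),
    }


for name, ok in check_requirements().items():
    print(f"{name}: {'ok' if ok else 'missing'}")
```

A missing `torch_neuronx` entry only means Neuron support is unavailable; the CPU/NumPy backend needs NumPy alone.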
# Development install
pip install -e ".[dev]"
# Run linting
black src/ tests/
flake8 src/
# Type checking
mypy src/

- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Ensure all tests pass
- Submit a pull request
MIT License - see LICENSE file for details.
If you use this SDK in your research, please cite:
@software{lqcd_neuron_sdk,
  title={LQCD Neuron SDK: High-Performance Lattice QCD on AWS Trainium and Inferentia},
  author={João Galego},
  year={2025},
  url={https://github.com/JGalego/lqcd-neuron-sdk}
}

- (Creutz, 1983) Quarks, Gluons and Lattices
- (Creutz, 2004) Simulating Quarks
- (Egri et al., 2007) Lattice QCD as a video game
- (Davies, 2005) Lattice QCD - a guide for people who want results
- (Lepage, 2004) Lattice QCD for Novices
Questions? Open an issue or contact the development team.