Author: Leonardo Capossio — bard0 design hello@bard0.com
Open source AXI4 / AXI4-Lite interconnect generator. Describe your bus topology in YAML, get Verilog. Or use a pre-built output directly.
MIT licensed. Built with SpinalHDL.
Hardware-validated on Xilinx Arty A7-100T. 81 SpinalSim + 24 cocotb tests pass.
- What it does
- Comparison
- Quick start
- YAML configuration reference
- Simulation
- Hardware validation — Arty A7-100T
- Port naming
- Tool integration
- Project structure
- License
axiZero generates a non-blocking AXI interconnect that routes M masters to N slaves based on a static address map. Each port can be independently configured as AXI4 or AXI4-Lite; the required adapters are inserted automatically.
Implemented and working:
- AXI4 full (with IDs, bursts, outstanding transactions)
- AXI4-Lite (no IDs, single-beat)
- Per-port mixed AXI4 / AXI4-Lite with automatic adapter insertion
- AXI4-Lite data-width conversion (zero-extend / truncate at port boundaries)
- Full AXI4 data-width conversion — burst-splitting upsizer and downsizer at port boundaries; all three burst types (FIXED, INCR, WRAP) supported
- Register slices, per master and per slave port
- Round-robin, fixed-priority, and weighted round-robin arbitration
- QoS arbitration (highest AXQOS wins) with aging-based anti-starvation
- Pipelined mode (max_outstanding > 1) with per-slave W-route FIFOs and ID-based response routing
- IPIF compatibility: AW and W are presented simultaneously to slaves that require it
- YAML → Verilog generator with port-name post-processing for Vivado AXI naming conventions
- AXI3-to-AXI4 bridge adapter with WID reorder buffer (write interleaving → strict AW-order), locked access conversion, LEN/LOCK field adaptation
Not yet implemented:
- Clock domain crossing: all ports share a single clock (aclk) and reset (aresetn).
- AXI4-Lite crossbar pipelined mode: the Lite-only path is always single-outstanding per slave.
| | axiZero | PULP axi | verilog-axi | taxi | dpretet/axi-crossbar |
|---|---|---|---|---|---|
| License | MIT | SHL-0.51 | MIT | CERN-OHL-S¹ | MIT |
| AXI4 full | ✓ | ✓ | ✓ | ✓ | ✓ |
| AXI4-Lite | ✓ | ✓ | ✓ | ✓ | ✓ |
| Per-port mixed AXI4/Lite | ✓ | — | — | — | — |
| AXI4-Lite data-width conversion | ✓ | ✓ | ✓ | ✓ | — |
| Full AXI4 data-width conversion | ✓ | ✓ | ✓ | ✓ | — |
| Register slices | ✓ | ✓ | ✓ | ✓ | — |
| Round-robin / fixed-priority | ✓ | ✓ | ✓ | ✓ | ✓ |
| Weighted round-robin | ✓ | ✓ | — | — | ✓ |
| QoS arbitration | ✓ | ✓ | — | — | ✓ |
¹ CERN-OHL-S is copyleft (share-alike); requires releasing your full digital design on request.
Requirements: Python 3.8+ with PyYAML, Java 11+ (tested with Java 21), sbt.
On Linux or WSL, Verilator 5.x is also required (it is invoked internally during generation to validate the output).
An install script handles Java, sbt, Verilator, and Python packages. It detects whether you are on Linux, WSL, macOS, or Windows and runs the appropriate package manager commands.
# Check what is / isn't installed
python scripts/install_deps.py --check
# Install everything
python scripts/install_deps.py
On Windows the script uses winget. On macOS it uses Homebrew (must be installed first). On Linux/WSL it uses apt. Note that Verilator and cocotb simulation require Linux or WSL; on Windows, install WSL Ubuntu 24.04 and run the script from inside it.
Java 21 (sbt/SpinalHDL need Java 11 or newer; 21 is the tested version):
# Ubuntu / Debian / WSL
sudo apt-get install -y temurin-21-jdk # via adoptium.net apt repo, or:
sudo apt-get install -y openjdk-21-jdk # standard OpenJDK
# macOS (Homebrew)
brew install --cask temurin@21
# Windows: download from https://adoptium.net/
sbt (Scala build tool):
# Ubuntu / Debian / WSL — one-liner from sbt docs
echo "deb https://repo.scala-sbt.org/scalasbt/debian all main" \
| sudo tee /etc/apt/sources.list.d/sbt.list
curl -sL "https://keyserver.ubuntu.com/pks/lookup?op=get&search=0x2EE0EA64E40A89B84B2DF73499E82A75642AC823" \
| sudo apt-key add -
sudo apt-get update && sudo apt-get install -y sbt
# macOS
brew install sbt
# Windows: download the MSI from https://www.scala-sbt.org/download/
Verilator 5.x (Linux / WSL only; used to validate generated output):
sudo apt-get install -y verilator # Ubuntu 24.04 ships Verilator 5.x
PyYAML:
pip install pyyaml
# Print a working example config
python scripts/axizero.py example > my_design.yaml
# Generate Verilog
python scripts/axizero.py generate my_design.yaml --output rtl/
Eleven configurations are pre-generated in generated/. Copy the appropriate file into your project and instantiate it.
Resource usage is post-synthesis (Vivado 2025.2, xc7a100t, OOC mode). No BRAM or DSP used by any configuration.
| File | Description | LUTs | FFs |
|---|---|---|---|
| MyLite_1M4S.v | 1M×4S AXI4-Lite, round-robin | 237 | 8 |
| AxiZeroLite_1M4S.v | 1M×4S AXI4-Lite, round-robin (wider addr) | 245 | 8 |
| MyLite_2M2S_WRR.v | 2M×2S AXI4-Lite, weighted round-robin (3:1) | 352 | 286 |
| MyLite_2M4S_FP.v | 2M×4S AXI4-Lite, fixed priority | 527 | 16 |
| AxiZeroLite_2M4S_RS.v | 2M×4S AXI4-Lite, register slices on all ports | 563 | 784 |
| AxiZeroLite_4M4S_FP.v | 4M×4S AXI4-Lite, fixed priority | 1047 | 24 |
| MyFull_2M2S.v | 2M×2S AXI4 Full, 64-bit, round-robin | 379 | 4 |
| MyFull_2M2S_QoS.v | 2M×2S AXI4 Full, 64-bit, QoS arbitration | 626 | 62 |
| MyMixed_2M3S.v | 2M×3S mixed (Full + Lite), auto adapters | 421 | 34 |
| ArtyDC_1M3S.v | 1M×3S mixed, Arty A7 don't-care default config | 258 | 8 |
| ArtyDC_2M4S.v | 2M×4S mixed, Arty A7 don't-care default config | 591 | 28 |
If none of these match your topology, generate a custom one from a YAML description (Option A).
The configuration file contains a designs list. Each entry generates one Verilog file.
designs:
  - name: MySoC
    arbitration: round_robin
    max_outstanding: 4
    fabric_data_width: 64
    weights: [3, 1]
    masters:
      - type: full
        addr_width: 32
        data_width: 64
        id_width: 4
        reg_slice: false
    slaves:
      - base: 0x0000_0000
        size: 0x8000_0000
        type: full
        data_width: 64
        reg_slice: false

| Key | Type | Default | Description |
|---|---|---|---|
| name | string | required | Output filename (without .v). Must be a valid Verilog module name. |
| type | string | auto | Force lite (all-Lite crossbar) or full (Full AXI4 crossbar). If omitted, inferred from port types: all-Lite ports use the lightweight Lite crossbar; any Full port uses the Full crossbar with automatic adapters. |
| arbitration | string | round_robin | Arbitration policy when multiple masters contend for the same slave. See Arbitration modes. |
| weights | list[int] | — | One integer per master. Only used with weighted_round_robin. Master i receives weights[i] grants per round. |
| max_outstanding | int | 1 | Maximum outstanding transactions per slave per direction. See Pipelined vs blocking mode. |
| fabric_data_width | int | max of all ports | Override the internal fabric data width. Width converters are inserted automatically at any port whose data_width differs. See Data-width conversion. |
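The defaults in the table above can be illustrated with a short Python sketch. This is not the generator's code: `resolve_design` and the `converters` field are hypothetical names used only for illustration, and the sketch assumes a design entry has already been loaded into a plain dict with the keys documented here.

```python
def resolve_design(design):
    """Fill in documented defaults for one entry of the `designs` list.

    Hypothetical helper, not part of axizero.py.
    """
    masters, slaves = design["masters"], design["slaves"]
    ports = masters + slaves
    # A port's `type` defaults to "full" when omitted.
    all_lite = all(p.get("type", "full") == "lite" for p in ports)

    resolved = dict(design)
    # All-Lite ports select the Lite crossbar; any Full port selects the Full one.
    resolved.setdefault("type", "lite" if all_lite else "full")
    resolved.setdefault("arbitration", "round_robin")
    resolved.setdefault("max_outstanding", 1)
    # The fabric width defaults to the widest port.
    resolved.setdefault("fabric_data_width", max(p["data_width"] for p in ports))

    # Ports whose width differs from the fabric need a width converter.
    fabric = resolved["fabric_data_width"]
    resolved["converters"] = (
        [("master", i) for i, p in enumerate(masters) if p["data_width"] != fabric]
        + [("slave", i) for i, p in enumerate(slaves) if p["data_width"] != fabric]
    )
    return resolved
```

With one 32-bit Lite master, one 32-bit Lite slave, and one 64-bit Full slave, the sketch selects the Full crossbar, a 64-bit fabric, and converters on the two 32-bit ports.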
Each entry in the masters list defines one slave-facing AXI interface on the crossbar (where you connect your CPU, DMA, etc.).
| Key | Type | Default | Description |
|---|---|---|---|
| type | string | full | full (AXI4 with IDs and bursts) or lite (AXI4-Lite, single-beat, no IDs). A Lite master connecting to a Full crossbar gets an automatic Lite-to-Full adapter. |
| addr_width | int | required | Address bus width in bits (typically 32). |
| data_width | int | required | Data bus width in bits (32, 64, 128, …). If it differs from fabric_data_width, a width converter is inserted. |
| id_width | int | 4 | Transaction ID width. Full AXI4 only; ignored for Lite. The crossbar appends ceil(log2(nMasters)) master-index bits internally, so slave-side ID width = id_width + masterIndexBits. |
| reg_slice | bool | false | Insert a register slice (pipeline stage) on this master port for timing closure. |
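The slave-side ID width formula from the id_width row can be checked with a few lines of Python. The width arithmetic follows the table; the bit placement in `extend_id` (master index in the upper bits) is an assumption made for illustration, and all function names are hypothetical.

```python
def master_index_bits(n_masters):
    """ceil(log2(n_masters)) computed on integers; 0 for a single master."""
    return (n_masters - 1).bit_length()


def slave_id_width(id_width, n_masters):
    """Slave-side ID width = id_width + masterIndexBits."""
    return id_width + master_index_bits(n_masters)


def extend_id(txn_id, master_idx, id_width):
    """Tag a master-side ID with the issuing master's index.

    Assumption: the index occupies the bits above the original ID.
    """
    return (master_idx << id_width) | txn_id


def split_id(ext_id, id_width):
    """Recover (master_idx, txn_id) from a slave-side ID for response routing."""
    return ext_id >> id_width, ext_id & ((1 << id_width) - 1)
```

For example, with id_width 4 a two-master crossbar presents 5-bit IDs to its slaves, and a four-master crossbar presents 6-bit IDs.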
Each entry in the slaves list defines one master-facing AXI interface on the crossbar (where you connect your BRAM, peripheral, etc.).
| Key | Type | Default | Description |
|---|---|---|---|
| base | int | required | Base address. Hex (0xC000_0000) or decimal. Underscores are allowed for readability. |
| size | int | required | Address region size in bytes. Must be a power of 2. The slave occupies [base, base+size). |
| type | string | full | full or lite. A Lite slave on a Full crossbar gets an automatic Full-to-Lite adapter. |
| data_width | int | required | Data bus width in bits. If it differs from fabric_data_width, a width converter is inserted. |
| reg_slice | bool | false | Insert a register slice on this slave port. |
Address regions must not overlap. The crossbar uses a bitmask decoder: for each slave, the address bits above log2(size) must match the corresponding bits of base. Behavior for addresses that match no slave is undefined: there is no default slave and no error response.
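The address-map rules can be modelled in Python as a quick sanity check before generation. This is a behavioral sketch, not the generated decoder; `check_map` and `decode` are hypothetical helpers, and the example addresses in the usage note are made up.

```python
def is_pow2(x):
    return x > 0 and (x & (x - 1)) == 0


def check_map(slaves):
    """Raise ValueError if a size is not a power of 2 or two regions overlap."""
    for s in slaves:
        if not is_pow2(s["size"]):
            raise ValueError(f"size {s['size']:#x} is not a power of 2")
    ordered = sorted(slaves, key=lambda s: s["base"])
    for lo, hi in zip(ordered, ordered[1:]):
        # Each slave occupies [base, base + size).
        if lo["base"] + lo["size"] > hi["base"]:
            raise ValueError(f"regions at {lo['base']:#x} and {hi['base']:#x} overlap")


def decode(addr, slaves):
    """Return the index of the slave selected by the bitmask compare, or None."""
    for i, s in enumerate(slaves):
        mask = ~(s["size"] - 1)  # keeps only the bits above log2(size)
        if (addr & mask) == (s["base"] & mask):
            return i
    return None  # the real crossbar leaves this case undefined
```

With a hypothetical map of two 64 KB regions at 0x0000_0000 and 0x0001_0000, `decode(0x0000_FFFC, ...)` selects slave 0 and `decode(0x0001_0000, ...)` selects slave 1. The sketch also assumes each base is aligned to its size; otherwise the mask compare covers a different range than [base, base+size).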
| Mode | Key value | Extra keys | Behavior |
|---|---|---|---|
| Round-robin | round_robin | — | Equal turns among contending masters. No starvation. Default. |
| Fixed priority | fixed_priority | — | Master 0 (first listed) has highest priority. Lower-priority masters may starve under sustained load. |
| Weighted round-robin | weighted_round_robin | weights | Like round-robin, but master i gets weights[i] consecutive grants before yielding. Example: weights: [3, 1] gives master 0 three turns for every one turn of master 1. |
| QoS-based | qos | — | Arbitrates on AXI AXQOS[3:0]: higher QoS wins. Equal QoS falls back to round-robin. An aging counter increments for each cycle a request waits; once the age exceeds a threshold, it boosts effective QoS to prevent starvation. |
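Two of the policies above are easy to model in software. The following Python sketch is a behavioral illustration only, not the RTL. In the QoS model, the threshold value, the promotion of an aged request to QoS 15, and the oldest-first tie-break (standing in for the round-robin fallback) are all assumptions; the class and function names are hypothetical.

```python
class WeightedRoundRobin:
    """Master i receives weights[i] consecutive grants before yielding.

    Assumes every weight is a positive integer.
    """

    def __init__(self, weights):
        self.weights = list(weights)
        self.current = 0
        self.left = self.weights[0]

    def grant(self, requests):
        """requests: one bool per master. Returns the granted index or None."""
        if not any(requests):
            return None
        n = len(self.weights)
        # Hand over when the current master is idle or has used up its turns.
        if not requests[self.current] or self.left == 0:
            for step in range(1, n + 1):
                cand = (self.current + step) % n
                if requests[cand]:
                    self.current = cand
                    self.left = self.weights[cand]
                    break
        self.left -= 1
        return self.current


def qos_winner(qos, ages, threshold=8):
    """qos: AxQOS value per master, or None when idle; ages: cycles waited.

    Assumption: a request older than `threshold` is treated as QoS 15.
    """
    best = None
    for i, q in enumerate(qos):
        if q is None:
            continue
        effective = 15 if ages[i] > threshold else q
        key = (effective, ages[i])  # older request wins a QoS tie in this sketch
        if best is None or key > best[0]:
            best = (key, i)
    return None if best is None else best[1]
```

With weights [3, 1] and both masters requesting continuously, `grant` returns 0, 0, 0, 1 and then repeats, matching the three-to-one ratio described in the table.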
When a port's data_width differs from fabric_data_width, the generator inserts a converter automatically:
- AXI4-Lite: zero-extends writes to the wider bus, truncates reads to the narrower bus. Single-cycle, no buffering.
- Full AXI4 upsize (narrow port → wider fabric): SpinalHDL Axi4Upsizer. Assembles narrow beats into wide beats.
- Full AXI4 downsize (wide port → narrower fabric): Axi4DownsizerExt (local fork). Splits wide beats into narrow sub-transactions. INCR bursts stay multi-beat for efficiency. FIXED and WRAP bursts are flattened to single-beat sub-transactions with addresses computed internally.
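The arithmetic behind these converters can be sketched in Python. This is a simplified model, not the Axi4DownsizerExt logic: it assumes an aligned start address, full-width beats, and the AXI4 limit of 256 beats per INCR burst, and the function names are hypothetical.

```python
def lite_zero_extend(data, narrow_bits):
    """AXI4-Lite write path: narrow data on a wider bus, upper bits zero."""
    return data & ((1 << narrow_bits) - 1)


def lite_truncate(data, narrow_bits):
    """AXI4-Lite read path: keep only the low narrow_bits of the wide word."""
    return data & ((1 << narrow_bits) - 1)


def downsize_incr(addr, beats, wide_bytes, narrow_bytes, max_beats=256):
    """Split one wide INCR burst into narrow INCR sub-bursts.

    Returns a list of (start_address, beat_count) pairs. Each wide beat
    becomes wide_bytes // narrow_bytes narrow beats; a sub-burst is capped
    at max_beats beats.
    """
    remaining = beats * (wide_bytes // narrow_bytes)
    bursts = []
    while remaining > 0:
        n = min(remaining, max_beats)
        bursts.append((addr, n))
        addr += n * narrow_bytes
        remaining -= n
    return bursts
```

A 4-beat 64-bit burst maps to a single 8-beat 32-bit burst, while a 256-beat 64-bit burst needs two 256-beat 32-bit bursts.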
| max_outstanding | Mode | Behavior |
|---|---|---|
| 1 | Blocking | One transaction in flight per slave per direction. No FIFOs. Minimal area. |
| > 1 | Pipelined | Per-slave W-route FIFOs, ID-based B/R response routing. Multiple transactions can be in flight simultaneously to different slaves. Required for high-throughput designs. |
Only affects the Full AXI4 crossbar. The Lite-only crossbar is always single-outstanding (blocking).
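The role of a per-slave W-route FIFO can be pictured with a small Python model: it remembers the order in which write addresses were granted so that write data reaches the slave in the same order. This is a conceptual sketch; the `WRoute` class, its method names, and the choice of FIFO depth equal to max_outstanding are assumptions, not details taken from the RTL.

```python
from collections import deque


class WRoute:
    """Per-slave queues of master indices, in AW grant order."""

    def __init__(self, n_slaves, depth):
        self.fifos = [deque() for _ in range(n_slaves)]
        self.depth = depth  # assumed equal to max_outstanding

    def accept_aw(self, slave, master):
        """Record a granted AW. Returns False (stall) when the FIFO is full."""
        fifo = self.fifos[slave]
        if len(fifo) >= self.depth:
            return False
        fifo.append(master)
        return True

    def w_source(self, slave):
        """Master whose W beats this slave takes next, or None if idle."""
        fifo = self.fifos[slave]
        return fifo[0] if fifo else None

    def w_last(self, slave):
        """Call when the beat carrying WLAST has been transferred."""
        self.fifos[slave].popleft()
```

If master 1 and then master 0 are granted AW to the same slave, the slave receives all of master 1's write beats first, then master 0's, regardless of which master drives W earlier.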
Full example with all options: scripts/example.yaml.
Requires Verilator 5.x on Linux or WSL.
sbt test
81 tests pass across 14 suites:
| Suite | Tests | Description |
|---|---|---|
| LiteCrossbarSpec | 6 | AXI4-Lite crossbar: arbitration, address decode, WRR |
| PipelinedCrossbarSpec | 8 | Full AXI4: bursts, back-pressure, outstanding transactions |
| MixedCrossbarSpec | 4 | Full↔Lite adapters, mixed address maps |
| ArtySpec | 5 | Sequence matching the Arty A7 hardware tests (T4, T5, T6, T9, combined) |
| IpifWriteSpec | 5 | IPIF-style slaves (Xilinx GPIO/UART-Lite require AW+W simultaneous), blocking and pipelined modes |
| WidthConverterSpec | 6 | Full AXI4 width conversion: 32→64 upsize, 64→32 downsize, 32→64→32 passthrough; single-beat, burst, routing |
| BurstTypeSpec | 6 | Downsizer burst types: INCR baseline, FIXED 1-beat and 2-beat overwrite, WRAP aligned, WRAP 4-beat, WRAP with actual wrap-around |
| ArbitrationSpec | 7 | FixedPriority and WeightedRoundRobin: contention ordering, throughput proportionality, data integrity |
| RegSliceAndLiteWidthSpec | 8 | Register slices (Full + Lite, master/slave/both), AXI4-Lite width conversion (16→32 upsizing) |
| PipelinedArbitrationSpec | 9 | Pipelined FixedPriority, WRR, and QoS: contention, concurrent bursts, data integrity |
| NarrowPortSpec | 6 | Narrow ports: 32→16 downsizing, 16→32 upsizing, mixed Full+Lite concurrent traffic |
| QosCrossbarSpec | 5 | QoS arbitration: higher AWQOS/ARQOS wins (blocking + pipelined), equal-QoS round-robin tie-break, aging anti-starvation |
| QosStressShortSpec | 1 | Short 4-master QoS stress: distinct patterns (sequential, reverse, sparse, random short bursts), concurrent traffic, end-state validation |
| Axi3ToAxi4Spec | 5 | AXI3→AXI4 bridge: single-beat, INCR burst, write interleaving (WID reorder), locked→SLVERR, multiple outstanding |
Tests the generated Verilog files directly using cocotbext-axi bus functional models.
# requires: pip install cocotb cocotbext-axi
python3 sim/cocotb_gen/run_all.py # all suites
python3 sim/cocotb_gen/run_all.py lite # MyLite_1M4S.v only
python3 sim/cocotb_gen/run_all.py full # MyFull_2M2S.v only
python3 sim/cocotb_gen/run_all.py wrr # MyLite_2M2S_WRR.v only
python3 sim/cocotb_gen/run_all.py qos # MyFull_2M2S_QoS.v only
24 tests pass across 4 suites:
| Suite | DUT | Tests | Description |
|---|---|---|---|
| lite | MyLite_1M4S.v | 6 | AxiLiteMaster → 4-slave crossbar: single R/W, address routing, sequential writes, multi-slave pattern, overwrite isolation, 60× random |
| full | MyFull_2M2S.v | 6 | AxiMaster → 2-slave crossbar: single R/W, address routing + isolation, 16-beat burst, 64-beat burst (AWLEN=63), alternating slaves, 40× random |
| wrr | MyLite_2M2S_WRR.v | 6 | 2-master WRR crossbar: dual-master R/W, address routing, concurrent bandwidth, no starvation, concurrent different slaves, 80× random |
| qos | MyFull_2M2S_QoS.v | 6 | 2-master QoS crossbar: dual-master R/W, address routing, higher QoS wins contention, equal-QoS round-robin, aging anti-starvation, QoS read priority |
Four test suites run on a Xilinx Arty A7-100T (xc7a100t) at 100 MHz. All four pass.
Topology: MicroBlaze LE → axiZero 1M×4S → 2× AXI4 BRAM ctrl (64 KB each) + AXI-Lite GPIO + AXI-Lite UART-Lite, max_outstanding=4.
All 10 tests pass (g_fail=0, g_pass=10).
| Test | Description |
|---|---|
| T1–T3 | Single-word write/read, address isolation (AXI4 Full) |
| T4–T6 | 64-word sequential, walking-1, alternating-stride across both BRAMs |
| T7 | GPIO 16-pattern LED sweep (AXI-Lite) |
| T8 | UART-Lite TX FIFO reset and drain (AXI-Lite) |
| T9 | Full 64 KB BRAM checkerboard — 16 384 word write + verify |
| T10 | Cross-slave boundary: last word of BRAM #0, first word of BRAM #1 |
Topology: MicroBlaze + hardware traffic generator → axiZero 2M×4S WRR (weights 3:1) → same slaves as base test.
All 3 tests pass (g_fail=0, g_pass=3).
| Test | Description |
|---|---|
| T1 | Sanity: single-word write/read to both BRAMs |
| T2 | Contention: MB and traffic gen write concurrently, both regions verified |
| T3 | Starvation: lower-weight master still makes progress under sustained load |
Topology: MicroBlaze QoS=15 plus 3 hardware traffic generators (QoS=8/4/0) → axiZero 4M×4S QoS → same slaves as base test. Each generator issues 512 words × 8 passes per iteration with intentionally different patterns:
- G0 (QoS=8): sequential writes to BRAM0
- G1 (QoS=4): reverse-order writes to BRAM1
- G2 (QoS=0): LFSR-based random short bursts (len 1–4) to BRAM1
run_qos_stress_test.py monitors the board continuously for 10 minutes and fails if:
- g_fail becomes non-zero,
- heartbeat (g_heartbeat) stops advancing for 30 seconds,
- no stress iteration (g_iteration) completes.
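The three failure conditions can be expressed as a pure function over a log of polled values. The Python sketch below is not taken from run_qos_stress_test.py; `check_samples` and its tuple layout are hypothetical, and it assumes g_iteration counts completed iterations.

```python
def check_samples(samples, stall_limit=30.0):
    """Evaluate polled firmware counters against the three failure rules.

    samples: time-ordered list of (t_seconds, g_fail, g_heartbeat, g_iteration).
    Returns a failure reason string, or None when the run passes.
    """
    if not samples:
        return "no samples"
    last_beat = samples[0][2]
    last_change = samples[0][0]
    for t, g_fail, beat, _iteration in samples:
        if g_fail != 0:
            return "g_fail non-zero"
        if beat != last_beat:
            # Heartbeat advanced: restart the stall timer.
            last_beat, last_change = beat, t
        elif t - last_change >= stall_limit:
            return "heartbeat stalled"
    if samples[-1][3] == samples[0][3]:
        return "no iteration completed"
    return None
```

Keeping the rules separate from the board access makes them easy to check against recorded logs without hardware attached.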
Result: 14 000+ iterations, 70 000+ passes, 0 failures over 10 minutes.
Topology: MicroBlaze (AXI4) → AXI4-to-AXI3 shim → Axi3ToAxi4Adapter → axiZero 1M×4S crossbar → same slaves as base test.
Every MicroBlaze transaction passes through the full AXI3→AXI4 round-trip, proving the adapter's FSM, WID reorder buffer, and field conversion work correctly in real hardware.
All 5 tests pass (g_fail=0, g_pass=5).
| Test | Description |
|---|---|
| T1 | Sanity: single-word write/read to BRAM0 and BRAM1 |
| T2 | Walking-1 pattern across 256 words in BRAM0 |
| T3 | Cross-slave: alternating writes to BRAM0+BRAM1, full verify |
| T4 | GPIO LED sweep (AXI-Lite slave path through adapter) |
| T5 | UART status read (second AXI-Lite slave path) |
All four test runners auto-detect Vivado, xsdb, and mb-gcc by searching PATH and common AMD/Xilinx install locations (Windows and Linux). Override with environment variables if needed:
# Auto-detect (works on Windows and Linux)
python hw/vivado/arty_a7/run_wrr_test.py
python hw/vivado/arty_a7/run_qos_test.py
python hw/vivado/arty_a7/run_qos_stress_test.py
python hw/vivado/arty_a7/run_axi3_test.py
# Override tool paths via env vars
VIVADO_BIN=/opt/Xilinx/2025.2/Vivado/bin/vivado \
XSDB_BIN=/opt/Xilinx/2025.2/Vitis/bin/xsdb \
MBGCC_BIN=/opt/Xilinx/2025.2/Vitis/gnu/microblaze/lin64/bin/mb-gcc \
python hw/vivado/arty_a7/run_qos_stress_test.py
Each runner: (1) creates the Vivado project + bitstream if not already built, (2) compiles MicroBlaze firmware with mb-gcc, (3) programs the FPGA and runs tests via xsdb.
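The lookup order used for tool detection (environment variable first, then PATH, then common install locations) might look like the following Python sketch. It is not the code in find_xilinx_tools.py; `find_tool` and its `extra_dirs` parameter are hypothetical, and the candidate directories are left to the caller.

```python
import os
import shutil


def find_tool(env_var, exe_name, extra_dirs=()):
    """Return a path for exe_name, or None if it cannot be found.

    Order: explicit override via env_var, then PATH, then extra_dirs.
    """
    override = os.environ.get(env_var)
    if override:
        return override
    on_path = shutil.which(exe_name)
    if on_path:
        return on_path
    for directory in extra_dirs:
        candidate = os.path.join(directory, exe_name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None
```

For example, `find_tool("VIVADO_BIN", "vivado")` returns the VIVADO_BIN value when that variable is set, and otherwise falls back to whatever `vivado` resolves to on PATH.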
Crossbar-only resource usage (OOC synthesis, xc7a100t):
| Configuration | LUTs | FFs |
|---|---|---|
| Base 1M×4S (pipelined, max_outstanding=4) | 382 | 40 |
| WRR 2M×4S (weighted round-robin, pipelined) | 818 | 92 |
| QoS 2M×4S (QoS arbitration, pipelined) | 1011 | 132 |
| QoS stress 4M×4S (QoS arbitration, pipelined) | 2587 | 208 |
Vivado TCL scripts and MicroBlaze firmware: hw/vivado/arty_a7/ and sw/arty_a7/.
axiZero crossbar
┌──────────────────────────┐
CPU / DMA ────►│ s0_axi_* m0_axi_* ├────► BRAM
│ │
Config port ──►│ s1_axi_* m1_axi_* ├────► GPIO (Lite)
│ m2_axi_* ├────► UART (Lite)
└──────────────────────────┘
sN = slave-facing mN = master-facing
(connect masters here) (connect slaves here)
sN_axi_* are the slave-facing interfaces — connect your AXI masters (CPUs, DMAs) here.
mN_axi_* are the master-facing interfaces — connect your AXI slaves (BRAMs, peripherals) here.
| Signal | Direction | Notes |
|---|---|---|
| sN_axi_awvalid/awaddr/awready | input | write address channel |
| sN_axi_wvalid/wdata/wstrb/wready | input | write data channel |
| sN_axi_bvalid/bresp/bready | output | write response channel |
| sN_axi_arvalid/araddr/arready | input | read address channel |
| sN_axi_rvalid/rdata/rresp/rready | output | read data channel |
| sN_axi_awid/wid/bid/arid/rid | — | Full AXI4 only |
| sN_axi_awlen/awsize/awburst/… | input | Full AXI4 only |
| mN_axi_* | reversed | crossbar drives the master-facing side |
| aclk | input | rising-edge clock |
| aresetn | input | active-low synchronous reset |
Add the generated Verilog to your project sources and instantiate it. All AXI signals are flat wires. This is how the Arty A7 reference design is wired.
Port names match Vivado's AXI naming conventions, so IP Packager infers all interfaces automatically. hw/vivado/package_ip.tcl produces a packaged IP core:
vivado -mode batch -source hw/vivado/package_ip.tcl
# Output: hw/vivado/axizero_ip/ (contains component.xml)
To use: IP Settings → IP Repositories → + the hw/vivado/axizero_ip directory, then drag the IP into your block design. To package a different configuration, set RTL_FILE to the desired generated/*.v file and re-run.
hw/quartus/package_ip.tcl generates a _hw.tcl component description that maps all sN_axi_* / mN_axi_* ports to Platform Designer AXI4 or AXI4-Lite interfaces automatically. It parses the Verilog port list, detects Full vs Lite interfaces, and creates the correct clock/reset associations.
# Package the default 2M×2S Full AXI4 crossbar
quartus_sh -t hw/quartus/package_ip.tcl
# Package a different configuration
quartus_sh -t hw/quartus/package_ip.tcl generated/MyLite_1M4S.v
Output: hw/quartus/axizero_ip/ containing <ModuleName>_hw.tcl and the Verilog source.
To use: IP Components > Add Component Search Path → add hw/quartus/axizero_ip/, then drag the component into your Platform Designer system. Clock, reset, and AXI interfaces are pre-mapped.
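The interface detection that the packaging script performs can be approximated in a few lines. The sketch below is written in Python rather than Tcl and is not the logic of package_ip.tcl; `group_interfaces` is a hypothetical name, and the rule used (an interface is Full AXI4 if it carries any ID, burst, or LAST signal) is an assumption based on the port naming table.

```python
import re

# sN_axi_* and mN_axi_* ports; anything else (aclk, aresetn) is ignored.
PORT_RE = re.compile(r"^([sm])(\d+)_axi_(\w+)$")

# Signals that exist on AXI4 but not on AXI4-Lite.
FULL_ONLY = {"awid", "arid", "bid", "rid", "awlen", "arlen",
             "awsize", "awburst", "wlast", "rlast"}


def group_interfaces(port_names):
    """Map each interface prefix (e.g. 's0_axi') to 'AXI4' or 'AXI4-Lite'."""
    signals = {}
    for name in port_names:
        match = PORT_RE.match(name)
        if match is None:
            continue
        prefix = f"{match.group(1)}{match.group(2)}_axi"
        signals.setdefault(prefix, set()).add(match.group(3))
    return {
        prefix: "AXI4" if sigs & FULL_ONLY else "AXI4-Lite"
        for prefix, sigs in signals.items()
    }
```

Given the ports `s0_axi_awaddr`, `s0_axi_awlen`, `m0_axi_awaddr`, and `m0_axi_wdata`, the sketch classifies `s0_axi` as AXI4 and `m0_axi` as AXI4-Lite.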
build.sbt
hw/spinal/axizero/
  AxiZeroConfig.scala # configuration model
  AxiZeroTop.scala # top-level (Lite-only / Mixed)
  crossbar/
    Axi4LiteCrossbar.scala # Lite-only path (no ID logic)
    Axi4Crossbar.scala # Full AXI4 path (ID expansion, pipelined)
  adapters/
    Axi4FullToLiteAdapter.scala
    Axi4LiteToFullAdapter.scala
    RegisterSlice.scala
    WidthConverter.scala # Lite and Full AXI4 data-width conversion
    Axi4DownsizerExt.scala # fork of SpinalHDL Axi4Downsizer; FIXED/WRAP flattened, INCR multi-beat
    Axi3ToAxi4Adapter.scala # AXI3→AXI4 bridge: WID reorder buffer, locked access conversion
  gen/
    AxiZeroGen.scala # built-in generation entry point
    ArtyDutGen.scala # Arty A7 DUT (1M×4S)
    ArtyQosDutGen.scala # Arty A7 QoS DUT (2M×4S, QoS arbitration)
    ArtyAxi3DutGen.scala # Arty A7 AXI3 adapter DUT (AXI4→AXI3→AXI4→crossbar)
hw/sim/axizero/sim/ # SpinalSim testbenches (sbt test)
sim/cocotb_gen/
  run_all.py # Python runner (lite + full suites)
  lite/test_lite.py # AxiLiteMaster tests against MyLite_1M4S.v
  full/test_full.py # AxiMaster tests against MyFull_2M2S.v
scripts/
  axizero.py # YAML → Verilog generator
  example.yaml # all configuration options
generated/ # pre-built Verilog
sw/arty_a7/ # MicroBlaze firmware (source + linker script)
hw/quartus/
  package_ip.tcl # Platform Designer _hw.tcl generator
hw/vivado/arty_a7/ # Vivado TCL build and test scripts
  find_xilinx_tools.py # cross-platform Vivado/xsdb/mb-gcc auto-detection
  run_wrr_test.py # WRR HW test runner (build + program + verify)
  run_qos_test.py # QoS HW test runner
  run_qos_stress_test.py # QoS 10-minute stress test runner
  run_axi3_test.py # AXI3 adapter HW test runner
MIT — see LICENSE.