This artifact builds a reproducible Roofline workflow for FP32 GEMM:
- Calibrate sustained memory bandwidth (STREAM Triad) and compute throughput (OpenBLAS SGEMM).
- Run two GEMM kernels (naive baseline and cache-blocked).
- Record FLOPs, bytes moved, operational intensity (OI), and GFLOP/s to CSV.
- Plot a Roofline figure using your measured ceilings.
- Validate with unit tests and Valgrind memcheck.
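
The relationship between the two calibrated ceilings and a kernel's operational intensity can be sketched in a few lines of Python. This is a minimal illustration, not the artifact's own script. It assumes the usual 2n³ FLOP count for GEMM and a compulsory-traffic byte count (read A, B, C once and write C once), which may differ from how this artifact counts bytes. The function names and the ceiling values `F` and `B` below are hypothetical placeholders; substitute the numbers from your own calibration.

```python
def gemm_flops(n: int) -> float:
    """FLOPs for an n x n x n GEMM: one multiply and one add per inner step."""
    return 2.0 * n**3


def gemm_min_bytes(n: int, bytes_per_elem: int = 4) -> float:
    """Compulsory FP32 traffic (assumption): read A, B, C once, write C once."""
    return 4.0 * n * n * bytes_per_elem


def roofline_gflops(oi: float, peak_gflops: float, bw_gbs: float) -> float:
    """Attainable GFLOP/s under the Roofline model: min(F, B * OI)."""
    return min(peak_gflops, bw_gbs * oi)


if __name__ == "__main__":
    # Placeholder ceilings for illustration only, not measured values.
    F = 100.0  # compute roof in GFLOP/s (hypothetical)
    B = 20.0   # memory roof in GB/s (hypothetical)
    for n in (1024, 2048):
        oi = gemm_flops(n) / gemm_min_bytes(n)
        bound = roofline_gflops(oi, F, B)
        print(f"n={n}: OI={oi:.1f} FLOP/byte, bound={bound:.1f} GFLOP/s")
```

Under these assumptions the OI grows as n/8, so large GEMMs sit well to the right of the ridge point and are bounded by the compute roof. A measured point far below that bound usually indicates poor cache reuse rather than a bandwidth limit.
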
- Kernels: gemm_baseline (i–j–k), gemm_blocked (tiled MB×NB×KB)
- Calibration tools: STREAM Triad (C, OpenMP) and an SGEMM peak timer
- Data & plots: CSV logs and a Roofline chart (OI vs GFLOP/s, log–log)
- Tests & hygiene: unit tests, Valgrind memcheck, stable OMP settings
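
To make the difference between the two loop structures concrete, here is a small Python sketch of an i–j–k loop nest and a tiled MB×NB×KB loop nest. It only illustrates the iteration order. The artifact's real kernels are compiled code and may differ in details such as whether C is accumulated into or overwritten (this sketch assumes C += A·B). The tiny default tile sizes are hypothetical, chosen so the example runs quickly, and pure-Python loops are far too slow for timing.

```python
def gemm_baseline(A, B, C, n):
    """Naive i-j-k GEMM on n x n lists of lists: C += A * B."""
    for i in range(n):
        for j in range(n):
            acc = C[i][j]
            for k in range(n):
                acc += A[i][k] * B[k][j]
            C[i][j] = acc


def gemm_blocked(A, B, C, n, MB=2, NB=2, KB=2):
    """Tiled GEMM: same arithmetic, visited one MB x NB x KB tile at a time."""
    for i0 in range(0, n, MB):
        for j0 in range(0, n, NB):
            for k0 in range(0, n, KB):
                # min(...) handles edge tiles when n is not a multiple of the tile size.
                for i in range(i0, min(i0 + MB, n)):
                    for j in range(j0, min(j0 + NB, n)):
                        acc = C[i][j]
                        for k in range(k0, min(k0 + KB, n)):
                            acc += A[i][k] * B[k][j]
                        C[i][j] = acc


if __name__ == "__main__":
    n = 5  # deliberately not a multiple of the tile size
    A = [[i + j for j in range(n)] for i in range(n)]
    B = [[i - j for j in range(n)] for i in range(n)]
    C1 = [[0] * n for _ in range(n)]
    C2 = [[0] * n for _ in range(n)]
    gemm_baseline(A, B, C1, n)
    gemm_blocked(A, B, C2, n)
    print("results match:", C1 == C2)
```

Both versions perform the same 2n³ floating-point operations. Tiling changes only the order in which elements are touched, so that each tile of A, B, and C can be reused while it is still in cache.
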
- Ubuntu 24.04 (tested on a 4-vCPU, 8 GB RAM VM)
- GCC / OpenMP / OpenBLAS
- Python 3 with Matplotlib + Pandas
- jq (JSON parsing), awk (log parsing)
- Valgrind

Install prerequisites:
sudo apt update
sudo apt install -y build-essential libopenblas-dev valgrind jq \
python3 python3-matplotlib python3-pandas

Set up the environment settings:
export OMP_NUM_THREADS=4
export OMP_PROC_BIND=close
export OMP_PLACES=cores

Clean the build folder:
make clean

Clean the results folder:
make distclean

Measure the memory roof B (GB/s) with STREAM Triad (STREAM_ARRAY_SIZE=50,000,000 by default, NTIMES=20 iterations):
make calibrate-mem

Measure the compute roof F (GFLOP/s) with an SGEMM peak timer:
make sgemm_peak
make calibrate-comp

Check the combined file:
cat results/calibration.json

Build the kernels:
make baseline
make blocked MB=96 NB=96 KB=256 # block sizes are tunable

Run the kernels and collect the results (this takes a while):
./scripts/collect.sh baseline 1024 7
./scripts/collect.sh blocked 1024 7 # arguments: n, trials
./scripts/collect.sh baseline 2048 7
./scripts/collect.sh blocked 2048 7

Plot the Roofline graph (note: the paper has a unit error here; its values should have been divided by 1000):
make roofline # -> results/roofline.png

Unit tests:
make test

Large tests:
make perf-large N=2048 TRIALS=5 THREADS=4 # -> results/large_runs.csv

Valgrind tests:
make memcheck-test # unit tests under valgrind
make memcheck-baseline # n=128, baseline kernel
make memcheck-blocked # n=128, blocked kernel
make memcheck-sgemm # sgemm_peak sanity