- Goal: Implement and evaluate Sparse Matrix–Vector multiplication (y = A·x) using MPI with Compressed Sparse Row (CSR) storage.
- Matrix structure: Symmetric, 11-banded (main diagonal ±5).
- Focus: Wall-clock time, speed-up, and efficiency (defined after this list); separation of computation vs. communication costs.
- The program logs per-run metrics (e.g., to `results.txt`) for different matrix sizes and core counts.
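The speed-up and efficiency figures are assumed to follow the standard definitions, with $T(p)$ the wall-clock time on $p$ cores (the summary does not write the formulas out explicitly):

$$
S(p) = \frac{T(1)}{T(p)}, \qquad E(p) = \frac{S(p)}{p}
$$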
- CSR Construction (sketched in code after this list)
- `fillRowStart` – row pointers (`row_start`)
- `fillColIdx` – column indices within the ±5 band (`col_idx`)
- `fillValues` – nonzero values (preserves symmetry)
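A minimal sketch of how the three helpers might fill the CSR arrays for an N×N matrix with the ±5 band. The helper names come from the summary; the signatures, the boundary clipping, and the concrete value pattern are assumptions.

```c
/* Sketch only: bodies, signatures, and the value pattern are assumptions. */
#define HALF_BAND 5   /* main diagonal +/- 5 -> 11 diagonals */

/* row_start[i] .. row_start[i+1]-1 index the nonzeros of row i. */
void fillRowStart(int *row_start, int N) {
    row_start[0] = 0;
    for (int i = 0; i < N; ++i) {
        int lo = (i - HALF_BAND > 0) ? i - HALF_BAND : 0;
        int hi = (i + HALF_BAND < N - 1) ? i + HALF_BAND : N - 1;
        row_start[i + 1] = row_start[i] + (hi - lo + 1);
    }
}

/* Column indices of row i run over the band [i-5, i+5], clipped to [0, N-1]. */
void fillColIdx(int *col_idx, const int *row_start, int N) {
    for (int i = 0; i < N; ++i) {
        int lo = (i - HALF_BAND > 0) ? i - HALF_BAND : 0;
        for (int j = row_start[i]; j < row_start[i + 1]; ++j)
            col_idx[j] = lo + (j - row_start[i]);
    }
}

/* Each value depends only on whether the entry is on the main diagonal,
   i.e. only on |i - j|, so A stays symmetric. */
void fillValues(double *values, const int *col_idx, const int *row_start, int N) {
    for (int i = 0; i < N; ++i)
        for (int j = row_start[i]; j < row_start[i + 1]; ++j)
            values[j] = (col_idx[j] == i) ? 2.0 * HALF_BAND : -1.0;
}
```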
- SpMV Kernel: `SpMV_Mult(B, C, col_idx, row_start, values, local_rows)`
- For each row `i`: accumulate `values[j] * B[col_idx[j]]` over `j ∈ [row_start[i], row_start[i+1])` (kernel sketched below)
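The kernel signature above is taken from the summary; the body below is a sketch of the described loop, assuming `B` is the full input vector, `C` the output slice for the locally owned rows, `double` values, and a local, zero-based `row_start`.

```c
/* CSR SpMV over the locally owned rows: C[i] = sum_j values[j] * B[col_idx[j]].
   Types (double values, int indices) are assumptions; the loop follows the summary. */
void SpMV_Mult(const double *B, double *C,
               const int *col_idx, const int *row_start,
               const double *values, int local_rows) {
    for (int i = 0; i < local_rows; ++i) {
        double sum = 0.0;
        for (int j = row_start[i]; j < row_start[i + 1]; ++j)
            sum += values[j] * B[col_idx[j]];
        C[i] = sum;
    }
}
```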
- MPI Parallelization (row partitioning)
- Nearly even row-wise distribution across ranks
- Data movement with `MPI_Scatterv`/`MPI_Gatherv`
- Measures total, computation, and communication times (a driver sketch follows this list)
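A condensed sketch of how the distribution and timing split might be wired together. Only `MPI_Scatterv`/`MPI_Gatherv` and the total/computation/communication breakdown come from the summary; the broadcast of the full input vector, the variable names, and the rebasing of the local row pointers are assumptions.

```c
/* Build (example): mpicc -O3 -march=native -DNDEBUG spmv_mpi.c -o spmv_mpi */
#include <mpi.h>
#include <stdlib.h>

void SpMV_Mult(const double *B, double *C, const int *col_idx,
               const int *row_start, const double *values, int local_rows);

/* x must hold N doubles on every rank; the CSR arrays and y are only used on
   rank 0. The three timings are later appended to the results file by rank 0. */
void parallel_spmv(int N, const int *row_start, const int *col_idx,
                   const double *values, double *x, double *y,
                   double *t_total, double *t_comp, double *t_comm) {
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Nearly even row-wise distribution across ranks. */
    int *rows = malloc(size * sizeof *rows), *rdis = malloc(size * sizeof *rdis);
    int *nnz  = malloc(size * sizeof *nnz),  *ndis = malloc(size * sizeof *ndis);
    for (int r = 0, off = 0; r < size; ++r) {
        rows[r] = N / size + (r < N % size ? 1 : 0);
        rdis[r] = off;
        off += rows[r];
    }
    if (rank == 0)                        /* per-rank nonzero counts from row_start */
        for (int r = 0; r < size; ++r) {
            ndis[r] = row_start[rdis[r]];
            nnz[r]  = row_start[rdis[r] + rows[r]] - ndis[r];
        }
    MPI_Bcast(nnz, size, MPI_INT, 0, MPI_COMM_WORLD);

    int lrows = rows[rank], lnnz = nnz[rank];
    int *lrs = malloc((lrows + 1) * sizeof *lrs);
    int *lci = malloc(lnnz * sizeof *lci);
    double *lval = malloc(lnnz * sizeof *lval), *ly = malloc(lrows * sizeof *ly);

    double t0 = MPI_Wtime();

    /* Communication: scatter the CSR chunks, broadcast the input vector. */
    MPI_Scatterv(values,  nnz,  ndis, MPI_DOUBLE, lval, lnnz,  MPI_DOUBLE, 0, MPI_COMM_WORLD);
    MPI_Scatterv(col_idx, nnz,  ndis, MPI_INT,    lci,  lnnz,  MPI_INT,    0, MPI_COMM_WORLD);
    MPI_Scatterv(row_start, rows, rdis, MPI_INT,  lrs,  lrows, MPI_INT,    0, MPI_COMM_WORLD);
    MPI_Bcast(x, N, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    int base = lrs[0];                    /* rebase local row pointers to start at 0 */
    lrs[lrows] = base + lnnz;
    for (int i = 0; i <= lrows; ++i) lrs[i] -= base;
    *t_comm = MPI_Wtime() - t0;

    /* Computation: local CSR SpMV. */
    double t1 = MPI_Wtime();
    SpMV_Mult(x, ly, lci, lrs, lval, lrows);
    *t_comp = MPI_Wtime() - t1;

    /* Communication: gather the result slices back to rank 0. */
    double t2 = MPI_Wtime();
    MPI_Gatherv(ly, lrows, MPI_DOUBLE, y, rows, rdis, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    *t_comm += MPI_Wtime() - t2;
    *t_total = MPI_Wtime() - t0;

    free(rows); free(rdis); free(nnz); free(ndis);
    free(lrs); free(lci); free(lval); free(ly);
}
```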
- Recording
- Appends matrix size, core count, and timing breakdowns to a results file (a minimal append sketch follows)
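A minimal sketch of the append step; the file name `results.txt` is mentioned earlier, but the exact column layout is an assumption.

```c
#include <stdio.h>

/* Rank 0 appends one line per run; the column order here is assumed. */
void record_run(int N, int procs, double t_total, double t_comp, double t_comm) {
    FILE *f = fopen("results.txt", "a");
    if (!f) return;
    fprintf(f, "N=%d procs=%d total=%.6f comp=%.6f comm=%.6f\n",
            N, procs, t_total, t_comp, t_comm);
    fclose(f);
}
```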
- CPU: Intel Xeon E5-2680 v4 (14-core; 28 cores per node)
- Memory: 126 GB
- Caches: L1 32 KB, L2 256 KB, L3 ~35 MB
- Network: InfiniBand (high bandwidth; comms latency impacts scaling)
- Toolchain: OpenMPI/MPICH, build flags like `-O3 -march=native -DNDEBUG`
Note: Very large matrices + high core counts may hit memory limits; communication cost becomes dominant as processes increase.
- Speed-up: Increases with core count but shows diminishing returns due to communication and synchronization overheads.
- Efficiency: High at low–mid core counts; decreases as communication dominates at larger scales.
- Problem size effect: Larger `N` improves the compute/communication ratio, aiding scalability; memory bandwidth and access patterns remain limiting.
- Overall: Row-wise CSR partitioning is simple and effective; the best results typically appear at mid-to-high core counts, before communication bottlenecks dominate.