Parallel-Sparse-Matrix-Vector-Multiplication-MPI-CSR-

Project Overview

  • Goal: Implement and evaluate sparse matrix–vector multiplication (y = A·x) using MPI with Compressed Sparse Row (CSR) storage (a small CSR illustration follows this list).
  • Matrix structure: Symmetric, 11-banded (the main diagonal plus 5 diagonals on each side).
  • Focus: Wall-clock time, speed-up, efficiency, and related derived metrics, with computation and communication costs measured separately.
  • The program logs per-run metrics (e.g., results.txt) for different matrix sizes and core counts.
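For readers new to the format, here is a small, hypothetical illustration (not taken from the repo) of how CSR stores a banded matrix; the project's matrices use the same three arrays, just with an 11-wide band:

```c
/* CSR stores only the nonzeros of a sparse matrix in three arrays.
 * For this 4x4 tridiagonal example (band half-width 1; the project
 * uses half-width 5, i.e. 11 diagonals):
 *
 *     [2 1 0 0]
 *     [1 2 1 0]
 *     [0 1 2 1]
 *     [0 0 1 2]
 */
double values[]    = {2, 1,  1, 2, 1,  1, 2, 1,  1, 2}; /* nonzeros, row by row */
int    col_idx[]   = {0, 1,  0, 1, 2,  1, 2, 3,  2, 3}; /* their column indices */
int    row_start[] = {0, 2, 5, 8, 10}; /* row i occupies [row_start[i], row_start[i+1]) */
```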

Implemented Sections

  • CSR Construction (see the construction sketch below)
    • fillRowStart – builds the row pointers (row_start)
    • fillColIdx – builds the column indices within the ±5 band (col_idx)
    • fillValues – fills the nonzeros (preserving symmetry)
  • SpMV Kernel (see the kernel sketch below)
    • SpMV_Mult(B, C, col_idx, row_start, values, local_rows)
    • For each row i, accumulates values[j] * B[col_idx[j]] over j ∈ [row_start[i], row_start[i+1])
  • MPI Parallelization (row partitioning; see the driver sketch below)
    • Nearly even row-wise distribution across ranks
    • Data movement with MPI_Scatterv / MPI_Gatherv
    • Measures total, computation, and communication times separately
  • Recording
    • Appends matrix size, core count, and timing breakdowns to a results file
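A minimal sketch of how the three construction helpers could fill the CSR arrays for an N×N matrix with band half-width 5. The function names come from the repo, but the signatures and the value formula here are assumptions for illustration:

```c
#include <stdlib.h>

#define HALF_BAND 5  /* main diagonal ±5 → 11 diagonals total */

/* Row pointers: row i holds columns [i-5, i+5] clipped to [0, N-1]. */
void fillRowStart(int *row_start, int N) {
    row_start[0] = 0;
    for (int i = 0; i < N; i++) {
        int lo = (i - HALF_BAND < 0) ? 0 : i - HALF_BAND;
        int hi = (i + HALF_BAND > N - 1) ? N - 1 : i + HALF_BAND;
        row_start[i + 1] = row_start[i] + (hi - lo + 1);
    }
}

/* Column indices: consecutive columns within the band, row by row. */
void fillColIdx(int *col_idx, const int *row_start, int N) {
    for (int i = 0; i < N; i++) {
        int col = (i - HALF_BAND < 0) ? 0 : i - HALF_BAND;
        for (int k = row_start[i]; k < row_start[i + 1]; k++)
            col_idx[k] = col++;
    }
}

/* Nonzeros: any value depending only on |i - j| keeps A symmetric. */
void fillValues(double *values, const int *col_idx,
                const int *row_start, int N) {
    for (int i = 0; i < N; i++)
        for (int k = row_start[i]; k < row_start[i + 1]; k++)
            values[k] = 1.0 / (1.0 + abs(i - col_idx[k]));  /* example formula */
}
```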
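The kernel follows directly from the loop described above. This version matches the listed signature, assuming B holds the input vector (covering the band's global column indices) and C receives the local output:

```c
/* y = A·x restricted to this rank's rows:
 * C[i] = Σ values[j] * B[col_idx[j]] for j in [row_start[i], row_start[i+1]). */
void SpMV_Mult(const double *B, double *C, const int *col_idx,
               const int *row_start, const double *values, int local_rows) {
    for (int i = 0; i < local_rows; i++) {
        double sum = 0.0;
        for (int j = row_start[i]; j < row_start[i + 1]; j++)
            sum += values[j] * B[col_idx[j]];
        C[i] = sum;
    }
}
```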
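And a sketch of the MPI driver pattern: the nearly even row split, the MPI_Gatherv collection step, and the separate computation/communication timers. The argument handling, placeholder compute loop, and output format are all assumptions standing in for the repo's actual driver, which also uses MPI_Scatterv for the initial data distribution:

```c
/* Build/run (assumed): mpicc -O3 -march=native -DNDEBUG spmv.c -o spmv
 *                      mpirun -np 28 ./spmv 100000                     */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int N = (argc > 1) ? atoi(argv[1]) : 100000;

    /* Nearly even row split: the first N % size ranks get one extra row. */
    int *counts = malloc(size * sizeof *counts);
    int *displs = malloc(size * sizeof *displs);
    for (int r = 0, off = 0; r < size; r++) {
        counts[r] = N / size + (r < N % size ? 1 : 0);
        displs[r] = off;
        off += counts[r];
    }
    int local_rows = counts[rank];

    double *C_local = calloc(local_rows, sizeof *C_local);
    double *C_full  = (rank == 0) ? malloc(N * sizeof *C_full) : NULL;

    double t_total = MPI_Wtime();

    /* Computation phase: the real code calls
     * SpMV_Mult(B, C_local, col_idx, row_start, values, local_rows). */
    double t = MPI_Wtime();
    for (int i = 0; i < local_rows; i++) C_local[i] = 0.0;  /* placeholder */
    double t_comp = MPI_Wtime() - t;

    /* Communication phase: collect the distributed result on rank 0. */
    t = MPI_Wtime();
    MPI_Gatherv(C_local, local_rows, MPI_DOUBLE,
                C_full, counts, displs, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    double t_comm = MPI_Wtime() - t;

    t_total = MPI_Wtime() - t_total;

    if (rank == 0)  /* one line per run, in the spirit of results.txt */
        printf("N=%d P=%d total=%.6f comp=%.6f comm=%.6f\n",
               N, size, t_total, t_comp, t_comm);

    MPI_Finalize();
    return 0;
}
```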

Hardware Environment

  • CPU: Intel Xeon E5-2680 v4 (14 cores per socket; 28 cores per dual-socket node)
  • Memory: 126 GB
  • Caches: L1 32 KB and L2 256 KB per core; L3 ~35 MB shared
  • Network: InfiniBand (high bandwidth, but communication latency still limits scaling)
  • Toolchain: OpenMPI/MPICH, built with flags such as -O3 -march=native -DNDEBUG

Note: Very large matrices combined with high core counts may hit memory limits, and communication cost becomes dominant as the process count grows.


Performance Summary

  • Speed-up: Increases with core count but shows diminishing returns due to communication and synchronization overheads.
  • Efficiency: High at low-to-mid core counts; decreases as communication dominates at larger scales.
  • Problem size effect: Larger N improves the computation-to-communication ratio and thus scalability; memory bandwidth and access patterns remain limiting factors.
  • Overall: Row-wise CSR partitioning is simple and effective; the best results typically appear at mid-to-high core counts, before communication bottlenecks dominate (speed-up and efficiency are defined below).
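For reference, the speed-up and efficiency figures above follow the standard definitions, where $T(p)$ is the measured wall-clock time on $p$ processes:

$$
S(p) = \frac{T(1)}{T(p)}, \qquad E(p) = \frac{S(p)}{p}
$$

So, for example, a run that is 20× faster on 28 cores has efficiency $E(28) = 20/28 \approx 0.71$.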
