Distributed Δ-Stepping Algorithm

This project implements the Δ-stepping algorithm, a high performance, distributed-memory parallel algorithm for solving the Single Source Shortest Paths (SSSP) problem on large scale graphs.

The Δ-stepping algorithm combines the work efficiency of Dijkstra’s algorithm with the parallelism of Bellman-Ford, making it well suited for distributed systems. The algorithm processes vertices in buckets based on their tentative distances, allowing parallel relaxation of edges within each bucket while maintaining correctness. This approach enables efficient weak (Gustafson) scaling on parallel systems.

This implementation is written in C++ and uses MPI for parallelism. It's based on the algorithm and optimizations described in “Scalable Single Source Shortest Path Algorithms for Massively Parallel Systems” by V. T. Chakaravarthy, F. Checconi, F. Petrini, and Y. Sabharwal, Proceedings of the IEEE International Parallel & Distributed Processing Symposium (IPDPS), 2014. [PDF].

Report

A detailed report describing the implementation specifics of the algorithm, the impact of various optimizations and benchmark results for multiple graph sizes can be found in report.pdf

Optimizations

This implementation includes several optimizations from the referenced paper, plus additional improvements.

Edge Classification

Edges are divided into short and long groups based on a threshold parameter Δ.

Short edges have weights <Δ and are processed multiple times within a bucket.
Long edges have weights ≥Δ and are relaxed only once per bucket.

This reduces redundant relaxations, since long edges cannot produce tentative distances that fall into the current bucket’s range.

Inner-Outer Short (IOS) Heuristic

During the short edge phase, an edge is relaxed only if the new tentative distance places the target vertex within the current bucket.

Inner short edges satisfy this condition and are processed immediately.
Outer short edges are processed in the long edge phase

This avoids unnecessary relaxations on edges that cannot affect the current bucket.

Push and Pull Communication in Long Edge Phase

In push mode currently settled vertices send updated distances to their neighbors
In pull mode unsettled vertices request distances from their neighbors

For each bucket, the optimal communication mode is chosen by estimating the total push/pull communciation volume using a heuristic.

Hybridization with Bellman-Ford

After a specified fraction of vertices is settled, the algorithm switches to Bellman-Ford for all remaining buckets, which are more sparsely occupied. This reduces the number of long edge phases which require synchronization, while keeping similar total work.

Dynamic Index Width (custom optimization)

When exchanging distances, vertex indices are encoded relative to the first owned vertex of the receiver. Because of this, if each process owns ≤ 2¹⁶ vertices, the relative index can be represented as a 16-bit number instead of a 32-bit one.

Requirements

MPI: OpenMPI or MPICH
C++ Compiler: C++17 support
OpenMP: For intra-node parallelism
Python 3: To run the checker (optional)

Usage

Building

git clone https://github.com/tanshoo/delta-stepping.git
cd delta-stepping
make

Running the Algorithm

Each process <rank> reads its input from <input_dir>/$RANK.in and writes output to <output_dir>/$RANK.out.

The output directory must exist before running.

Running locally

mpiexec -n <num_processes> bash -c 'RANK=${OMPI_COMM_WORLD_LOCAL_RANK}; ./sssp <input_filepath>/$RANK.in <output_filepath>/$RANK.out'

Running on SLURM

srun -n <num_processes> bash -c 'RANK=${SLURM_PROCID}; ./sssp <input_filepath>/$RANK.in <output_filepath>/$RANK.out'

Input/Output Format

Input Format

Each MPI rank receives a file with:

First line: n first last

n: Total number of vertices in the graph
first: Index of the first vertex owned by this process
last: Index of the last vertex owned by this process
Subsequent lines: u v w
- Edge from u to v with weight w

Output Format

Each process writes a file with the shortest distances for its vertices, one per line, sorted by vertex index.

Input Assumptions

The graph is undirected, with no self-loops or duplicate edges.
All edge weights are non-negative.
The source vertex is always 0.
Each process owns a contiguous range of vertices.

Utilities

Build all utilities:

cd utils
make

R-MAT graph generator:

./rmat_generator <scale> <edge_factor> <rmat1|rmat2> <seed> [output_file]

Output: Edgelist in format u v w (default filename: <rmat_type>_<scale>_<seed>.edgelist) with w in [0,255]

Sequential Dijkstra (reference):

./dijkstra <basename>

Input: <basename>.edgelist
Output: <basename>.out (distances from node 0)

Distributed testcase generator:

Splits an edgelist and output into per process files for testing.

./test_generator <basename> <num_processes>

Input: <basename>.edgelist and <basename>.out
Output: Directory <basename>_<num_processes> with files 0.in, ..., N-1.in and 0.out, ..., N-1.out

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
utils		utils
Makefile		Makefile
README.md		README.md
common.h		common.h
delta_stepping.cpp		delta_stepping.cpp
delta_stepping.h		delta_stepping.h
main.cpp		main.cpp
push_pull_comm.cpp		push_pull_comm.cpp
push_pull_comm.h		push_pull_comm.h
report.pdf		report.pdf

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Distributed Δ-Stepping Algorithm

Table of Contents

Report

Optimizations

Edge Classification

Inner-Outer Short (IOS) Heuristic

Push and Pull Communication in Long Edge Phase

Hybridization with Bellman-Ford

Dynamic Index Width (custom optimization)

Requirements

Usage

Building

Running the Algorithm

Running locally

Running on SLURM

Input/Output Format

Input Format

Output Format

Input Assumptions

Utilities

About

Uh oh!

Releases

Packages

Languages

tanshoo/delta-stepping

Folders and files

Latest commit

History

Repository files navigation

Distributed Δ-Stepping Algorithm

Table of Contents

Report

Optimizations

Edge Classification

Inner-Outer Short (IOS) Heuristic

Push and Pull Communication in Long Edge Phase

Hybridization with Bellman-Ford

Dynamic Index Width (custom optimization)

Requirements

Usage

Building

Running the Algorithm

Running locally

Running on SLURM

Input/Output Format

Input Format

Output Format

Input Assumptions

Utilities

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages