hpcgroup/SALT

This library builds on PCCL. It follows PCCL's communication-group construction approach and extends it with Alltoallv support.
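For readers less familiar with the operation, the following is a minimal single-process sketch of Alltoallv semantics in pure Python. It is not SALT's or PCCL's implementation. The function name `simulate_alltoallv` and the 3-rank count matrix are hypothetical, chosen only to show how an irregular communication matrix determines what each rank receives.

```python
def simulate_alltoallv(send_bufs, send_counts):
    """Simulate Alltoallv across len(send_bufs) ranks in one process.

    send_bufs[i]      : flat list of elements held by rank i, ordered by destination
    send_counts[i][j] : number of elements rank i sends to rank j
    Returns recv_bufs, where recv_bufs[j] holds the chunks from ranks 0..n-1 in order.
    """
    n = len(send_bufs)
    recv_bufs = [[] for _ in range(n)]
    for src in range(n):
        if sum(send_counts[src]) != len(send_bufs[src]):
            raise ValueError(f"rank {src}: send counts do not match buffer length")
        offset = 0
        for dst in range(n):
            count = send_counts[src][dst]
            # Copy the slice destined for dst; counts may differ for every pair.
            recv_bufs[dst].extend(send_bufs[src][offset:offset + count])
            offset += count
    return recv_bufs


# Hypothetical irregular communication matrix for 3 ranks (row = sender).
send_counts = [
    [0, 3, 1],
    [1, 0, 0],
    [2, 2, 0],
]
# Tag each element with (source rank, index) so its origin stays visible.
send_bufs = [[(src, k) for k in range(sum(row))]
             for src, row in enumerate(send_counts)]

recv_bufs = simulate_alltoallv(send_bufs, send_counts)
for rank, buf in enumerate(recv_bufs):
    print(f"rank {rank} received {len(buf)} elements: {buf}")
```

The number of elements each rank receives is the corresponding column sum of the matrix, so an uneven matrix leaves some ranks with far more incoming data than others.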

Preparing the environment

The following command creates a Python virtual environment with all the required dependencies and builds the aws-ofi-plugin. The only thing you need to change in the script is the PROJ_NAME variable on line 4.

bash scripts/Perlmutter/create_python_env.sh

Running

We use the scripts run_highly_skew.sh and run_lightly_skew.sh to launch benchmarking runs.

Use the --library flag to choose the communication backend: pccl, rccl, or mpi.

Add the --test flag to validate correctness of PCCL operations. ⚠️ Note: This flag is not recommended for large-scale runs as it introduces performance overhead.

Megatron-LM

If you want to test Megatron-LM performance or extract communication matrices from real workloads, use the materials under Megatron-LM_patch/. See Megatron-LM_patch/README.md for the patch workflow and usage details.

About

Scalable All-to-allv Algorithms for Dynamic and Irregular Communication Patterns
