This library builds on PCCL. We follow its approach to constructing communication groups and extend it with Alltoallv support.
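For readers unfamiliar with Alltoallv, the sketch below shows how the per-rank arguments of a variable-size all-to-all exchange follow from a matrix of send counts. It is an illustration of the general Alltoallv semantics only, not this library's implementation; the function name `alltoallv_layout` and the example matrix are hypothetical, and it assumes each rank packs its send and receive buffers contiguously in peer order.

```python
from itertools import accumulate


def alltoallv_layout(send_counts):
    """Derive per-rank Alltoallv arguments from a square send-count matrix.

    send_counts[i][j] is the number of elements rank i sends to rank j.
    Returns one (sendcounts, sdispls, recvcounts, rdispls) tuple per rank,
    assuming buffers are packed contiguously in peer order (hypothetical
    helper for illustration, not part of this library).
    """
    n = len(send_counts)
    if any(len(row) != n for row in send_counts):
        raise ValueError("send_counts must be a square matrix")

    layouts = []
    for rank in range(n):
        # What this rank sends to each peer is its own row.
        sendcounts = list(send_counts[rank])
        # What this rank receives from each peer is its own column.
        recvcounts = [send_counts[peer][rank] for peer in range(n)]
        # Displacements are exclusive prefix sums of the counts.
        sdispls = [0] + list(accumulate(sendcounts))[:-1]
        rdispls = [0] + list(accumulate(recvcounts))[:-1]
        layouts.append((sendcounts, sdispls, recvcounts, rdispls))
    return layouts


# Example: an uneven 3-rank exchange (made-up numbers).
counts = [
    [1, 5, 0],
    [2, 1, 7],
    [0, 3, 1],
]
for rank, (sc, sd, rc, rd) in enumerate(alltoallv_layout(counts)):
    print(f"rank {rank}: send {sc} at {sd}, recv {rc} at {rd}")
```

The more uneven the rows and columns of such a matrix are, the more the per-rank traffic differs, which is the kind of imbalance a skewed benchmark exercises.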
The following command creates a Python virtual environment with all required dependencies and builds the aws-ofi-plugin. The only change you need to make is to set the PROJ_NAME variable on line 4 of the script.

bash scripts/Perlmutter/create_python_env.sh

We use the scripts run_highly_skew.sh and run_lightly_skew.sh to launch benchmarking runs.
Use the --library flag to choose the communication backend: pccl, rccl, or mpi.
Add the --test flag to validate correctness of PCCL operations.
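As a rough illustration of combining these flags, the sketch below only prints candidate command lines for each backend; it does not launch anything. It assumes the flags are passed directly on the launch script's command line and that the script is invoked from its own directory, neither of which is confirmed here, so check the scripts before relying on it. The helper `build_command` is hypothetical.

```python
import shlex

# Backend names listed for the --library flag.
BACKENDS = ("pccl", "rccl", "mpi")


def build_command(script, library, validate=False):
    """Build an argument list for one benchmarking run (hypothetical helper).

    Assumption: --library and --test are accepted on the launch script's
    command line; adjust if the scripts expect them elsewhere.
    """
    if library not in BACKENDS:
        raise ValueError(f"unknown backend: {library!r}")
    cmd = ["bash", script, "--library", library]
    if validate:
        # --test validates the correctness of PCCL operations.
        cmd.append("--test")
    return cmd


# Dry run: print one command per backend, enabling --test only for pccl.
for lib in BACKENDS:
    print(shlex.join(build_command("run_highly_skew.sh", lib, validate=(lib == "pccl"))))
```

Printing the commands first makes it easy to review a sweep across backends before submitting any jobs.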
If you want to test Megatron-LM performance or extract communication matrices from real workloads, use the materials under Megatron-LM_patch/. See Megatron-LM_patch/README.md for the patch workflow and usage details.