HeaPS

Official PyTorch implementation of HeaPS, published in the Journal of Parallel and Distributed Computing. [Paper]

Federated learning enables collaborative model training among numerous clients. However, existing participant/client selection methods fail to fully leverage the advantages of clients with excellent computational or communication capabilities. In this paper, we propose HeaPS, a novel Heterogeneity-aware Participant Selection framework for efficient federated learning. We introduce a finer-grained global selection algorithm that selects communication-strong leaders and computation-strong members from the candidate clients. The leaders are responsible for communicating with the server to reduce per-round duration, as well as contributing their own gradients, while the members communicate with the leaders to contribute additional gradients obtained from high-utility data to the global model and improve the final model accuracy. Meanwhile, we develop a gradient migration path generation algorithm to match each member with the optimal leader. We also design a client scheduler to facilitate parallel local training of leaders and members based on gradient migration. Experimental results show that, compared with state-of-the-art methods, HeaPS achieves a speedup of up to 3.20× in time-to-accuracy performance and improves the final accuracy by up to 3.57%.
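The sketch below is only an illustrative outline of the selection and matching idea described above; it is not the repository's implementation, and all class names, scoring rules, and the migration-cost heuristic are hypothetical placeholders.

# Illustrative sketch of heterogeneity-aware leader/member selection and
# migration-path matching. Names and heuristics are hypothetical, not HeaPS code.
from dataclasses import dataclass
import random

@dataclass
class Client:
    cid: int
    comm_bw: float       # communication bandwidth (higher = faster link to server)
    comp_speed: float    # computation speed (higher = faster local training)
    data_utility: float  # statistical utility of the client's local data

def select_leaders_and_members(clients, num_leaders, num_members):
    """Pick communication-strong leaders and computation-strong members."""
    # Leaders: strongest communication capability among the candidates.
    leaders = sorted(clients, key=lambda c: c.comm_bw, reverse=True)[:num_leaders]
    rest = [c for c in clients if c not in leaders]
    # Members: rank the rest by computation speed weighted by data utility.
    members = sorted(rest, key=lambda c: c.comp_speed * c.data_utility,
                     reverse=True)[:num_members]
    return leaders, members

def match_migration_paths(leaders, members):
    """Assign each member a leader with low estimated gradient-migration cost."""
    load = {l.cid: 0 for l in leaders}
    paths = {}
    for m in sorted(members, key=lambda c: c.data_utility, reverse=True):
        # Hypothetical cost: prefer high-bandwidth leaders, penalize busy ones.
        best = min(leaders, key=lambda l: (load[l.cid] + 1) / (l.comm_bw + 1e-9))
        paths[m.cid] = best.cid
        load[best.cid] += 1
    return paths

if __name__ == "__main__":
    random.seed(0)
    pool = [Client(i, random.random(), random.random(), random.random()) for i in range(20)]
    leaders, members = select_leaders_and_members(pool, num_leaders=3, num_members=7)
    print("leaders:", [c.cid for c in leaders])
    print("migration paths (member -> leader):", match_migration_paths(leaders, members))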

We use an Intel Xeon CPU with a 3.0 GHz clock rate and 32 cores, and 8 NVIDIA Tesla V100 GPUs to accelerate training. The operating system is Ubuntu 18.04, the driver version is 440.118.02, and the CUDA version is 10.2. For the base settings, K=50 clients are selected to participate in each round of training from 1.3K=60 candidate clients.

Quick Start

Installation

conda create -n yourname python=3.8
conda activate yourname
conda install pytorch==1.6.0 torchvision==0.7.0 cudatoolkit=10.2 -c pytorch

Find the appropriate install command for your setup on the official PyTorch website: https://pytorch.org/get-started/previous-versions/.
The versions and installation instructions for the following packages can be found on NVIDIA's official website. Note that the versions of these packages must be compatible with one another.
CUDA Toolkit: https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html
cuDNN: https://docs.nvidia.com/deeplearning/cudnn/archives/cudnn-870/install-guide/index.html
NCCL: https://developer.nvidia.com/nccl/nccl-legacy-downloads
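As a quick sanity check that the installed PyTorch build matches the intended CUDA/cuDNN/NCCL stack, a short script like the following can be run; the expected values in the comments assume the versions listed above and should be adjusted if you installed a different combination.

# Environment check for the versions assumed in this README
# (PyTorch 1.6.0 built against CUDA 10.2).
import torch

print("PyTorch:", torch.__version__)              # expected: 1.6.0
print("CUDA (torch build):", torch.version.cuda)  # expected: 10.2
print("cuDNN:", torch.backends.cudnn.version())
print("GPU available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPUs:", torch.cuda.device_count())     # 8x Tesla V100 in the paper's setup
    print("NCCL:", torch.cuda.nccl.version())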

Cloning

git clone https://github.com/Dora233/HeaPS

To compare with the existing works, Oort and PyramidFL, run the following commands to install Oort.

git clone https://github.com/SymbioticLab/Oort
cd Oort
source install.sh

Dataset Preparation

We use three public datasets of varying scales: Google Speech, OpenImage, and StackOverflow. They can be downloaded from the AI benchmark FedScale.

Run Simulation

HeaPS can be tested by training ResNet-34 on non-IID Google Speech; run the following commands to submit the task:

cd {root}/HeaPS/training/evals
python manager.py submit configs/speech/conf_heaps.yml

All the configuration files are in ".../HeaPS/training/evals/configs/".
Among them, files with the suffix "_p" use Prox, while files without "_p" use Yogi.
The suffix "_nomember" denotes the HeaPS variant without member clients, which is used in the ablation study.
The other variant used in the ablation study is HeaPS without the fine-grained utility. To test it, go to ".../HeaPS/heaps/" and replace the content of heaps.py with the content of heaps_util.py.
The generated log files are in ".../HeaPS/training/evals/logs/".
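To track time-to-accuracy from the generated logs, a small script along these lines can help. The regular expression below is only a placeholder assumption; inspect the actual log files first and adjust the pattern to the format the framework emits.

# Minimal log-scanning sketch. The "accuracy" pattern is a placeholder; adapt it
# to the actual format of the lines in .../HeaPS/training/evals/logs/.
import re
import sys

def extract_accuracies(log_path, pattern=r"accuracy[^0-9]*([0-9]+\.?[0-9]*)"):
    """Return all accuracy values (as floats) found in a log file."""
    values = []
    with open(log_path) as f:
        for line in f:
            m = re.search(pattern, line, flags=re.IGNORECASE)
            if m:
                values.append(float(m.group(1)))
    return values

if __name__ == "__main__":
    acc = extract_accuracies(sys.argv[1])
    if acc:
        print(f"rounds reported: {len(acc)}, best accuracy: {max(acc):.2f}")
    else:
        print("no accuracy lines matched; adjust the regex to your log format")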

Acknowledgements

Thanks to Chenning Li, Xiao Zeng, Mi Zhang, and Zhichao Cao for their MobiCom'22 paper PyramidFL: A Fine-grained Client Selection Framework for Efficient Federated Learning. The source code can be found in the PyramidFL repo.

We also thank Fan Lai, Xiangfeng Zhu, Harsha V. Madhyastha, and Mosharaf Chowdhury for their OSDI'21 paper Oort: Efficient Federated Learning via Guided Participant Selection. The source code can be found in the Oort repo.
