Repository to explore and quantify the heterogeneity in cross-silo federated learning.
There are at least two questions of interest:
- How to measure heterogeneity between silos within a dataset.
- How to compare heterogeneity between datasets.
There are at least two kinds of heterogeneity metrics:
- statistical metrics (based primarily on the datasets)
- optimization metrics (based on quantities that appear throughout the optimization process, once a loss/predictive model has been defined)
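To make the distinction concrete, here is a minimal sketch (not taken from the repository) of one metric of each kind: total variation distance between two clients' label distributions (statistical), and gradient dissimilarity, i.e. the average squared deviation of local gradients from the global gradient (optimization).

```python
import numpy as np

def tv_distance(p, q):
    """Statistical metric: total variation distance between two
    discrete label distributions."""
    return 0.5 * np.abs(np.asarray(p) - np.asarray(q)).sum()

def gradient_dissimilarity(local_grads):
    """Optimization metric: average squared deviation of local gradients
    from the global (averaged) gradient at a shared model."""
    grads = np.asarray(local_grads)
    global_grad = grads.mean(axis=0)
    return np.mean(np.sum((grads - global_grad) ** 2, axis=1))

# Statistical: two clients with different label proportions.
print(tv_distance([0.9, 0.1], [0.5, 0.5]))  # 0.4

# Optimization: two clients whose local gradients point in different directions.
print(gradient_dissimilarity([[1.0, 0.0], [0.0, 1.0]]))  # 0.5
```

Both quantities are zero when the clients are identical and grow with heterogeneity, which is what makes them usable as comparable heterogeneity scores.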
The metrics are systematically computed for:
- both iid and non-iid settings
- w.r.t. the centralized client (the client that holds the full dataset) and between clients.
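The following toy sketch (assumed, not from the repository) illustrates this setup: the same dataset is split either iid (shuffled) or non-iid (sorted by label), and each client's label distribution is compared to that of the centralized client.

```python
import numpy as np

def label_distribution(y, n_labels):
    """Empirical distribution of the labels Y held by one client."""
    counts = np.bincount(y, minlength=n_labels)
    return counts / counts.sum()

def split(y, n_clients, iid, rng):
    """Split sample indices iid (shuffled) or non-iid (sorted by label)."""
    idx = rng.permutation(len(y)) if iid else np.argsort(y, kind="stable")
    return np.array_split(idx, n_clients)

rng = np.random.default_rng(0)
y = np.repeat([0, 1], 500)          # balanced binary labels
central = label_distribution(y, 2)  # the centralized client holds everything

for iid in (True, False):
    dists = [
        0.5 * np.abs(label_distribution(y[part], 2) - central).sum()
        for part in split(y, 2, iid, rng)
    ]
    # iid: both distances are close to 0; non-iid: both are exactly 0.5,
    # since each client ends up with a single label.
    print(iid, dists)
```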
Main classes:
- Client. Attributes: the dataset and its projection, the label, the distributions X, Y, X|Y and Y|X.
- ClientsNetwork. Attributes: all clients, the centralized client.
- Distance. Saves the distances between clients and with the central client, for both the iid and non-iid cases.
- DistanceForSeveralRuns. Attributes: the list of distances computed with different dataset splits.
- StatisticalMetrics. Attributes: the distances computed over several runs for each kind of distribution.
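The class names above come from the repository, but the attributes and method bodies below are a simplified sketch of how they could fit together, not the actual implementation:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Client:
    """One silo: its features X, labels Y, and the label distribution Y."""
    X: np.ndarray
    Y: np.ndarray
    n_labels: int

    def distribution_Y(self) -> np.ndarray:
        """Empirical distribution of the labels held by this client."""
        return np.bincount(self.Y, minlength=self.n_labels) / len(self.Y)

@dataclass
class ClientsNetwork:
    """All clients, plus the centralized client holding the full dataset."""
    clients: list
    centralized: Client

@dataclass
class Distance:
    """Distances to the central client and between clients,
    stored separately for one split (iid or non-iid)."""
    to_central: np.ndarray  # shape (n_clients,)
    pairwise: np.ndarray    # shape (n_clients, n_clients)
```

DistanceForSeveralRuns would then simply hold a list of Distance objects, one per dataset split, from which StatisticalMetrics aggregates means and variances per distribution (X, Y, X|Y, Y|X).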
Installation and usage:
git clone https://github.com/philipco/structured_noise.git
conda create -c conda-forge --name FL_heter_env --file requirements.txt python=3.7
conda activate FL_heter_env
Run python3 -m src.main --dataset dataset_name.
Presently, the following datasets are accepted:
- camelyon16
- lidc_idri
- isic2019
- tcga_brca
- heart_disease
- ixi
- kits2019
Create a new branch: git checkout -b features/branch_name, and implement all your modifications on this branch.
When the feature is ready, create a pull request on GitHub, review the code, squash the commits and merge them into the master branch.
To run the code on a new dataset, complete the function get_dataset in DataLoading.py so that it returns the new dataset's per-client data.
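As an illustration, here is a hypothetical sketch of such an extension; the real signature of get_dataset in DataLoading.py may differ, and the "toy_gaussian" dataset, its shapes, and the (X, Y)-per-client return format are all assumptions for the example:

```python
import numpy as np

def get_dataset(dataset_name: str):
    """Hypothetical sketch: return a list of (X, Y) pairs, one per client."""
    if dataset_name == "toy_gaussian":  # hypothetical new dataset
        rng = np.random.default_rng(42)
        clients = []
        # Two clients with shifted feature means: a simple heterogeneous split.
        for shift in (0.0, 2.0):
            X = rng.normal(loc=shift, size=(100, 5))
            Y = (X[:, 0] > shift).astype(int)
            clients.append((X, Y))
        return clients
    raise ValueError(f"Unknown dataset: {dataset_name}")

clients = get_dataset("toy_gaussian")
print(len(clients))  # 2
```

Once the function recognizes the new name, the pipeline can be invoked as before with python3 -m src.main --dataset toy_gaussian.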
MIT © Constantin Philippenko