We present here the code of the experimental parts of the following paper:
Constantin Philippenko, Kevin Scaman and Laurent Massoulié, In-depth Analysis of Low-rank Matrix Factorisation in a
Federated Setting, Proceedings of the AAAI Conference on Artificial Intelligence 2025.
In this paper, we analyze a distributed algorithm to compute a low-rank matrix factorization in a federated setting.
From our analysis, several take-aways can be identified:
- Increasing the number of communications $\alpha$ reduces the error $\epsilon$ by a factor $\sigma_{r_*+1}^{4\alpha} / \sigma_{r_*}^{4\alpha}$, thereby getting closer to the minimal Frobenius-norm error $\epsilon$.
- Using a gradient descent instead of an SVD to approximate the exact solution of the strongly-convex problem allows us to bridge two parallel lines of research. Further, we obtain a simple and elegant proof of convergence, and all the theory from optimization can be plugged in.
- By sampling several Gaussian matrices $\Phi$, we improve the convergence rate of the gradient descent. Further, based on random Gaussian matrix theory, this results in almost sure convergence if we sample $\Phi$ until $\mathbf{V}$ is well conditioned.
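To make the strategy concrete, here is a minimal NumPy sketch of a factorization of this flavor, not the authors' exact implementation: the function name `low_rank_gd`, the QR re-orthonormalization of `U`, and the unit step size are our assumptions. `U` is built from a random Gaussian sketch $\Phi$ refined by `alpha` power iterations (the extra communications), then `V` is obtained by gradient descent on the strongly convex problem $\min_V \|S - U V^\top\|_F^2$.

```python
import numpy as np

def low_rank_gd(S, r, alpha=1, n_steps=100, seed=0):
    """Factorize S ~ U @ V.T (rank r).

    Illustrative sketch: U comes from a Gaussian sketch Phi plus
    `alpha` power iterations, V from gradient descent.
    """
    rng = np.random.default_rng(seed)
    n, d = S.shape
    Phi = rng.standard_normal((d, r))      # random Gaussian matrix
    U = S @ Phi
    for _ in range(alpha):                 # power iterations (extra communications)
        U = S @ (S.T @ U)
    U, _ = np.linalg.qr(U)                 # re-orthonormalize for good conditioning
    V = np.zeros((d, r))
    for _ in range(n_steps):
        # gradient of 0.5 * ||S - U V^T||_F^2 with respect to V;
        # step size 1 is valid here since U^T U = I after the QR step
        grad = (U @ V.T - S).T @ U
        V = V - grad
    return U, V
```

On a matrix of exact rank `r`, the sketch spans the column space almost surely, so `U @ V.T` recovers `S` up to floating-point error; on general matrices the residual is governed by the tail singular values, as in the take-aways above.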
Run the following commands to generate the illustrative figures in the article.
Condition number on the X-axis, logarithm of the loss F after 1000 local iterations on the Y-axis.
Goal: illustrate the impact of the sampled Gaussian matrices Phi on the convergence rate.
Left: without noise. Right: with noise.
python3 -m src.plotter_script.PlotErrorVSCond
Iteration index on the X-axis, logarithm of the loss F on the Y-axis.
Goal: illustrate on real-life datasets how the algorithm behaves in practice.
Left to right: (a) synthetic dataset, (b) w8a, (c) mnist and (d) celeba.
python3 -m src.plotter_script.PlotRealDatasets --dataset_name synth
python3 -m src.plotter_script.PlotRealDatasets --dataset_name mnist
python3 -m src.plotter_script.PlotRealDatasets --dataset_name celeba
python3 -m src.plotter_script.PlotRealDatasets --dataset_name w8a
We use three real datasets: mnist, celeba and w8a, which should be stored at ~/GITHUB/DATASETS.
Mnist is automatically downloaded if it is not present. The user should download w8a here and celeba from Kaggle.
Using pip:
pip install -r requirements.txt
Or to create a conda environment: conda create -c conda-forge --name matrix_factorisation --file requirements.txt python=3.7
MIT © Constantin Philippenko
If you use this code, please cite the following paper:
@inproceedings{philippenko2024indepth,
title={In-depth Analysis of Low-rank Matrix Factorisation in a Federated Setting},
author={Philippenko, Constantin and Scaman, Kevin and Massoulié, Laurent},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
year={2025}
}