🏆 This was the highest ranking project in its year batch.
This project was developed for the course of Bayesian Statistics for the MSc. in Mathematical Engineering at Politecnico di Milano, A.Y. 2022/2023.
git clone https://github.com/teobucci/bayesian-statistics-project
git submodule update --init
git submodule update --recursive
Open ./FGM/FGM.Rproj in RStudio and press:
- Ctrl+Shift+B on Windows
- CMD+Shift+B on macOS
On macOS with an M1 chip you may get an error involving gfortran, in which case proceed as follows:
- Install gcc, which includes gfortran, with
  brew install gcc
- Create the file ~/.R/Makevars (if it does not exist yet), for example by running in a terminal
  mkdir -p ~/.R
  touch ~/.R/Makevars
- Add the following lines to ~/.R/Makevars
  FC = /opt/homebrew/Cellar/gcc/11.3.0_2/bin/gfortran
  F77 = /opt/homebrew/Cellar/gcc/11.3.0_2/bin/gfortran
  FLIBS = -L/opt/homebrew/Cellar/gcc/11.3.0_2/lib/gcc/11
  This can be done by opening the file in a normal text editor such as VSCode (code ~/.R/Makevars) or SublimeText (subl ~/.R/Makevars).
  Note that you might have to change the gcc version 11.3.0_2 to whatever your gcc version is.
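Since the Cellar path hard-codes the gcc version, it can help to generate the three Makevars lines from whatever Homebrew has installed. The following is a minimal sketch, not part of the repository: the helper name makevars_lines is hypothetical, and it assumes that gfortran sits in <prefix>/bin and the runtime libraries in <prefix>/lib/gcc/<major>, mirroring the layout of the paths listed above. It only prints the lines, so they can be reviewed before being added to ~/.R/Makevars.

```shell
#!/usr/bin/env bash
# Sketch: print Makevars lines for a Homebrew-installed gcc.
# Assumption: gfortran is at <prefix>/bin/gfortran and the libraries are in
# <prefix>/lib/gcc/<major>, as in the example paths of this README.

# Hypothetical helper: $1 = gcc install prefix, $2 = gcc major version.
makevars_lines() {
  local prefix="$1" major="$2"
  printf 'FC = %s/bin/gfortran\n' "$prefix"
  printf 'F77 = %s/bin/gfortran\n' "$prefix"
  printf 'FLIBS = -L%s/lib/gcc/%s\n' "$prefix" "$major"
}

# Ask Homebrew where gcc lives, if Homebrew is available at all.
prefix=""
if command -v brew >/dev/null 2>&1; then
  prefix="$(brew --prefix gcc 2>/dev/null)"
fi

if [ -n "$prefix" ] && [ -x "$prefix/bin/gfortran" ]; then
  # Keep only the major version, e.g. "11" from "11.3.0".
  major="$("$prefix/bin/gfortran" -dumpversion | cut -d. -f1)"
  makevars_lines "$prefix" "$major"
else
  echo "Homebrew gcc not found; printing the README's example paths:" >&2
  makevars_lines /opt/homebrew/Cellar/gcc/11.3.0_2 11
fi
```

If the printed paths exist on your machine, redirect the output to the file, for example by appending `>> ~/.R/Makevars` to the script invocation.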
Install the required packages from CRAN
packages_list <-
c(
"tidyverse",
"mvtnorm",
"salso",
"logr",
"gmp",
"mcclust",
"igraph",
"ggraph",
"tidygraph",
"uuid",
"dittodb",
"latex2exp",
"kableExtra",
"doSNOW",
"doParallel"
)
install.packages(packages_list)
and install the custom utilities by Alessandro Colombi and mcclust.ext (this requires the devtools package)
devtools::install_github("alessandrocolombi/ACutils")
devtools::install_github("sarawade/mcclust.ext")
To compile the presentations, run the following in the root of the repo
make prese1
make prese2
make prese3
To compile the report, run
make report
To compile everything, run
make pdf
To remove temporary LaTeX files, run
make clean
To remove both temporary and PDF files, run
make distclean
The repository contains different files to perform the analysis
- 01_simulations_basic.Rmd is a notebook containing a vanilla implementation for running a single simulation, meant to be used as a playground for on-the-go configurations.
- The second block of files implements a grid-search approach that runs simulations over varying parameters, to see how well the MCMC behaves and how robust it is:
  - 02a_simulation_grid_generation.R generates the grid of required configurations.
  - 02b_simulation_grid_parallel_execution.R runs and saves all the simulations from the grid generated by the previous file, which is sourced here. The execution runs in parallel and takes about 10 minutes on a MacBook Pro M1 14".
  - 02c_simulation_grid_visualization_notebook.Rmd reads all the simulations generated by the previous files and, when knitted, produces a PDF with one section per simulation. At the beginning there is a comprehensive table with all the relevant indexes across the grid.
  - 02d_simulation_grid_visualization_export_files.R is a script that, when sourced, reads all the simulations generated by 02b_simulation_grid_parallel_execution.R and saves all the relevant plots and figures to file, useful for embedding in the presentations and the report.
  - 02e_simulation_grid_kl_comparison.Rmd reads all the simulations generated by the previous files and, when knitted, produces a PDF comparing the evolution of the KL distance across iterations for different configurations. For a clearer picture, it is advised to run the simulations without burn-in in this case.
- 03_simulations_real_dataset.Rmd is a notebook where the algorithm is run on a real dataset, which is meant to be stored in dataset and is not included in this repository for privacy reasons. It is essentially a copy of 01_simulations_basic.Rmd but without knowledge of the true graph and partition.
- 04_execution_time_regression.R is a script that implements a polynomial linear regression of the execution time against the number of nodes, taken from the simulation grid results.
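The grid-search files can also be run in order from a terminal instead of RStudio. The following is a rough sketch under several assumptions that the repository does not state: the scripts are in the current directory, Rscript is on the PATH, and the rmarkdown package is installed for knitting. The helpers run and run_grid_pipeline are hypothetical names. By default the script only prints the commands (DRY_RUN=1); set DRY_RUN=0 to execute them.

```shell
#!/usr/bin/env bash
# Sketch: run the grid-search files in order from the command line.
# Assumptions: scripts are in the current directory, Rscript is on PATH,
# and the rmarkdown package is installed. None of this is confirmed by the repo.

# Print the command in dry-run mode (the default), otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

run_grid_pipeline() {
  # 02b sources 02a itself, so the grid generation needs no separate step.
  run Rscript 02b_simulation_grid_parallel_execution.R || return 1
  # Knit the notebooks that read the saved simulations.
  run Rscript -e 'rmarkdown::render("02c_simulation_grid_visualization_notebook.Rmd")' || return 1
  run Rscript -e 'rmarkdown::render("02e_simulation_grid_kl_comparison.Rmd")' || return 1
  # Export plots and figures for the presentations and the report.
  run Rscript 02d_simulation_grid_visualization_export_files.R || return 1
}

run_grid_pipeline
```

Note that the KL comparison notebook is best knitted on simulations run without burn-in, so in practice it may need its own run of the grid rather than sharing one with the other notebooks.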
The final presentations can be found here:
The final report can be found here:
The results from the simulations knitted can be found here:
- 01_simulations_basic.pdf
- 02c_simulation_grid_visualization_notebook.pdf
- 02e_simulation_grid_kl_comparison.pdf
- 03_analysis_real_dataset.pdf
Supervisor: Alessandro Colombi (@alessandrocolombi)
- Teo Bucci (@teobucci)
- Filippo Cipriani (@SmearyTundra)
- Filippo Pagella (@effefpi2)
- Flavia Petruso (@fl-hi1)
- Andrea Puricelli (@apuri99)
- Giulio Venturini (@Vinavil334)