This is a GitHub repository created to submit the fifth homework of the Algorithmic Methods for Data Mining (ADM) course for the MSc. in Data Science at Sapienza University of Rome.
- README.md: A Markdown file that explains the content of the repository.
- main.ipynb: A Jupyter Notebook containing all the relevant exercises and reports for the homework questions, the Command Line Question, and the Algorithmic Question.
- modules/: A folder including 4 Python modules used to solve the exercises in main.ipynb. The files included are:
  - __init__.py: An init file that allows us to import the modules into our Jupyter Notebook.
  - data_handler.py: A Python file including a DataHandler class designed to handle data cleaning and feature engineering on Kaggle's Citation Network Dataset.
  - backend.py: A Python file including a Backend class designed to build 5 functionalities to solve the exercises from the homework.
  - frontend.py: A Python file including a Frontend class designed to visualize the 5 functionalities of the Backend to solve the exercises from the homework.
- commandline.sh: A bash script including the code to solve the Command Line Question.
- .gitignore: A predetermined .gitignore file that tells Git which files or folders to ignore in a Python project.
- LICENSE: A file containing the permissive MIT license.
In this homework we worked with Kaggle's predefined Citation Network Dataset.
If the Notebook doesn't load on GitHub, please try the following steps:
- Try rendering the Notebook through NBViewer.
- Try downloading the Notebook and opening it on your local computer.
Author: Miguel Angel Sanchez Cortes
Email: sanchezcortes.2049495@studenti.uniroma1.it
MSc. in Data Science, Sapienza University of Rome