ML-F23-Project

This repository contains code for Parts I and II of our Machine Learning final project and for our implementation of the bonus part.

Part I Data Preprocessing

The preprocessing code is spread across several files. First, create_train_val_test_split.py splits our data into training and testing sets and writes each set to a csv file. We could not run our models on dataset 2 at first because one of its features is stored as a string. Since the feature is binary (present/absent), we replace "present" and "absent" with the integer values 1 and 0, respectively. Code for this step can be found in convert_data2_str_features_to_int.py. Finally, the functions get_data() and process_data() in run_all_models.py read the data from the csv files and format it as input for our models.
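The string-to-integer replacement and the split can be illustrated with a minimal sketch. It uses only the standard library and a made-up sample; the column name famhist, the sample rows, the 80/20 split, and the helper names are assumptions for illustration, not the contents of the scripts named above.

```python
import csv
import io
import random

# Hypothetical sample standing in for dataset 2; the real columns may differ.
RAW = """age,famhist,label
52,present,1
41,absent,0
63,present,1
38,absent,0
47,absent,1
"""

MAPPING = {"present": 1, "absent": 0}


def convert_binary_strings(rows, mapping=MAPPING):
    """Replace mapped string cells with integers; leave every other cell unchanged."""
    return [[mapping.get(cell.strip().lower(), cell) for cell in row] for row in rows]


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and hold out the last test_fraction as the test set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[:-n_test], shuffled[-n_test:]


reader = csv.reader(io.StringIO(RAW))
header = next(reader)
rows = convert_binary_strings(list(reader))
train_rows, test_rows = train_test_split(rows)
print(len(train_rows), "training rows,", len(test_rows), "testing rows")
```

Fixing the seed keeps the split reproducible, so the csv files written from it would be the same on every run.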

To run the PCA and t-SNE analysis, refer to the two files in the preprocessing folder. utils.py contains all the functions used to generate the plots in our report; each function's name describes its purpose. dataViz.py shows example usage.

How to Run Our Code for Part I

In Part I, we run Logistic Regression, Boosting, KNN, Decision Trees, Random Forest, and SVM. Our process consists of the following steps:

1. For each model, we establish a set of hyperparameters to test and run 10-fold cross-validation on the training data of each dataset. We record the best classifier found for each model type. The code for this step can be found in the file run_all_models.py. The best models found for each dataset can be found in the folder output_metrics.
2. Next, we take the lists of parameters we established in step 1 and modify them as needed based on our cross-validation results. Then, for each type of model, we train classifiers with all combinations of the hyperparameters on all the training data and test them on the test data. We measure the performance of each model with respect to the hyperparameters considered to analyze how each hyperparameter contributes to the bias and variance of each model. Code for this step can be found in the file bias_variance_evaluation.py. Outputs from this step can be found in the folder bias_variance_comparisons.
3. Finally, we plot the train and test losses with respect to each hyperparameter considered for each model. Code for generating these plots can be found in the file plot_results.py. Outputs from this step can be found in the folder plots.
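The 10-fold cross-validation from the first step can be sketched as follows. This is a standard-library illustration on toy one-dimensional data with a hand-written k-nearest-neighbours classifier; the function names, the toy data, and the candidate values of k are assumptions, and run_all_models.py may be organised quite differently.

```python
import random


def k_fold_indices(n_samples, n_folds=10, seed=0):
    """Shuffle the sample indices and deal them into n_folds nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k nearest training points (1-D toy features)."""
    nearest = sorted(range(len(train_X)), key=lambda i: abs(train_X[i] - x))[:k]
    votes = sum(train_y[i] for i in nearest)
    return 1 if 2 * votes > len(nearest) else 0


def cross_val_accuracy(X, y, k, n_folds=10):
    """Average held-out accuracy of k-NN across the folds."""
    scores = []
    for held_out in k_fold_indices(len(X), n_folds):
        held = set(held_out)
        train_idx = [i for i in range(len(X)) if i not in held]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        correct = sum(knn_predict(train_X, train_y, X[i], k) == y[i] for i in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / len(scores)


# Toy data: two well-separated clusters of 20 points each.
X = [i * 0.1 for i in range(20)] + [10 + i * 0.1 for i in range(20)]
y = [0] * 20 + [1] * 20

# Hypothetical candidate values for the hyperparameter k.
cv_scores = {k: cross_val_accuracy(X, y, k) for k in (1, 3, 5)}
best_k = max(cv_scores, key=cv_scores.get)
print(cv_scores, "best k:", best_k)
```

Each sample is held out exactly once, so the averaged score uses all of the training data for validation without ever scoring a point the classifier was fit on.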

How to Run Our Code for Part II

In Part II, we run a Multilayer Perceptron. All the code is located in the neural folder, which contains two files:

utils.py: all necessary functions
main.py: code execution

There is no preprocessing step beyond transposing the data and converting it into PyTorch tensors.

To train a model, use the function train_model(), which requires a predefined set of parameters [model, epochs, loss_fun, optimizer, X_train, y_train].
To evaluate a model on the training set, use the function eval_model_train(), which requires a trained model, X_test, and y_test.
To evaluate a model on the testing set, use the function eval_model(), which requires a trained model, X_test, and y_test.
To run parameter tuning, use the function unit_optimize(), which requires a predefined set of parameters [epochs, loss_fun, X_train, y_train, X_test, y_test, unit1_range, unit2_range, lr_range].
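A search over unit1_range, unit2_range, and lr_range could look roughly like the sketch below. It is not the body of unit_optimize(): grid_search and fake_loss are hypothetical names, and the training step is replaced by a stand-in scoring function so the example runs without PyTorch.

```python
import itertools


def grid_search(unit1_range, unit2_range, lr_range, train_and_score):
    """Try every (unit1, unit2, lr) combination and return the best one with its loss.

    train_and_score is a hypothetical callable that would build, train, and
    evaluate a model for one combination and return a loss (lower is better).
    """
    best_params, best_loss = None, float("inf")
    for unit1, unit2, lr in itertools.product(unit1_range, unit2_range, lr_range):
        loss = train_and_score(unit1, unit2, lr)
        if loss < best_loss:
            best_params, best_loss = (unit1, unit2, lr), loss
    return best_params, best_loss


# Stand-in for training an MLP and measuring its test loss (an assumption
# made only so the sketch is runnable on its own).
def fake_loss(unit1, unit2, lr):
    return abs(unit1 - 16) + abs(unit2 - 8) + abs(lr - 0.01)


best_params, best_loss = grid_search([8, 16, 32], [4, 8], [0.1, 0.01], fake_loss)
print(best_params, best_loss)
```

Passing the training routine in as a callable keeps the search loop independent of the model, so the same loop works for any combination of hidden-layer sizes and learning rates.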

To regenerate the plots, refer to plots.Rmd. All the plots and analysis in this part were done using R.

How to Run Our Code for Extra Credit

To run the extra credit and perform an attack on the reconstruction process, refer to the xtracred folder. The file utils.py includes the class that generates a deep autoencoder, which was created with the help of a GeeksForGeeks tutorial. utils.py also contains the methods frobenius_norm() and solver() used for the optimization described in the project report.
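For readers unfamiliar with the Frobenius norm, the sketch below shows the quantity on plain nested lists: the square root of the sum of squared entries. The signatures here are assumptions and may not match frobenius_norm() in utils.py; reconstruction_error is a hypothetical helper showing how the norm can measure the gap between an input and its reconstruction.

```python
import math


def frobenius_norm(matrix):
    """Square root of the sum of squared entries of a 2-D list."""
    return math.sqrt(sum(value * value for row in matrix for value in row))


def reconstruction_error(original, reconstructed):
    """Frobenius norm of the element-wise difference of two same-shaped matrices."""
    diff = [
        [a - b for a, b in zip(row_o, row_r)]
        for row_o, row_r in zip(original, reconstructed)
    ]
    return frobenius_norm(diff)


original = [[1.0, 2.0], [3.0, 4.0]]
reconstructed = [[1.0, 2.0], [0.0, 0.0]]
print(reconstruction_error(original, reconstructed))
```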

The main.py file contains the code from the same GeeksForGeeks tutorial on how to train an autoencoder. The rest of the file demonstrates how the attack was created. Refer to the comments in the file for instructions.

