Implementation and Analysis of Three Supervised Machine Learning Methods

Author: Audrey Liang

Abstract

This paper is a partial implementation of a 2006 paper by Caruana and Niculescu-Mizil (Caruana & Niculescu-Mizil, 2006). It is implemented as a part of the final project for the course COGS 118: Supervised Machine Learning Algorithms, taughtby Professor Tu Zhuowen. The paper will look at three different classification algorithms, performed using three different splits, and on three different datasets taken from the UC Irvine Machine Learning Repository.

Introduction

Classification algorithms are extremely crucial in machine learning, where the intent is to classify and find patterns in places where we might not be able to discern using the human eye. Classification has historically been a crucial pillar of machine learning, since it is simply human nature to find patterns in everything.

This paper aims to implement three classification algorithms to evaluate their performance on binary classification tasks: Random Forest, Decision Tree, and Support Vector Machine (SVM). A full breakdown of each classification will be provided in Section 2: Methodology.

This was conducted as part of a final assignment to gain experience with implementing classification algorithms on real-world datasets, taken from the UC Irvine Machine Learning Repository, which is widely used in machine learning research. The datasets were ensured to have no missing values, and they are adjusted for binary classification. In order to show representative performance, cross-validation was performed accross three different partitions.

To comply with course requirements, these implementations were taken by Caruana et.al’s 2006 paper (Caruana & Niculescu-Mizil, 2006) on evaluating performances of different algorithms. Each algorithm was trained under three test partitions: 80/20, 20/80, and 50/50, using a averaged accuracy amongst the three splits to determine performance evaluation.

The primary objectives of this paper are:

• To implement and evaluate the performance of Random Forest Classification, Support Vector Machine Classification, and Decision Tree Classification Algorithms.

• To analyze the impact that a dataset has (Data size, instances, features, etc.) on model performance

• To compare said results from the other two objectives with the results found in the original paper (Caruana & Niculescu-Mizil, 2006)

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
README.md		README.md
SVM.ipynb		SVM.ipynb
decision_tree.ipynb		decision_tree.ipynb
final_paper.pdf		final_paper.pdf
random_forest.ipynb		random_forest.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Implementation and Analysis of Three Supervised Machine Learning Methods

Author: Audrey Liang

Abstract

Introduction

About

Uh oh!

Releases

Packages

Languages

oodball/ML-implementation

Folders and files

Latest commit

History

Repository files navigation

Implementation and Analysis of Three Supervised Machine Learning Methods

Author: Audrey Liang

Abstract

Introduction

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages