
K-Closest Clusters

Overview

We wanted to address the limitations of two popular machine learning models for classifying quantitative data: K-Nearest Neighbors (KNN) and K-Means. Our approach combines aspects of the two, taking the clustering step from K-Means and the nearest-neighbor classification step from KNN. We first create k clusters within the dataset and assign each cluster's centroid the majority class label of that cluster. To classify a test instance, we then run KNN with k=1, treating each centroid as a neighbor. We call this algorithm K-Closest Clusters (KCC). Our results show that KCC achieved slightly better accuracy, precision, and recall than the existing KNN algorithm, and that KCC classifies test instances significantly faster than KNN. This suggests that KCC is more practical than KNN as a classifier, especially for large datasets.
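As a rough sketch of the steps above (not the code in ml_q2_project.py), the algorithm might look like the following NumPy-only implementation. The function names, the simple k-means loop, and the random initialization are all illustrative assumptions:

```python
import numpy as np

def kcc_fit(X, y, k, n_iter=20, seed=0):
    """Fit K-Closest Clusters: cluster X with k-means, then label each
    centroid with the majority class of the points assigned to it."""
    rng = np.random.default_rng(seed)
    # Initialize centroids as k randomly chosen data points (an assumption).
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest centroid.
        dists = np.linalg.norm(X[:, None] - centroids[None], axis=2)
        assign = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned points.
        for j in range(k):
            if np.any(assign == j):
                centroids[j] = X[assign == j].mean(axis=0)
    # Label each centroid with the majority class of its cluster.
    labels = np.array([
        np.bincount(y[assign == j]).argmax() if np.any(assign == j) else y[0]
        for j in range(k)
    ])
    return centroids, labels

def kcc_predict(X, centroids, labels):
    """Classify by KNN with k=1, treating each centroid as a neighbor."""
    dists = np.linalg.norm(X[:, None] - centroids[None], axis=2)
    return labels[dists.argmin(axis=1)]
```

Note the efficiency argument: prediction compares each test instance against only k centroids rather than every training point, which is why classification time drops relative to plain KNN on large datasets.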

Running the Code

Download the ml_q2_project.py file and run it. It contains our implementation of the K-Closest Clusters algorithm described in our report. Running it displays a graph of the validation accuracies, a visualization of the clusters, the test accuracy, and the classification time.
