This project implements four popular clustering algorithms from scratch in Python. Each implementation is designed to handle datasets with d >= 2 dimensions and k >= 2 clusters. The implementations are tested on 2D datasets and compared visually against scikit-learn's equivalents to evaluate correctness and performance.

The four algorithms are:
- K-Means Clustering
- Gaussian Mixture Model (GMM) using Expectation-Maximization (EM)
- Mean-Shift Clustering
- Agglomerative Clustering
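
To make the "from scratch" idea concrete, here is a minimal NumPy sketch of K-Means (Lloyd's algorithm) that works for any d >= 2 and k >= 2. It is an illustration only, not the code in KMeans.py; the function name `kmeans`, its parameters, and the random initialization from data points are assumptions made for this example.

```python
import numpy as np

def kmeans(X, k, n_iter=100, tol=1e-6, seed=0):
    """Hypothetical minimal K-Means: returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Assumption: initialize centers as k distinct data points.
    centers = X[rng.choice(len(X), size=k, replace=False)]

    def assign(points, ctrs):
        # Squared Euclidean distance from every point to every center: shape (n, k).
        d2 = ((points[:, None, :] - ctrs[None, :, :]) ** 2).sum(axis=2)
        return d2.argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(X, centers)
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                new_centers[j] = members.mean(axis=0)
        shift = np.linalg.norm(new_centers - centers)
        centers = new_centers
        if shift < tol:
            break

    # Final assignment so labels match the returned centers.
    return assign(X, centers), centers

# Usage on two well-separated 3D blobs (synthetic data for illustration).
data_rng = np.random.default_rng(42)
blob_a = data_rng.normal(loc=0.0, scale=0.3, size=(50, 3))
blob_b = data_rng.normal(loc=10.0, scale=0.3, size=(50, 3))
X = np.vstack([blob_a, blob_b])
labels, centers = kmeans(X, k=2)
print("cluster sizes:", np.bincount(labels, minlength=2))
```

The distance computation uses broadcasting rather than explicit loops over points, which keeps the sketch short while staying independent of the number of dimensions.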

Repository contents:
- KMeans.py: K-Means clustering.
- KMeans_Ver0.py: K-Means clustering (2nd version).
- GaussianMM.py: EM-GMM.
- GaussianMM_Ver0.py: EM-GMM with AIC, BIC, and predict functions (2nd version).
- MeanShift.py: Mean-Shift clustering.
- Agglomerative.py: Agglomerative clustering.
- test_2d_visualization.py: Tests each implementation on 2D datasets with visualization, comparing the results to scikit-learn's equivalent algorithms.
- data_2d_test/: Contains the datasets used for testing.
- test_2d_visualization_results/: Stores the output images of the clustering results.
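
The AIC and BIC functions mentioned for GaussianMM_Ver0.py follow standard formulas, which can be sketched as below. This is a hedged illustration, not the code in that file: the function names are hypothetical, `log_likelihood` is assumed to be the total natural-log likelihood of the data under the fitted model, and the parameter count assumes a full covariance matrix per component.

```python
import math

def gmm_n_params(k, d):
    """Free parameters of a k-component, d-dimensional GMM (full covariances assumed)."""
    weights = k - 1                      # mixing weights sum to 1
    means = k * d                        # one d-dimensional mean per component
    covariances = k * d * (d + 1) // 2   # symmetric d x d matrix per component
    return weights + means + covariances

def aic(log_likelihood, k, d):
    """Akaike information criterion: 2p - 2 ln L (lower is better)."""
    return 2 * gmm_n_params(k, d) - 2 * log_likelihood

def bic(log_likelihood, k, d, n):
    """Bayesian information criterion: p ln n - 2 ln L (lower is better)."""
    return gmm_n_params(k, d) * math.log(n) - 2 * log_likelihood

# Usage with made-up numbers: 3 components, 2D data, 500 samples.
print(gmm_n_params(3, 2))
print(aic(-1200.0, k=3, d=2))
print(bic(-1200.0, k=3, d=2, n=500))
```

Both criteria penalize model complexity, so they can be used to choose the number of components k by fitting several models and picking the one with the lowest score.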

Results: side-by-side clustering plots (My Implementation vs. Scikit-learn) for Agglomerative, EM-GMM, K-Means, and Mean-Shift, in three sets. The images are stored in test_2d_visualization_results/.
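
The visual comparison with scikit-learn can be complemented by a numeric check. The sketch below is one possible approach, not part of test_2d_visualization.py: it scores a labeling against scikit-learn's `KMeans` using the adjusted Rand index, which ignores how cluster IDs are numbered. The helper `agreement_with_sklearn` and the synthetic data are assumptions for illustration, and the "custom" labels here are only a stand-in for the output of a from-scratch implementation.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

def agreement_with_sklearn(X, custom_labels, k, seed=0):
    """Adjusted Rand index between custom labels and scikit-learn KMeans labels."""
    reference = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
    return adjusted_rand_score(reference, custom_labels)

# Synthetic, well-separated 2D blobs (not one of the datasets in data_2d_test/).
X, true_labels = make_blobs(
    n_samples=300,
    centers=[[-10, -10], [0, 10], [10, -10]],
    cluster_std=0.5,
    random_state=0,
)

# Stand-in for a from-scratch result: the true partition with cluster IDs renumbered.
custom_labels = (true_labels + 1) % 3

score = agreement_with_sklearn(X, custom_labels, k=3)
print(f"ARI vs scikit-learn: {score:.3f}")
```

A score close to 1 means the two partitions agree; values near 0 indicate agreement no better than chance.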