
🌳 Decision Tree Classification – Pre-Pruning vs Post-Pruning


A machine learning project that compares Decision Tree Classification using Pre-Pruning and Post-Pruning techniques, highlighting their impact on model complexity, overfitting, and classification performance.


📌 Project Overview

This project explores two different pruning strategies in Decision Tree Classification:

  • Pre-Pruning: Restricting tree growth during training
  • Post-Pruning: Allowing full growth and pruning the tree afterward

Both approaches are implemented, evaluated, and visually compared using classification reports and decision tree plots.


📁 Project Structure

  • `decission_treepre.ipynb` — Decision Tree with pre-pruning
  • `decission_treepost.ipynb` — Decision Tree with post-pruning
  • `pre_pruning_report.png` — Classification report (accuracy: 0.88)
  • `pre_pruning_tree.png` — Pre-pruned decision tree visualization
  • `post_pruning_report.png` — Classification report (accuracy: 0.98)
  • `post_pruning_tree.png` — Post-pruned decision tree visualization
  • `README.md` — Project documentation

⚙️ Technologies Used

  • Python
  • NumPy
  • Pandas
  • Matplotlib
  • scikit-learn
  • Jupyter Notebook

🧠 Model Details

  • Algorithm: Decision Tree Classifier
  • Problem Type: Multiclass Classification
  • Splitting Criteria: Entropy (Pre-Pruning), Gini (Post-Pruning)
  • Evaluation Metrics: Precision, Recall, F1-Score, Accuracy

📊 Pre-Pruning Results

  • Accuracy: 0.88
  • Controlled tree depth during training
  • Reduced model complexity
  • Slight loss in predictive performance
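The pre-pruning approach can be sketched as follows. This is a minimal, hypothetical example on the iris dataset (the notebook's actual dataset and hyperparameter values are not specified in this README); the key idea is that growth constraints such as `max_depth` and `min_samples_leaf` are set in the constructor, so the tree is restricted *while* it is being built:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Illustrative dataset only; the project's dataset may differ.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

pre_pruned = DecisionTreeClassifier(
    criterion="entropy",   # splitting criterion used for pre-pruning here
    max_depth=3,           # stop growing beyond depth 3
    min_samples_leaf=5,    # each leaf must cover at least 5 samples
    random_state=42,
)
pre_pruned.fit(X_train, y_train)
acc = accuracy_score(y_test, pre_pruned.predict(X_test))
```

Because the constraints apply during training, the resulting tree is guaranteed to respect them (`pre_pruned.get_depth() <= 3`), at the possible cost of stopping a split that would have paid off deeper in the tree.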

Pre-Pruning Classification Report (`pre_pruning_report.png`)

Pre-Pruning Tree Visualization (`pre_pruning_tree.png`)


📊 Post-Pruning Results

  • Accuracy: 0.98
  • Tree fully grown before pruning
  • Better generalization and higher accuracy
  • More balanced class predictions
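Post-pruning in scikit-learn is done via cost-complexity pruning (`ccp_alpha`). The sketch below, again on the iris dataset as a stand-in for the project's data, grows a full tree, computes the pruning path, and refits with a mid-range alpha; the specific alpha used in the notebook is not stated here:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

# Compute the cost-complexity pruning path of a fully grown tree.
full_tree = DecisionTreeClassifier(criterion="gini", random_state=42)
path = full_tree.cost_complexity_pruning_path(X_train, y_train)

# Refit with a mid-range ccp_alpha; larger alphas prune more aggressively.
alpha = path.ccp_alphas[len(path.ccp_alphas) // 2]
pruned = DecisionTreeClassifier(
    criterion="gini", ccp_alpha=alpha, random_state=42)
pruned.fit(X_train, y_train)
full_tree.fit(X_train, y_train)  # unpruned baseline for comparison
```

Each value in `path.ccp_alphas` corresponds to a subtree of the full tree, so sweeping them (e.g. with cross-validation) lets you pick the subtree that generalizes best.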

Post-Pruning Classification Report (`post_pruning_report.png`)

Post-Pruning Tree Visualization (`post_pruning_tree.png`)


🔍 Comparison Summary

| Technique     | Accuracy | Tree Complexity | Performance |
|---------------|----------|-----------------|-------------|
| Pre-Pruning   | 0.88     | Lower           | Moderate    |
| Post-Pruning  | 0.98     | Optimized       | High        |

In this project, post-pruning clearly outperforms pre-pruning, achieving higher accuracy (0.98 vs. 0.88) while still keeping tree complexity under control.


▶️ How to Run

  1. Clone the repository

     ```bash
     git clone https://github.com/btboilerplate/Decisiontree_Classicication.git
     ```

  2. Install the required libraries

     ```bash
     pip install numpy pandas matplotlib scikit-learn
     ```

  3. Open and run the notebooks
     • `decission_treepre.ipynb`
     • `decission_treepost.ipynb`

🧪 Key Observations

  • Pre-pruning prevents overfitting but may underfit
  • Post-pruning provides better balance between bias and variance
  • Tree visualizations help interpret decision boundaries
  • Pruning strategy has a major impact on classification performance
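The tree visualizations mentioned above can be reproduced with scikit-learn's `plot_tree`. A minimal sketch, again assuming the iris dataset and a saved output filename of my own choosing (`tree.png`):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt
from pathlib import Path
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, plot_tree

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=3, random_state=42)
clf.fit(iris.data, iris.target)

# Render the fitted tree with feature and class labels, colored by class.
fig, ax = plt.subplots(figsize=(12, 8))
plot_tree(clf, feature_names=iris.feature_names,
          class_names=list(iris.target_names), filled=True, ax=ax)
fig.savefig("tree.png", dpi=150)
saved = Path("tree.png").exists()
```

Reading the plotted nodes (split feature, impurity, sample counts, majority class) is what makes the effect of each pruning strategy easy to see at a glance.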

🚀 Future Improvements

  • Add cross-validation comparison
  • Tune pruning hyperparameters automatically
  • Compare with Random Forest and Gradient Boosting
  • Evaluate on larger datasets
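The first two improvements above could be combined by cross-validating over the tree's own pruning path. A hedged sketch (iris again as a placeholder dataset; `GridSearchCV` settings are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Candidate alphas come from the pruning path of a fully grown tree,
# so the grid covers exactly the distinct subtrees that pruning can produce.
path = DecisionTreeClassifier(random_state=42).cost_complexity_pruning_path(X, y)
grid = GridSearchCV(
    DecisionTreeClassifier(random_state=42),
    param_grid={"ccp_alpha": path.ccp_alphas},
    cv=5, scoring="accuracy",
)
grid.fit(X, y)
best_alpha = grid.best_params_["ccp_alpha"]
```

This replaces a hand-picked `ccp_alpha` with one selected by 5-fold cross-validation, which is usually a more honest estimate of generalization than a single train/test split.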