🩷Breast Cancer Prediction using Machine Learning

📌 Project Overview

This project aims to develop a machine learning model that predicts whether a tumor is malignant or benign based on medical features. The dataset used is the Breast Cancer Wisconsin Dataset from sklearn.datasets.

🛠️ Steps in the Project

1️⃣ Data Loading & Preparation

Loaded the Breast Cancer Wisconsin Dataset and converted it into a pandas.DataFrame.
Separated features (x) and the target variable (y).

2️⃣ Exploratory Data Analysis (EDA)

Checked dataset dimensions (569 samples, 30 features).
Verified missing values (none found).
Used describe() to analyze key statistics such as mean and standard deviation.

3️⃣ Splitting Data into Training & Testing Sets

Used train_test_split() to split data into 80% training and 20% testing.

4️⃣ Training a Random Forest Classifier

Trained an initial Random Forest Model (RandomForestClassifier) with default hyperparameters.
Initial Accuracy: 96.49%

5️⃣ Comparing Different Models

Tested Logistic Regression and SVC (Support Vector Classifier) alongside Random Forest.
Results:
- Random Forest: 96.49%
- Logistic Regression: 95.61%
- SVC: 94.74%

6️⃣ Feature Importance Analysis

Identified key features contributing to the prediction.
Most important features:
- Concave points (worst)
- Area (worst)
- Radius (worst)
- Concave points (mean)

7️⃣ Hyperparameter Tuning with GridSearchCV

Used GridSearchCV to find the best Random Forest parameters:
- n_estimators = 150
- max_depth = None
Optimized Accuracy: 96.26%

8️⃣ Code Optimization with Classes & Functions

Refactored the code into a structured class (BreastCancerPrediction) for better readability and maintainability.

📌 Key Results

✅ The trained Random Forest model achieves an accuracy of ~96%, making it highly reliable for breast cancer classification.
✅ Feature Importance Analysis helped identify the most relevant medical features.
✅ Hyperparameter Tuning slightly improved the model’s performance.

🚀 Future Enhancements

User Input Feature: Allow users to enter their own data and get predictions.
Further Model Optimization: Try deep learning (e.g., neural networks) for comparison.
Apply to Other Medical Datasets to test generalization.

📂 Installation & Usage

To run this project locally, follow these steps:

🔹 Install dependencies

pip install pandas scikit-learn matplotlib

🔹 Run the script

python breast_cancer_prediction.py

Dataset Information

The dataset used in this project is the Breast Cancer Wisconsin Dataset, available in sklearn.datasets. It contains:

569 samples
30 numerical features
Binary target variable (0 = malignant, 1 = benign)

For more details, visit the Breast Cancer Dataset Documentation

🤝 Contributing

Feel free to contribute by:

Improving the model
Adding a web interface
Exploring different machine learning techniques

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
.gitignore		.gitignore
.replit		.replit
README.md		README.md
breast_cancer_ml.py		breast_cancer_ml.py
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

🩷Breast Cancer Prediction using Machine Learning

📌 Project Overview

🛠️ Steps in the Project

1️⃣ Data Loading & Preparation

2️⃣ Exploratory Data Analysis (EDA)

3️⃣ Splitting Data into Training & Testing Sets

4️⃣ Training a Random Forest Classifier

5️⃣ Comparing Different Models

6️⃣ Feature Importance Analysis

7️⃣ Hyperparameter Tuning with GridSearchCV

8️⃣ Code Optimization with Classes & Functions

📌 Key Results

🚀 Future Enhancements

📂 Installation & Usage

🔹 Install dependencies

🔹 Run the script

Dataset Information

🤝 Contributing

⭐ If you find this project useful, consider giving it a star on GitHub!

About

Uh oh!

Releases

Packages

Uh oh!

Languages

mohathecreator/Breast-Cancer-Predicter

Folders and files

Latest commit

History

Repository files navigation

🩷Breast Cancer Prediction using Machine Learning

📌 Project Overview

🛠️ Steps in the Project

1️⃣ Data Loading & Preparation

2️⃣ Exploratory Data Analysis (EDA)

3️⃣ Splitting Data into Training & Testing Sets

4️⃣ Training a Random Forest Classifier

5️⃣ Comparing Different Models

6️⃣ Feature Importance Analysis

7️⃣ Hyperparameter Tuning with GridSearchCV

8️⃣ Code Optimization with Classes & Functions

📌 Key Results

🚀 Future Enhancements

📂 Installation & Usage

🔹 Install dependencies

🔹 Run the script

Dataset Information

🤝 Contributing

⭐ If you find this project useful, consider giving it a star on GitHub!

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages