Welcome to my enhanced fork of The Exoplanet Classifier. Originally developed with my teammates from Ontohin 4b for the NASA Space Apps Challenge 2025, this project now represents the upgraded and research-extended version of that submission. To check out the original repository, click here.
The original repository remains archived under Ontohin 4b and licensed as such. This fork exists purely for further research, experimentation, and personal development to make the classifier far more powerful and accurate than the hackathon version.
A robust, data-driven Machine Learning tool that classifies whether a given set of transit data corresponds to a confirmed exoplanet, false positive, or candidate.
This version blends the strengths of ensemble learning with extensive preprocessing, imputation, and class balancing, resulting in a more stable and generalizable model.
NASA's exoplanet survey missions (Kepler, K2, and others) have generated thousands of data points using the transit method: tracking dips in starlight caused by orbiting planets.
These datasets contain both confirmed exoplanets and false positives, and the aim of this project is to build an AI classifier capable of making preliminary predictions on new candidates.
The classifier runs inside a Flask-powered web interface, allowing anyone, from students to researchers, to enter transit parameters and instantly receive a prediction.
The goal is to provide a scientifically meaningful, intuitive, and educational experience for users interested in exoplanet research.
Landing Page
Input Fields
Input Fields when filled
Output
About Page

- Python 3.11 or above – Core programming language
- Pandas, NumPy – Data processing and numerical computation
- Scikit-learn – Pipeline, scaling, imputation, model stacking, metrics
- XGBoost – Gradient boosting-based sub-model for the ensemble
- Imbalanced-learn (SMOTE) – Class balancing for improved fairness
- Flask – Backend web framework
- HTML/CSS/JavaScript – Frontend for the interactive web UI
- Jupyter Notebook – Used as a sandbox (`research.ipynb`) to experiment with different model architectures, hyperparameters, and feature engineering before finalizing `fit.py`
- Clone the repository

  ```bash
  git clone https://github.com/ByteBard58/Exoplanet_Classifier
  cd "Exoplanet_Classifier"
  ```

- Install dependencies

  ```bash
  pip install -r requirements.txt
  ```

- Run the Flask app

  ```bash
  python app.py
  ```

- Open your browser and go to `http://127.0.0.1:5000` to access the web interface.
- To stop the server, press `Ctrl + C` in the terminal where you ran `app.py`.
I have used Docker to containerize the Exoplanet Classifier in this new fork. The Docker Hub repository lets anyone run the app easily, regardless of operating system or other system configuration.
The image is built for both ARM64 and AMD64 architectures, so it can run on almost all major computers and servers. Here's how to run the app from the Docker Hub image:
1. Install Docker Desktop and sign in. Make sure the app is functioning properly.
2. Open a terminal and run:

   ```bash
   docker pull bytebard101/exoplanet_classifier
   docker run --rm -p 5000:5000 bytebard101/exoplanet_classifier:latest
   ```

3. If your machine faces a port conflict, you will need to assign another host port. If Step 2 ran successfully, DO NOT follow this step:

   ```bash
   docker run --rm -p 5001:5000 bytebard101/exoplanet_classifier:latest
   ```

4. The app will be live at `localhost:5000`. Open your browser and navigate to `http://127.0.0.1:5000` (or `http://127.0.0.1:5001` if you followed Step 3).
Check the Docker Documentation to learn more about Docker and its commands.
- Press Get Started on the webpage.
- Enter the candidate features in the input fields (values like Orbital Period, Transit Epoch, Transit Depth, etc.).
- Click Predict to run the prediction (see the sketch after these steps for how the inputs are handled server-side).
- For more detailed information about each input and other topics, press LEARN MORE (located at the top).
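Under the hood, `app.py` turns the submitted form values into a single feature row and feeds it to the saved pipeline. The snippet below is a simplified sketch of that flow, not the repository's actual code; the route, form field handling, and label wording are illustrative assumptions.

```python
# Illustrative sketch of the prediction flow in app.py (not the actual code).
# Assumes the trained pipeline and column order were saved by fit.py as
# models/pipe.pkl and models/column_names.pkl.
import pickle

import pandas as pd
from flask import Flask, render_template, request

app = Flask(__name__)

with open("models/pipe.pkl", "rb") as f:
    pipe = pickle.load(f)
with open("models/column_names.pkl", "rb") as f:
    column_names = pickle.load(f)

LABELS = {0: "FALSE POSITIVE / REFUTED", 1: "CANDIDATE", 2: "CONFIRMED"}


@app.route("/", methods=["GET", "POST"])
def index():
    prediction = None
    if request.method == "POST":
        # Build a one-row DataFrame in the exact column order used during training;
        # missing or blank fields become NaN and are handled by the imputer.
        values = [float(request.form.get(col, "nan") or "nan") for col in column_names]
        row = pd.DataFrame([values], columns=column_names)
        prediction = LABELS[int(pipe.predict(row)[0])]
    return render_template("index.html", prediction=prediction)


if __name__ == "__main__":
    app.run()
```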
```
EXOPLANET_CLASSIFIER/
├── .github/                # Folder for GitHub Actions
│
├── data/
│   ├── k2_data.csv
│   ├── kepler_data.csv
│   └── source.txt
│
├── models/
│   ├── column_names.pkl    # Not included in the repo
│   ├── info.txt
│   └── pipe.pkl            # Not included in the repo
│
├── screenshots/
│
├── static/
│   ├── materials/
│   └── script.js
│
├── templates/
│   ├── about.html
│   └── index.html
│
├── .gitignore
├── app.py
├── fit.py
├── LICENSE
├── README.md               # You're reading it now
├── requirements.txt
└── research.ipynb
```
The upgraded classifier uses a stacking ensemble combining multiple base models with a meta-classifier:
- Base Models:
  - `RandomForestClassifier(n_estimators=1000, max_depth=None, class_weight="balanced")`
  - `XGBClassifier(n_estimators=1000, max_depth=None, learning_rate=0.5)`
- Meta-classifier:
  - `LogisticRegression(solver="saga", penalty="l2", C=0.1, class_weight="balanced", max_iter=5000)`
The stacking classifier uses 5-fold cross-validation internally and passes original features to the meta-classifier for better learning.
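In scikit-learn terms, that ensemble corresponds roughly to the construction below. This is a sketch based on the hyperparameters listed above, not a copy of `fit.py` (details such as random seeds may differ):

```python
# Sketch of the stacking ensemble described above.
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from xgboost import XGBClassifier

base_models = [
    ("rf", RandomForestClassifier(n_estimators=1000, max_depth=None,
                                  class_weight="balanced")),
    ("xgb", XGBClassifier(n_estimators=1000, max_depth=None,
                          learning_rate=0.5)),
]

stack = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(solver="saga", penalty="l2", C=0.1,
                                       class_weight="balanced", max_iter=5000),
    cv=5,              # 5-fold cross-validation for the out-of-fold predictions
    passthrough=True,  # also pass the original features to the meta-classifier
)
```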
Before feeding data into the model, the following preprocessing steps are applied via a Pipeline:
- Imputation: `SimpleImputer(strategy="mean")` to handle missing values.
- Scaling: `StandardScaler` to normalize features.
- Class Balancing: `SMOTE` (Synthetic Minority Oversampling Technique) to address class imbalance.
- Model Training: the stacking ensemble described above.
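Because SMOTE should only be applied during fitting, these steps are typically chained with imbalanced-learn's `Pipeline`. A minimal sketch, assuming `stack` is the ensemble from the previous snippet:

```python
# Sketch of the preprocessing + training pipeline (assumes `stack` is the
# StackingClassifier defined in the previous snippet).
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

pipe = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="mean")),  # fill missing values with column means
    ("scaler", StandardScaler()),                 # normalize features
    ("smote", SMOTE()),                           # oversample minority classes (fit time only)
    ("model", stack),                             # stacking ensemble
])
```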
The model uses 13 transit and orbital-related features, including:
- Orbital period, transit epoch, transit depth
- Planetary radius, semi-major axis, inclination
- Equilibrium temperature, insolation, impact parameter
- Radius ratios, density ratios, duration ratios
- Number of observed transits
Targets are mapped as follows:
- `0` → FALSE POSITIVE or REFUTED
- `1` → CANDIDATE
- `2` → CONFIRMED
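In code, this mapping can be kept as a simple dictionary for decoding the pipeline's integer predictions (an illustrative snippet, not necessarily how the repository stores it):

```python
# Illustrative mapping from the model's integer output to class names.
LABELS = {
    0: "FALSE POSITIVE / REFUTED",
    1: "CANDIDATE",
    2: "CONFIRMED",
}

print(LABELS[2])  # -> CONFIRMED
```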
- Train/test split: 2/3 training, 1/3 testing with stratification on class labels.
- The pipeline is trained end-to-end in `fit.py`.
- Hyperparameters and model choices were extensively tested in `research.ipynb`, which served as a sandbox for experimentation and optimization.
- The final trained pipeline is saved as `models/pipe.pkl` and the column order as `models/column_names.pkl`.
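Put together, the training script follows this rough shape. It is a sketch of the flow described above, assuming `pipe` is the pipeline from the earlier snippet and `X`, `y` are the prepared feature matrix and integer labels; it is not the exact contents of `fit.py`:

```python
# Rough shape of the training flow (sketch, not the actual fit.py).
# Assumes `pipe` is the imblearn Pipeline from the earlier snippet and that
# X (a DataFrame of the 13 features) and y (integer labels) have already been
# assembled from data/kepler_data.csv and data/k2_data.csv.
import pickle

from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# 2/3 train, 1/3 test, stratified on the class labels
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1 / 3, stratify=y
)

pipe.fit(X_train, y_train)
print(classification_report(y_test, pipe.predict(X_test)))

# Persist the fitted pipeline and the training column order for app.py
with open("models/pipe.pkl", "wb") as f:
    pickle.dump(pipe, f)
with open("models/column_names.pkl", "wb") as f:
    pickle.dump(list(X_train.columns), f)
```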
Here is the classification report:
| Class | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| 0 (FALSE POSITIVE / REFUTED) | 0.82 | 0.81 | 0.82 | 1718 |
| 1 (CANDIDATE) | 0.56 | 0.55 | 0.56 | 1118 |
| 2 (CONFIRMED) | 0.79 | 0.81 | 0.80 | 1687 |
Overall Metrics:
- Accuracy: 0.75
- Macro Avg: Precision = 0.72, Recall = 0.72, F1-score = 0.72
- Weighted Avg: Precision = 0.74, Recall = 0.75, F1-score = 0.75
This demonstrates that the upgraded stacking classifier maintains strong performance on confirmed and false positive classes, with room for improvement on candidate predictions.
The model balances accuracy, generalization, and class fairness, making it reliable for preliminary exoplanet classification tasks.
Despite extensive experimentation, this represents the current performance ceiling achievable with the available data.
Numerous optimizations were explored, including hyperparameter tuning, feature scaling, class rebalancing, and ensemble variations, yet further improvements beyond ~0.75 accuracy were not observed.
This suggests a data limitation rather than a model limitation: the features may not carry enough additional separable information to support higher classification accuracy.
The research process behind this version involved significant model testing and fine-tuning efforts (see research.ipynb).
Suggestions and improvements are highly welcome β contributions or insights from the community could help push the model beyond its present boundary.
- The NASA Kepler and K2 missions for providing the training datasets
- The Scikit-learn, XGBoost, and Imbalanced-learn teams for their exceptional libraries
- Inspiration from data science projects exploring real-world astrophysics datasets
- The scientists engaged in exoplanet research, whose problem inspired us to create this project from the ground up
- The Ontohin 4b team for the original NASA SAC 2025 version of this project
Thank you for checking out this upgraded version of The Exoplanet Classifier. This repository is a personal continuation of a NASA Space Apps Challenge project, rebuilt with the intent to learn, improve, and explore the depths of real-world astrophysics through Machine Learning.
Have a great day!