Skip to content

The project developed by team Ontohin 4b for NASA Space Apps Challenge 2025. An ML model with proper web interface which can predict exoplanets with given transit data .

License

Notifications You must be signed in to change notification settings

ByteBard58/Exoplanet_Classifier

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

48 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

The Exoplanet Classifier πŸͺ

Welcome to my enhanced fork of The Exoplanet Classifier. Originally developed with my teammates from Ontohin 4b for the NASA Space Apps Challenge 2025, this project now represents the upgraded and research-extended version of that submission. To check out the original repository, click here.

The original repository remains archived under Ontohin 4b and licensed as such. This fork exists purely for further research, experimentation, and personal development to make the classifier far more powerful and accurate than the hackathon version.

A robust, data-driven Machine Learning tool that classifies whether a given set of transit data corresponds to a confirmed exoplanet, false positive, or candidate.
This version blends the strengths of ensemble learning with extensive preprocessing, imputation, and class balancing, resulting in a more stable and generalizable model.


Table of Contents


Project Overview

NASA’s exoplanet survey missions (Kepler, K2, and others) have generated thousands of data points using the transit method β€” tracking dips in starlight caused by orbiting planets.
These datasets contain both confirmed exoplanets and false positives, and the aim of this project is to build an AI classifier capable of making preliminary predictions on new candidates.

The classifier runs inside a Flask-powered web interface, allowing anyone β€” from students to researchers β€” to enter transit parameters and instantly receive a prediction.

The goal is to provide a scientifically meaningful, intuitive, and educational experience for users interested in exoplanet research.


Screenshots

Landing Page Landing Page Input Fields Input Fields Input Fields when filled Input Fields when filled Output Output About Page About Page


Tech Stack

  • Python 3.11 or above – Core programming language
  • Pandas, NumPy – Data processing and numerical computation
  • Scikit-learn – Pipeline, scaling, imputation, model stacking, metrics
  • XGBoost – Gradient boosting-based sub-model for ensemble
  • Imbalanced-learn (SMOTE) – Class balancing for improved fairness
  • Flask – Backend web framework
  • HTML/CSS/JavaScript – Frontend for the interactive web UI
  • Jupyter Notebook – Used as a sandbox (research.ipynb) to experiment with different model architectures, hyperparameters, and feature engineering before finalizing fit.py.

Getting Started

  1. Clone the repository
git  clone  https://github.com/ByteBard58/Exoplanet_Classifier
cd  "Exoplanet_Classifier"
  1. Install dependencies
pip  install  -r  requirements.txt
  1. Run the Flask app
python  app.py
  1. Open your browser and go to http://127.0.0.1:5000 to access the web interface.

  2. If you want to close the server, press Ctrl + C in the terminal where you have run app.py from.


🐳 Run the app directly via Dockerhub Image

I have used Docker to containerize the Exoplanet Classifier in this new fork. The Dockerhub repository allows anyone with any operating system or other system configuration to easily run the app.

The image is built on both ARM64 and AMD64 architectures, so that it can run on almost all major computers and servers. You can run the app easily by using the Dockerhub Image. Here's how you can do it:

  1. Install Docker Desktop and sign-in. Make sure the app is functioning properly.

  2. Open Terminal and run:

docker pull bytebard101/exoplanet_classifier
docker run --rm -p 5000:5000 bytebard101/exoplanet_classifier:latest
  1. If your machine faces a port conflict, you will need to assign another port. Try to run this:
docker run --rm -p 5001:5000 bytebard101/exoplanet_classifier:latest

If you followed Step 2 and the command ran successfully, then DO NOT follow this step.

  1. The app will be live at localhost:5000. Open your browser and navigate to http://127.0.0.1:5000 (or http://127.0.0.1:5001 if you followed Step 3).

Check Docker Documentation to learn more about Docker and it's commands.


How to Use

  • Press Get Started on the webpage.

  • Enter the candidate features in the input fields (values like Orbital Period, Transit Epoch, Transit Depth, etc.).

  • Click Predict to run the prediction.

  • For more detailed information about each input and other subjects, press LEARN MORE (located at the top).


Project Structure

EXOPLANET_CLASSIFIER/
β”œβ”€β”€ .github/              # Folder for GitHub actions
β”‚
β”œβ”€β”€ data/             
β”‚   β”œβ”€β”€ k2_data.csv
β”‚   β”œβ”€β”€ kepler_data.csv
β”‚   └── source.txt
β”‚
β”œβ”€β”€ models/              
β”‚   β”œβ”€β”€ column_names.pkl    # Not included in the repo
β”‚   β”œβ”€β”€ info.txt
β”‚   └── pipe.pkl            # Not included in the repo
β”‚
β”œβ”€β”€ screenshots/           
β”‚
β”œβ”€β”€ static/
β”‚   β”œβ”€β”€ materials/
β”‚   └── script.js
β”‚
β”œβ”€β”€ templates/
β”‚   β”œβ”€β”€ about.html
β”‚   └── index.html
β”‚
β”œβ”€β”€ .gitignore
β”œβ”€β”€ app.py
β”œβ”€β”€ fit.py
β”œβ”€β”€ LICENSE
β”œβ”€β”€ README.md          # You're reading it now
β”œβ”€β”€ requirements.txt
└── research.ipynb

Model Details

Architecture

The upgraded classifier uses a stacking ensemble combining multiple base models with a meta-classifier:

  • Base Models:
    • RandomForestClassifier
      • n_estimators=1000
      • max_depth=None
      • class_weight="balanced"
    • XGBClassifier
      • n_estimators=1000
      • max_depth=None
      • learning_rate=0.5
  • Meta-classifier:
    • LogisticRegression
      • solver="saga"
      • penalty="l2"
      • C=0.1
      • class_weight="balanced"
      • max_iter=5000

The stacking classifier uses 5-fold cross-validation internally and passes original features to the meta-classifier for better learning.


Preprocessing Pipeline

Before feeding data into the model, the following preprocessing steps are applied via a Pipeline:

  1. Imputation: SimpleImputer(strategy="mean") to handle missing values.
  2. Scaling: StandardScaler to normalize features.
  3. Class Balancing: SMOTE (Synthetic Minority Oversampling Technique) to address class imbalance.
  4. Model Training: Stacking ensemble as described above.

Features

The model uses 13 transit and orbital-related features, including:

  • Orbital period, transit epoch, transit depth
  • Planetary radius, semi-major axis, inclination
  • Equilibrium temperature, insolation, impact parameter
  • Radius ratios, density ratios, duration ratios
  • Number of observed transits

Labels

Targets are mapped as follows:

  • 0 β†’ FALSE POSITIVE or REFUTED
  • 1 β†’ CANDIDATE
  • 2 β†’ CONFIRMED

Data Sources


Training Workflow

  • Train/test split: 2/3 training, 1/3 testing with stratification on class labels.
  • Pipeline is trained end-to-end in fit.py.
  • Hyperparameters and model choices were extensively tested in research.ipynb, which served as a sandbox for experimentation and optimization.
  • Final trained pipeline is saved as models/pipe.pkl and column order as models/column_names.pkl.

Evaluation

Here is the classification report:

Class Precision Recall F1-score Support
0 (FALSE POSITIVE / REFUTED) 0.82 0.81 0.82 1718
1 (CANDIDATE) 0.56 0.55 0.56 1118
2 (CONFIRMED) 0.79 0.81 0.80 1687

Overall Metrics:

  • Accuracy: 0.75
  • Macro Avg: Precision = 0.72, Recall = 0.72, F1-score = 0.72
  • Weighted Avg: Precision = 0.74, Recall = 0.75, F1-score = 0.75

This demonstrates that the upgraded stacking classifier maintains strong performance on confirmed and false positive classes, with room for improvement on candidate predictions.
The model balances accuracy, generalization, and class fairness, making it reliable for preliminary exoplanet classification tasks.


Note

Despite extensive experimentation, this represents the current performance ceiling achievable with the available data.
Numerous optimizations were explored β€” including hyperparameter tuning, feature scaling, class rebalancing, and ensemble variations β€” yet further improvements beyond ~0.75 accuracy were not observed.
This indicates a data limitation rather than a model limitation, as the features may not carry additional separable information for higher classification accuracy.

The research process behind this version involved significant model testing and fine-tuning efforts (see research.ipynb).

Suggestions and improvements are highly welcome β€” contributions or insights from the community could help push the model beyond its present boundary.


Acknowledgements

  • NASA Kepler and K2 Mission for providing the training datasets

  • Scikit-learn, XGBoost, and Imbalanced-learn teams for exceptional libraries

  • Inspiration from data science projects exploring real-world astrophysics datasets

  • The scientists who are engaged with exoplanet research. Their problem inspired us to create this project from the ground up

  • Ontohin 4b team for the original NASA SAC 2025 version of this project


Appreciation

Thank you for checking out this upgraded version of The Exoplanet Classifier. This repository is a personal continuation of a NASA Space Apps Challenge project β€” rebuilt with the intent to learn, improve, and explore the depths of real-world astrophysics through Machine Learning.

Have a great day !

About

The project developed by team Ontohin 4b for NASA Space Apps Challenge 2025. An ML model with proper web interface which can predict exoplanets with given transit data .

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 3

  •  
  •  
  •