An end-to-end Machine Learning web application that predicts California house prices using the California Housing Dataset, powered by XGBoost and deployed with a Flask backend.
Deployment link coming soon
- Overview
- Features
- Tech Stack
- Dataset
- Model Performance
- Project Structure
- Getting Started
- How It Works
- Planned Enhancements
- Contributing
This project tackles a classic regression problem: predicting median house prices across California districts. Given block-level demographic and geographic features, the model estimates the median house value for a block using the XGBoost Regressor, a widely used gradient-boosting implementation.
The project includes a polished Flask web app where users can input housing features and instantly receive a price estimate, along with a built-in Loan Calculator that auto-computes EMI, down payment, and loan amount based on the predicted price.
- XGBoost Regression Model trained on the California Housing Dataset
- Flask Web App with a clean, responsive UI
- Dark / Light Mode Toggle for better accessibility
- Integrated Loan Calculator: auto-populates with the predicted price and allows users to adjust interest rate, down payment percentage, and tenure
- Form Persistence: input values are retained after prediction for easy comparison
- Serialized Model via `joblib` for fast inference
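The loan calculator's arithmetic can be sketched with the standard EMI (equated monthly installment) formula for a fixed-rate loan. This is a minimal sketch rather than the app's actual code: the function name `loan_summary` and its parameters are hypothetical, and the app's own calculator may round or handle edge cases differently.

```python
def loan_summary(price, down_payment_pct, annual_rate_pct, tenure_years):
    """Return (down_payment, loan_amount, emi) for a fixed-rate loan.

    Hypothetical helper for illustration; the names are not from the project.
    """
    down_payment = price * down_payment_pct / 100
    loan_amount = price - down_payment
    months = tenure_years * 12
    monthly_rate = annual_rate_pct / 100 / 12

    if monthly_rate == 0:
        # Zero-interest edge case: repay the principal in equal parts.
        emi = loan_amount / months
    else:
        # Standard EMI formula: P * r * (1 + r)^n / ((1 + r)^n - 1)
        growth = (1 + monthly_rate) ** months
        emi = loan_amount * monthly_rate * growth / (growth - 1)
    return down_payment, loan_amount, emi


down, loan, emi = loan_summary(price=250_000, down_payment_pct=20,
                               annual_rate_pct=6.0, tenure_years=30)
print(f"Down payment: ${down:,.2f}")
print(f"Loan amount:  ${loan:,.2f}")
print(f"Monthly EMI:  ${emi:,.2f}")
```

Handling the zero-rate case separately avoids a division by zero in the general formula.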
| Layer | Technology |
|---|---|
| Language | Python 3.x |
| ML Framework | XGBoost, scikit-learn |
| Web Framework | Flask |
| Data Handling | Pandas, NumPy |
| Model Serialization | Joblib |
| Frontend | HTML5, CSS3, JavaScript |
| Fonts | Google Fonts (Poppins) |
Source: California Housing Dataset (derived from the 1990 U.S. Census)
| Feature | Description |
|---|---|
| `MedInc` | Median income of households in the block (in tens of thousands of USD) |
| `HouseAge` | Median age of houses in the block (years) |
| `AveRooms` | Average number of rooms per household |
| `AveBedrms` | Average number of bedrooms per household |
| `Population` | Total population of the block |
| `AveOccup` | Average number of occupants per household |
| `Latitude` | Latitude coordinate of the block |
| `Longitude` | Longitude coordinate of the block |
| `MedHouseVal` ⭐ | Target: Median house value (in hundreds of thousands of USD) |
Note: The target variable is expressed in units of $100,000. The web app automatically converts predictions to full USD for display.
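To make the units concrete, here is a small sketch of the scaling involved. The constant and function names (`INCOME_UNIT_USD`, `HOUSE_VALUE_UNIT_USD`, `to_usd`, `format_usd`) are hypothetical and the input values are examples only; the app's own conversion code may look different.

```python
INCOME_UNIT_USD = 10_000        # MedInc is in tens of thousands of USD
HOUSE_VALUE_UNIT_USD = 100_000  # MedHouseVal is in hundreds of thousands of USD


def to_usd(value, unit):
    """Scale a dataset value to whole US dollars."""
    return round(value * unit)


def format_usd(amount):
    """Format a dollar amount with thousands separators."""
    return f"${amount:,.0f}"


predicted = 2.345  # example model output, in units of $100,000
print(format_usd(to_usd(predicted, HOUSE_VALUE_UNIT_USD)))
print(format_usd(to_usd(8.3252, INCOME_UNIT_USD)))
```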
| Metric | Train | Test |
|---|---|---|
| R² Score | ~0.94 | ~0.83 |
| MAE | ~0.10 | ~0.19 |
Exact values will vary based on environment and XGBoost version. Run `train_model.py` to reproduce.
The model generalizes reasonably well to unseen data, with a test R² of about 0.83. The gap between train and test R² (roughly 0.94 vs. 0.83) points to some overfitting, which is common for XGBoost with untuned settings on this dataset and will be addressed through hyperparameter tuning in upcoming iterations.
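For reference, the two metrics in the table can be computed as shown below. This is a dependency-free sketch with made-up numbers; scikit-learn provides the same calculations as `r2_score` and `mean_absolute_error`, though whether `train_model.py` uses them is not stated here.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up values in units of $100,000, purely for illustration.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]

print(f"MAE: {mean_absolute_error(y_true, y_pred):.3f}")
print(f"R^2: {r2_score(y_true, y_pred):.3f}")
```

Because the target is in units of $100,000, a test MAE of about 0.19 corresponds to an average error of roughly $19,000.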
House Price Predictor/
│
├── dataset/
│   └── house_prices.csv       # California Housing Dataset
│
├── templates/
│   └── index.html             # Frontend: prediction form + loan calculator
│
├── static/
│   └── style.css              # Styling with dark/light mode support
│
├── train_model.py             # Data prep, model training & evaluation
├── application.py             # Flask app: routes and prediction logic
├── house_price_model.pkl      # Serialized trained XGBoost model
├── requirements.txt           # Python dependencies
└── README.md
- Python 3.8+
- pip
# 1. Clone the repository
git clone https://github.com/yourusername/house-price-predictor.git
cd house-price-predictor
# 2. Install dependencies
pip install -r requirements.txt
# 3. Train the model (generates house_price_model.pkl)
python train_model.py
# 4. Start the Flask server
python application.py
Then open your browser and navigate to http://127.0.0.1:5000.
flask
numpy
pandas
scikit-learn
xgboost
joblib
User Input (8 features)
↓
Flask POST /predict
↓
Load house_price_model.pkl
↓
XGBRegressor.predict()
↓
Convert output (× $100,000)
↓
Display Estimated Price + Loan Calculator
- The user fills in 8 housing features on the web form.
- On submission, Flask collects and preprocesses the input.
- The pre-trained XGBoost model predicts the median house value.
- The result is scaled to full USD and displayed.
- The integrated loan calculator auto-fills with the predicted price for instant financial planning.
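The steps above can be sketched without Flask as a plain function. This is an illustration under stated assumptions, not the code in `application.py`: `FEATURE_ORDER`, `predict_price_usd`, and the `StubModel` stand-in are hypothetical, and the column order assumes the model was trained on the features in the order listed in the Dataset table. In the real app the model would come from `joblib.load("house_price_model.pkl")`.

```python
FEATURE_ORDER = ["MedInc", "HouseAge", "AveRooms", "AveBedrms",
                 "Population", "AveOccup", "Latitude", "Longitude"]


def predict_price_usd(model, form):
    """Turn submitted form values into a USD price estimate.

    `form` maps feature names to strings, as an HTML form would submit them.
    Raises KeyError for a missing field and ValueError for a non-numeric one.
    """
    row = [float(form[name]) for name in FEATURE_ORDER]
    prediction = model.predict([row])[0]  # in units of $100,000
    return float(prediction) * 100_000


class StubModel:
    """Stand-in for the trained regressor so the sketch runs on its own."""

    def predict(self, rows):
        # Toy rule (not a real model): value grows with median income.
        return [0.5 * row[0] for row in rows]


form = {"MedInc": "8.0", "HouseAge": "30", "AveRooms": "6.0",
        "AveBedrms": "1.0", "Population": "1200", "AveOccup": "3.0",
        "Latitude": "37.88", "Longitude": "-122.23"}
print(f"${predict_price_usd(StubModel(), form):,.0f}")
```

Fixing the feature order in one place keeps the form fields and the model's expected columns from drifting apart. A real `XGBRegressor` may expect a NumPy array or DataFrame rather than a nested list, so the row may need converting before the `predict` call.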
This project is actively being developed. Here's what's coming next:
- Algorithm Benchmarking: Compare XGBoost against Random Forest, Linear Regression, Ridge, Lasso, SVR, and LightGBM to empirically identify the best model
- Hyperparameter Tuning: Use GridSearchCV / RandomizedSearchCV / Optuna to optimize XGBoost parameters (`n_estimators`, `max_depth`, `learning_rate`, etc.)
- Cross-Validation: Replace the single train/test split with K-Fold cross-validation for more robust evaluation
- Feature Engineering: Derive new features such as `rooms_per_person`, `bedrooms_per_room`, and `income_per_room`
- Feature Importance Visualization: Plot SHAP values or XGBoost feature importances to explain model decisions
- Outlier Detection & Removal: Analyze the effect of removing outliers (e.g., houses capped at $500,001) on model accuracy
- Geospatial Analysis: Map predictions by latitude/longitude to surface geographic pricing patterns
- Interactive Price Map: Visualize predicted prices on a California map using Folium or Leaflet.js
- Prediction History: Let users compare multiple predictions side-by-side in a session
- Confidence Intervals: Display a price range instead of a single point estimate
- Input Validation & Tooltips: Add contextual guidance for each input field
- Cloud Deployment: Host on AWS Elastic Beanstalk, Render, or Heroku
- Dockerize: Package the application in a Docker container for consistent deployment
- REST API: Expose a `/api/predict` endpoint for programmatic access
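As a rough idea of the planned feature engineering, the three derived features could be computed from the existing columns as below. The formulas are assumptions, since the roadmap only names the features: `rooms_per_person` is taken as `AveRooms / AveOccup`, `bedrooms_per_room` as `AveBedrms / AveRooms`, and `income_per_room` as `MedInc / AveRooms`. The helper name `add_derived_features` is hypothetical.

```python
def add_derived_features(row):
    """Return a copy of `row` with three ratio features added.

    The ratio definitions are assumed for illustration.
    """
    derived = dict(row)
    derived["rooms_per_person"] = row["AveRooms"] / row["AveOccup"]
    derived["bedrooms_per_room"] = row["AveBedrms"] / row["AveRooms"]
    derived["income_per_room"] = row["MedInc"] / row["AveRooms"]
    return derived


# Example block with made-up values.
block = {"MedInc": 8.0, "AveRooms": 6.0, "AveBedrms": 1.2, "AveOccup": 3.0}
for name, value in add_derived_features(block).items():
    print(f"{name}: {value:.3f}")
```

With pandas, the same ratios would typically be computed column-wise on the whole DataFrame rather than row by row.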
Contributions, issues, and feature requests are welcome!
- Fork the repository
- Create your feature branch (`git checkout -b feature/my-feature`)
- Commit your changes (`git commit -m 'Add my feature'`)
- Push to the branch (`git push origin feature/my-feature`)
- Open a Pull Request
This project is licensed under the MIT License.
Harsvardhan Rajgarhia
CSBS'27, Academy Of Technology
⭐ If you found this project useful, consider giving it a star on GitHub!