Harsh1574/house_price_predictor

California House Price Predictor

An end-to-end Machine Learning web application that predicts California house prices using the California Housing Dataset, powered by XGBoost and deployed with a Flask backend.


Live Demo

Deployment link coming soon



Overview

This project tackles a classic regression problem: predicting median house values across California districts. Given block-level demographic and geographic features, the model estimates a district's median house value using XGBoost's regressor (XGBRegressor), a widely used gradient-boosting implementation.

The project includes a polished Flask web app where users can input housing features and instantly receive a price estimate, along with a built-in Loan Calculator that auto-computes EMI, down payment, and loan amount based on the predicted price.
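
The loan figures described above can be sketched with the standard fixed-rate amortization formula. This is a minimal illustration, not the app's actual code: the function name `loan_breakdown`, its parameters, and the assumption of monthly compounding are all hypothetical.

```python
def loan_breakdown(price, down_payment_pct, annual_rate_pct, tenure_years):
    """Return (down_payment, loan_amount, emi) for a fixed-rate loan.

    Assumes monthly payments and monthly compounding (an assumption,
    not something the README specifies).
    """
    down_payment = price * down_payment_pct / 100.0
    loan_amount = price - down_payment
    months = int(tenure_years * 12)
    if months <= 0:
        raise ValueError("tenure must be at least one month")

    monthly_rate = annual_rate_pct / 100.0 / 12.0
    if monthly_rate == 0:
        # Interest-free loan: the principal is split evenly across months.
        emi = loan_amount / months
    else:
        # EMI = P * r * (1 + r)^n / ((1 + r)^n - 1)
        growth = (1 + monthly_rate) ** months
        emi = loan_amount * monthly_rate * growth / (growth - 1)
    return down_payment, loan_amount, emi


if __name__ == "__main__":
    down, loan, emi = loan_breakdown(250_000, 20, 7.5, 20)
    print(f"Down payment: ${down:,.0f}, loan: ${loan:,.0f}, EMI: ${emi:,.2f}")
```

Handling the zero-interest case separately avoids a division by zero in the general formula.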


Features

  • XGBoost Regression Model trained on the California Housing Dataset
  • Flask Web App with a clean, responsive UI
  • Dark / Light Mode Toggle for better accessibility
  • Integrated Loan Calculator: auto-populates with the predicted price and lets users adjust interest rate, down payment percentage, and tenure
  • Form Persistence: input values are retained after prediction for easy comparison
  • Serialized Model via joblib for fast inference

Tech Stack

| Layer | Technology |
| --- | --- |
| Language | Python 3.x |
| ML Framework | XGBoost, scikit-learn |
| Web Framework | Flask |
| Data Handling | Pandas, NumPy |
| Model Serialization | Joblib |
| Frontend | HTML5, CSS3, JavaScript |
| Fonts | Google Fonts (Poppins) |

Dataset

Source: California Housing Dataset (derived from the 1990 U.S. Census)

| Feature | Description |
| --- | --- |
| MedInc | Median income of households in the block (in tens of thousands of USD) |
| HouseAge | Median age of houses in the block (years) |
| AveRooms | Average number of rooms per household |
| AveBedrms | Average number of bedrooms per household |
| Population | Total population of the block |
| AveOccup | Average number of occupants per household |
| Latitude | Latitude coordinate of the block |
| Longitude | Longitude coordinate of the block |
| MedHouseVal | Target: median house value (in hundreds of thousands of USD) |

Note: The target variable is expressed in units of $100,000. The web app automatically converts predictions to full USD for display.
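
As a small illustration of that conversion, the sketch below scales a raw model output to dollars and formats it for display. The helper names `to_usd` and `format_usd` are hypothetical and are not taken from application.py.

```python
TARGET_UNIT_USD = 100_000  # MedHouseVal is expressed in units of $100,000


def to_usd(model_output):
    """Scale a raw prediction (in $100,000 units) to full US dollars."""
    return float(model_output) * TARGET_UNIT_USD


def format_usd(amount):
    """Format a dollar amount with thousands separators and no cents."""
    return f"${amount:,.0f}"


if __name__ == "__main__":
    print(format_usd(to_usd(2.5)))
```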


Model Performance

| Metric | Train | Test |
| --- | --- | --- |
| R² Score | ~0.94 | ~0.83 |
| MAE | ~0.10 | ~0.19 |

MAE is in the target's units of $100,000. Exact values will vary with environment and XGBoost version. Run train_model.py to reproduce.

The model generalizes reasonably well to unseen data, with a test R² of about 0.83. The gap between train and test R² (about 0.94 versus 0.83) points to some overfitting, which upcoming iterations aim to reduce through hyperparameter tuning.
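
For readers unfamiliar with the two metrics, the sketch below computes them from their textbook definitions in plain Python. It is illustrative only: train_model.py presumably relies on scikit-learn's metric functions, and the names `mae` and `r2` here are hypothetical.

```python
def mae(y_true, y_pred):
    """Mean absolute error: the average size of the prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Toy values in $100,000 units, not real model output.
    actual = [1.0, 2.0, 3.0, 4.0]
    predicted = [1.5, 2.0, 3.0, 3.5]
    print("MAE:", mae(actual, predicted))
    print("R2:", r2(actual, predicted))
```

Because the target is in units of $100,000, an MAE of 0.19 corresponds to an average error of roughly $19,000.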


Project Structure

House Price Predictor/
│
├── dataset/
│   └── house_prices.csv          # California Housing Dataset
│
├── templates/
│   └── index.html                # Frontend: prediction form + loan calculator
│
├── static/
│   └── style.css                 # Styling with dark/light mode support
│
├── train_model.py                # Data prep, model training & evaluation
├── application.py                # Flask app: routes and prediction logic
├── house_price_model.pkl         # Serialized trained XGBoost model
├── requirements.txt              # Python dependencies
└── README.md

Getting Started

Prerequisites

  • Python 3.8+
  • pip

Installation

# 1. Clone the repository
git clone https://github.com/Harsh1574/house_price_predictor.git
cd house_price_predictor

# 2. Install dependencies
pip install -r requirements.txt

# 3. Train the model (generates house_price_model.pkl)
python train_model.py

# 4. Start the Flask server
python application.py

Then open your browser and navigate to http://127.0.0.1:5000.

Requirements

flask
numpy
pandas
scikit-learn
xgboost
joblib

How It Works

User Input (8 features)
        ↓
Flask POST /predict
        ↓
Load house_price_model.pkl
        ↓
XGBRegressor.predict()
        ↓
Convert output (×$100,000)
        ↓
Display Estimated Price + Loan Calculator

  1. The user fills in 8 housing features on the web form.
  2. On submission, Flask collects and preprocesses the input.
  3. The pre-trained XGBoost model predicts the median house value.
  4. The result is scaled to full USD and displayed.
  5. The integrated loan calculator auto-fills with the predicted price for instant financial planning.
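
The steps above can be sketched without Flask or the trained model. The example below is a hedged illustration: it assumes the form fields are named after the dataset features and arrive as strings, and it uses a hypothetical `StubModel` in place of the object loaded from house_price_model.pkl. In the real app the row would typically be wrapped in a NumPy array or DataFrame before being passed to `predict()`.

```python
# Feature order follows the Dataset table; the form field names are an assumption.
FEATURE_ORDER = [
    "MedInc", "HouseAge", "AveRooms", "AveBedrms",
    "Population", "AveOccup", "Latitude", "Longitude",
]


def form_to_row(form):
    """Convert a mapping of form strings into one ordered feature row."""
    missing = [name for name in FEATURE_ORDER if name not in form]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return [float(form[name]) for name in FEATURE_ORDER]


def predict_price_usd(model, form):
    """Run one prediction and scale it from $100,000 units to dollars."""
    row = form_to_row(form)
    prediction = model.predict([row])[0]
    return float(prediction) * 100_000


class StubModel:
    """Stand-in for the trained regressor so the sketch runs without the .pkl file."""

    def predict(self, rows):
        return [2.0 for _ in rows]


if __name__ == "__main__":
    form = {
        "MedInc": "8.3", "HouseAge": "41", "AveRooms": "6.9", "AveBedrms": "1.0",
        "Population": "322", "AveOccup": "2.5", "Latitude": "37.88",
        "Longitude": "-122.23",
    }
    print(predict_price_usd(StubModel(), form))
```

Fixing the feature order in one place keeps the row passed at inference time aligned with the column order used during training.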

Planned Enhancements

This project is actively being developed. Here's what's coming next:

Model & ML

  • Algorithm Benchmarking: Compare XGBoost against Random Forest, Linear Regression, Ridge, Lasso, SVR, and LightGBM to empirically identify the best model
  • Hyperparameter Tuning: Use GridSearchCV / RandomizedSearchCV / Optuna to optimize XGBoost parameters (n_estimators, max_depth, learning_rate, etc.)
  • Cross-Validation: Replace the single train/test split with K-Fold cross-validation for more robust evaluation
  • Feature Engineering: Derive new features such as rooms_per_person, bedrooms_per_room, and income_per_room
  • Feature Importance Visualization: Plot SHAP values or XGBoost feature importances to explain model decisions
  • Outlier Detection & Removal: Analyze the effect of removing outliers (e.g., districts whose median value is capped at $500,001) on model accuracy
  • Geospatial Analysis: Map predictions by latitude/longitude to surface geographic pricing patterns
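
One possible reading of the planned ratio features is sketched below. The README only names the features, so the formulas are assumptions: each is taken as a simple ratio of existing per-household columns, and `add_ratio_features` is a hypothetical helper.

```python
def add_ratio_features(row):
    """Return a copy of `row` (a dict of feature values) with three ratios added.

    The definitions are assumed, not taken from the project:
      rooms_per_person  = AveRooms / AveOccup
      bedrooms_per_room = AveBedrms / AveRooms
      income_per_room   = MedInc / AveRooms
    """
    out = dict(row)
    out["rooms_per_person"] = row["AveRooms"] / row["AveOccup"]
    out["bedrooms_per_room"] = row["AveBedrms"] / row["AveRooms"]
    out["income_per_room"] = row["MedInc"] / row["AveRooms"]
    return out


if __name__ == "__main__":
    sample = {"MedInc": 3.0, "AveRooms": 6.0, "AveBedrms": 1.5, "AveOccup": 3.0}
    print(add_ratio_features(sample))
```

Any derived feature has to be computed identically at training and prediction time, so a shared helper like this would be called from both train_model.py and the Flask route.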

Web App

  • Interactive Price Map: Visualize predicted prices on a California map using Folium or Leaflet.js
  • Prediction History: Let users compare multiple predictions side-by-side in a session
  • Confidence Intervals: Display a price range instead of a single point estimate
  • Input Validation & Tooltips: Add contextual guidance for each input field
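
A server-side range check is one way the planned input validation could work. The sketch below is hypothetical throughout: the `BOUNDS` values are rough illustrative limits (for example, a latitude band loosely covering California) and would need to be tuned against the actual dataset, and only four of the eight fields are covered.

```python
# Hypothetical bounds for illustration; tune them against the actual dataset.
BOUNDS = {
    "MedInc": (0.0, 20.0),
    "HouseAge": (0.0, 100.0),
    "Latitude": (32.0, 42.5),
    "Longitude": (-125.0, -114.0),
}


def validate(form):
    """Return a dict mapping field name to error message (empty if all valid)."""
    errors = {}
    for name, (low, high) in BOUNDS.items():
        raw = form.get(name, "")
        try:
            value = float(raw)
        except (TypeError, ValueError):
            errors[name] = "must be a number"
            continue
        # NaN fails this comparison, so it is rejected as out of range.
        if not low <= value <= high:
            errors[name] = f"must be between {low} and {high}"
    return errors


if __name__ == "__main__":
    print(validate({"MedInc": "8.3", "HouseAge": "41",
                    "Latitude": "50", "Longitude": "-122.23"}))
```

Returning all errors at once, rather than stopping at the first, lets the form show guidance next to every problem field in a single round trip.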

Deployment

  • Cloud Deployment: Host on AWS Elastic Beanstalk, Render, or Heroku
  • Dockerize: Package the application in a Docker container for consistent deployment
  • REST API: Expose a /api/predict endpoint for programmatic access

Contributing

Contributions, issues, and feature requests are welcome!

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/my-feature)
  3. Commit your changes (git commit -m 'Add my feature')
  4. Push to the branch (git push origin feature/my-feature)
  5. Open a Pull Request

License

This project is licensed under the MIT License.


Author

Harsvardhan Rajgarhia

CSBS'27, Academy Of Technology

Gmail LinkedIn GitHub

If you found this project useful, consider giving it a star on GitHub!
