This project was completed as part of the SMU CS105 (Statistical Thinking for Data Science) syllabus.
To view the project live, please click on this link here.
Members
Our project aims to identify and understand the factors that drive housing prices (our dependent variable) and to develop a sound method for predicting the fair value of a house. We started by collecting data from a given dataset, which contains information on housing price factors including Crime Rate, Tax Rate and Number of Rooms. We then pre-processed and cleaned the data, dealing with outliers to ensure data integrity.
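The outlier handling step can be illustrated with a minimal sketch. It assumes an interquartile-range (IQR) rule with the common 1.5 multiplier and clips extreme values to the nearest fence; the write-up does not state which rule or threshold the project used, and the function name `clip_outliers_iqr` and the sample values are hypothetical.

```python
import numpy as np


def clip_outliers_iqr(values, k=1.5):
    """Clip values outside [Q1 - k*IQR, Q3 + k*IQR] to the nearest fence.

    The IQR rule and k=1.5 are assumptions made for illustration only.
    """
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return np.clip(values, lower, upper)


# Hypothetical crime-rate column containing one extreme value.
crime_rate = [0.1, 0.2, 0.15, 0.3, 0.25, 9.0]
cleaned = clip_outliers_iqr(crime_rate)
print(cleaned)
```

Clipping keeps every row in the dataset, which matters when the sample is small. Dropping the affected rows is an equally reasonable alternative.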
Next, we used exploratory data analysis (EDA) techniques to look for correlations between the factors and the housing price. We performed a univariate analysis of each housing price factor and a bivariate analysis of the factors we believed to be of interest. We then trained our model and performed feature selection, keeping only the factors that were highly correlated with housing prices.
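Correlation-based feature selection of this kind can be sketched as follows. The example assumes the Pearson correlation coefficient and an absolute-value cutoff of 0.5; neither is stated in the write-up. The function name `select_by_correlation` and the toy data are hypothetical.

```python
import numpy as np


def select_by_correlation(features, target, threshold=0.5):
    """Return {name: r} for features whose |Pearson r| with the target
    is at least `threshold`. The threshold value is an assumption."""
    target = np.asarray(target, dtype=float)
    selected = {}
    for name, column in features.items():
        r = np.corrcoef(np.asarray(column, dtype=float), target)[0, 1]
        if abs(r) >= threshold:
            selected[name] = r
    return selected


# Hypothetical toy data: price rises with rooms and falls with crime rate.
price = [20.0, 24.0, 30.0, 35.0, 41.0]
features = {
    "rooms": [4, 5, 6, 7, 8],
    "crime_rate": [5.0, 4.1, 2.9, 2.2, 1.0],
    "noise": [1.0, -1.0, 0.0, -1.0, 1.0],
}
selected = select_by_correlation(features, price)
print(sorted(selected))
```

The absolute value is used so that strongly negative relationships (such as crime rate against price in this toy data) are kept alongside strongly positive ones.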
Finally, we tested these factors by building a Linear Regression model and a Ridge Regression model and evaluating both against the sample data provided.
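To make the difference between the two models concrete, here is a minimal sketch that fits both with the closed-form solution, using only NumPy. It is not the project's actual code: the synthetic data, the `alpha` value, the train/test split and the helper names `fit_ridge`, `predict` and `rmse` are all assumptions made for illustration.

```python
import numpy as np


def fit_ridge(X, y, alpha=0.0):
    """Closed-form ridge fit; alpha=0 reduces to ordinary least squares.

    Data are centered first so that the intercept is not penalized.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    gram = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    coef = np.linalg.solve(gram, Xc.T @ yc)
    intercept = y_mean - x_mean @ coef
    return coef, intercept


def predict(X, coef, intercept):
    return np.asarray(X, dtype=float) @ coef + intercept


def rmse(y_true, y_pred):
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


# Synthetic stand-in for the housing data (hypothetical values).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 5.0 + rng.normal(scale=0.1, size=50)

# Hold out the last 10 rows as a test set.
X_train, X_test = X[:40], X[40:]
y_train, y_test = y[:40], y[40:]

ols_coef, ols_intercept = fit_ridge(X_train, y_train, alpha=0.0)
ridge_coef, ridge_intercept = fit_ridge(X_train, y_train, alpha=10.0)

rmse_ols = rmse(y_test, predict(X_test, ols_coef, ols_intercept))
rmse_ridge = rmse(y_test, predict(X_test, ridge_coef, ridge_intercept))
print("OLS coefficients:", ols_coef, "RMSE:", rmse_ols)
print("Ridge coefficients:", ridge_coef, "RMSE:", rmse_ridge)
```

Ridge regression adds an L2 penalty that shrinks the coefficients toward zero, which tends to help when predictors are correlated with each other. With clean, well-conditioned data like this synthetic example, ordinary least squares is usually at least as accurate.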