This project aims to predict the salary of employees based on their years of experience using supervised machine learning techniques. Two regression models, Linear Regression and Decision Tree Regression, are implemented and compared.
- Dataset Name: `salary_data.csv`
- Columns:
  - `employee_id`: Unique identifier for each employee.
  - `experience_years`: Years of experience of the employee.
  - `salary`: Salary of the employee.
- Dataset is read and loaded using `pandas`.
- Initial exploration is done with `data.head()`, `data.info()`, and `data.describe()`.
- A scatter plot visualizes the relationship between `experience_years` and `salary` using `seaborn` and `matplotlib`.
- Checked for duplicates and removed them.
- Checked for missing values (none found).
- Dataset was split into predictors (`X`) and target (`y`).
- Data was further split into training and testing sets (see the sketch after this list):
  - Train/Test Ratio: 75/25
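A minimal sketch of these loading and preprocessing steps, assuming the column names listed above; the `random_state` value is an assumption added for reproducibility:

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

# Load the dataset (assumes salary_data.csv is in the working directory)
data = pd.read_csv("salary_data.csv")

# Initial exploration
print(data.head())
data.info()
print(data.describe())

# Visualize the relationship between experience and salary
sns.scatterplot(data=data, x="experience_years", y="salary")
plt.show()

# Drop duplicate rows and check for missing values
data = data.drop_duplicates()
print(data.isnull().sum())

# Predictors (X) and target (y), then a 75/25 train/test split
X = data[["experience_years"]]
y = data["salary"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)
```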
- Fitted a linear regression model using `sklearn.linear_model.LinearRegression` (see the sketch after this list).
- Plotted actual vs. predicted values.
- Evaluated using:
- Mean Squared Error (MSE)
- R-squared (R²) score
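Continuing from the sketch above, fitting and evaluating the linear model might look like this (the plotting style is illustrative):

```python
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Fit the model on the training split
lin_reg = LinearRegression()
lin_reg.fit(X_train, y_train)

# The fitted line is y = intercept_ + coef_[0] * x
print(f"y = {lin_reg.intercept_:.3f} + {lin_reg.coef_[0]:.3f} * x")

# MSE and R² on both splits
for name, X_part, y_part in [("Train", X_train, y_train), ("Test", X_test, y_test)]:
    pred = lin_reg.predict(X_part)
    print(name, "MSE:", mean_squared_error(y_part, pred), "R²:", r2_score(y_part, pred))

# Actual vs. predicted values on the test set
plt.scatter(X_test, y_test, label="Actual")
plt.plot(X_test, lin_reg.predict(X_test), color="red", label="Predicted")
plt.legend()
plt.show()
```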
- Fitted a decision tree regressor using `sklearn.tree.DecisionTreeRegressor` (see the sketch after this list).
- Plotted actual vs. predicted values.
- Evaluated using:
- Mean Squared Error (MSE)
- R-squared (R²) score
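The decision tree follows the same pattern; it reuses the split and metrics from the sketches above, and the `random_state` value is again an assumption:

```python
from sklearn.tree import DecisionTreeRegressor

# Fit the decision tree regressor (default depth; random_state for reproducibility)
tree_reg = DecisionTreeRegressor(random_state=42)
tree_reg.fit(X_train, y_train)

# MSE and R² on both splits
for name, X_part, y_part in [("Train", X_train, y_train), ("Test", X_test, y_test)]:
    pred = tree_reg.predict(X_part)
    print(name, "MSE:", mean_squared_error(y_part, pred), "R²:", r2_score(y_part, pred))

# Actual vs. predicted values on the test set
plt.scatter(X_test, y_test, label="Actual")
plt.scatter(X_test, tree_reg.predict(X_test), color="red", label="Predicted")
plt.legend()
plt.show()
```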
- Model Equation (Linear Regression): `y = 1641.366 + 103.197 * x`
- Evaluation Metrics (Linear Regression):
- Train MSE: 107699.85
- Test MSE: 128111.12
- Train R²: 0.77
- Test R²: 0.63
- Evaluation Metrics (Decision Tree Regression):
- Train MSE: 88.12
- Test MSE: 128311.56
- Train R²: 1.00
- Test R²: 0.61
- Python 3.8+
- Libraries: `pandas`, `numpy`, `matplotlib`, `seaborn`, `scikit-learn`
- Install required libraries:
pip install pandas numpy matplotlib seaborn scikit-learn
- Ensure the `salary_data.csv` file is in the working directory.
- Run the script in your Python environment.
- There is a strong positive correlation between years of experience and salary.
- Linear regression provides a simpler model; the decision tree fits the training data more precisely but does not outperform it on the test set.
- Decision trees overfit the training data but show similar performance to linear regression on the test data.
- Use additional features to improve model performance.
- Experiment with more advanced regression models like Random Forest or Gradient Boosting.
- Perform hyperparameter tuning for Decision Tree to reduce overfitting.
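As a starting point for that tuning, a simple grid search over tree depth and leaf size could look like the following; the parameter grid values are illustrative, not taken from the project:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

# Search over tree depth and leaf size to rein in overfitting
param_grid = {"max_depth": [2, 3, 4, 5], "min_samples_leaf": [1, 2, 5, 10]}
search = GridSearchCV(
    DecisionTreeRegressor(random_state=42), param_grid, cv=5, scoring="r2"
)
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print("Test R²:", search.best_estimator_.score(X_test, y_test))
```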
For any queries or contributions, please reach out at: bimadev06@gmail.com.
