This project builds a custom Linear Regression model from scratch (no scikit-learn regressors!) and uses it to predict a student's subject scores based on their career aspiration.
It's not just a basic regression: the model also gives:
- Predicted scores for all subjects (Math, Physics, Chemistry, etc.)
- A confidence rating showing how sure the model is
- Suggested weekly study hours & average effort level for that career
- Exportable trained model for reuse
This system uses a multi-output linear regression approach implemented from scratch (supporting both Normal Equation and Gradient Descent).
The input feature is the student's career aspiration, which is one-hot encoded and used to predict multiple subject scores simultaneously.
Once trained, the model can:
- Suggest how a student might perform in each subject if they pursue a certain career path.
- Estimate how much self-study time students in that career category typically put in.
- Express confidence in its predictions (based on error variance).
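Since the career label is the only input feature, the encoding step is small. Below is a minimal sketch of the one-hot step, assuming a hypothetical career list (the names are illustrative, not taken from the project's dataset):

```python
import numpy as np

# Hypothetical career list -- illustrative, not the project's actual categories.
careers = ["Data Scientist", "Doctor", "Engineer"]

def one_hot(career: str) -> np.ndarray:
    """Encode a career label as a one-hot row vector (1 x num_careers)."""
    vec = np.zeros((1, len(careers)))
    vec[0, careers.index(career)] = 1.0
    return vec

x = one_hot("Doctor")
print(x.shape)  # (1, 3) -- one row, one column per career category
```

Each encoded row then maps to a full vector of subject scores, which is what makes the regression multi-output.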
| Feature | Description |
|---|---|
| Custom Linear Regression | Implemented from scratch using NumPy, supporting both normal equation and gradient descent. |
| Multi-output Support | Predicts multiple subject scores at once (Math, Physics, Chemistry, etc.) |
| Confidence Scoring | Estimates how reliable each prediction is based on similar data points. |
| Career Suggestion List | Displays all available career aspiration options to the user for easy selection. |
| Study Hours Prediction | Predicts average weekly self-study time and effort required for that career. |
| Model Persistence | Model can be saved and reloaded using Joblib or manual NumPy serialization. |
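The persistence row above can be sketched with plain NumPy serialization (the file name and parameter shapes here are assumptions, not the project's actual artifacts):

```python
import numpy as np
import os
import tempfile

# Hypothetical trained parameters: W maps 3 careers -> 6 subject scores.
W = np.arange(18, dtype=float).reshape(3, 6)
b = np.zeros(6)

# Save with manual NumPy serialization (no pickle required).
path = os.path.join(tempfile.gettempdir(), "career_model.npz")
np.savez(path, W=W, b=b)

# Reload later and verify the parameters round-trip exactly.
data = np.load(path)
W2, b2 = data["W"], data["b"]
assert np.allclose(W, W2) and np.allclose(b, b2)
```

`joblib.dump`/`joblib.load` would work the same way for a whole model object; `np.savez` is shown because it avoids pickle entirely.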
Input:
Career Aspiration → encoded using OneHotEncoder (categorical → numeric features)
Output:
Scores for each subject (Math, Physics, Chemistry, Biology, English, Geography, …)
Model:
Custom Linear Regression

```
Y = XW + b
```

Where:
- `X`: encoded career vector (one-hot)
- `W`: learned weight matrix (one column per subject)
- `b`: learned bias vector
- `Y`: predicted subject scores
Training:
Either via Normal Equation (closed-form) or Gradient Descent.
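A minimal sketch of both training routes, assuming a multi-output design where `W` holds one column per subject (the class and method names are hypothetical, not the project's API):

```python
import numpy as np

class MultiOutputLinearRegression:
    """Sketch of Y = XW + b fitted for several targets at once."""

    def fit_normal_equation(self, X, Y):
        # Append a bias column of ones, then solve the closed form
        # W = (X^T X)^-1 X^T Y, using pinv for numerical stability.
        Xb = np.hstack([X, np.ones((X.shape[0], 1))])
        Wb = np.linalg.pinv(Xb) @ Y
        self.W, self.b = Wb[:-1], Wb[-1]
        return self

    def fit_gradient_descent(self, X, Y, lr=0.1, epochs=1000):
        n, d = X.shape
        self.W = np.zeros((d, Y.shape[1]))
        self.b = np.zeros(Y.shape[1])
        for _ in range(epochs):
            err = self.predict(X) - Y        # (n, k) residual matrix
            self.W -= lr * X.T @ err / n     # gradient of MSE w.r.t. W
            self.b -= lr * err.mean(axis=0)  # gradient of MSE w.r.t. b
        return self

    def predict(self, X):
        return X @ self.W + self.b
```

Both routes minimize the same squared error; the normal equation is exact in one step, while gradient descent iterates and needs a learning rate, but scales better when the feature count grows.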
Confidence is computed as

```
Confidence = 1 − (Mean Absolute Error for that career) / (Max Deviation for that career)
```

This scales between 0 and 1, where:
- 1.0 → Model is very sure (low error)
- 0.5 → Moderate reliability
- 0.0 → Weak confidence (limited similar samples)
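One plausible reading of the formula, treating "Max Deviation" as the largest absolute prediction error within the selected career group (that interpretation is an assumption):

```python
import numpy as np

def confidence(y_true, y_pred):
    """Confidence = 1 - MAE / max absolute error, over students who
    share the selected career. Clipped to [0, 1]; higher means the
    errors are uniformly small relative to the worst case.
    Note: 'Max Deviation' is read here as the largest absolute error,
    an assumption since the formula above does not pin it down."""
    errors = np.abs(np.asarray(y_true, float) - np.asarray(y_pred, float))
    max_dev = errors.max()
    if max_dev == 0:
        return 1.0  # perfect predictions for this career group
    return float(np.clip(1.0 - errors.mean() / max_dev, 0.0, 1.0))
```

With few similar samples the error estimate itself is noisy, which is why low scores are described as "limited similar samples" rather than proof of a bad fit.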
Example output:

```
Selected Career: Data Scientist

Predicted Subject Scores:
  Math:      89.3
  Physics:   82.7
  Chemistry: 78.9
  English:   91.1
  Biology:   68.2
  Geography: 74.5

Model Confidence:       0.86
Avg Weekly Study Hours: 8.2 hrs/week
Suggested Effort:       High
```