
Regression analysis of the ISLR2 College dataset using R. Includes data exploration, model fitting, and diagnostics.



ChokZB/college-regression-analysis


Regression Modelling with the ISLR2 College Dataset in R

This repository presents a regression analysis performed on the College dataset from the ISLR2 package in R.

The analysis explores relationships between various institutional characteristics and student outcomes to demonstrate statistical modelling techniques using R.


🎯 Objectives

  • Explore and visualise the College dataset.
  • Apply multiple linear regression techniques.
  • Evaluate model performance and interpret significant predictors.
  • Demonstrate data cleaning, exploratory analysis, and regression diagnostics.

🗃️ Dataset

The College dataset is included in the ISLR2 R package. It records 18 variables for 777 U.S. colleges, taken from the 1995 issue of US News and World Report, including:

  • Number of applications and acceptances
  • Out-of-state tuition and room-and-board costs
  • Graduation rate
  • Student-to-faculty ratio
  • Type of institution (Private or Public)
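A minimal exploration sketch, assuming ISLR2 is installed. It is not taken from `college_regression_analysis.R`. It checks the dataset's size, summarises the variables listed above (`Apps`, `Accept`, `Outstate`, `Room.Board`, `Grad.Rate`, `S.F.Ratio` and `Private` are the ISLR2 column names), derives an acceptance rate, and flags implausible graduation rates as a simple cleaning check:

```r
library(ISLR2)  # provides the College data frame

# Size and variable types: 777 colleges, 18 variables; Private is a factor
dim(College)
str(College)

# Summary statistics for the variables listed above
summary(College[, c("Apps", "Accept", "Outstate", "Room.Board",
                    "Grad.Rate", "S.F.Ratio")])

# Derived variable: share of applications that were accepted
accept_rate <- College$Accept / College$Apps
summary(accept_rate)

# Private vs public counts, and median out-of-state tuition by type
table(College$Private)
tapply(College$Outstate, College$Private, median)

# Cleaning check: graduation rates above 100% are not plausible
College[College$Grad.Rate > 100, "Grad.Rate", drop = FALSE]

# Scatterplot matrix of a few numeric variables
pairs(College[, c("Apps", "Outstate", "Grad.Rate", "S.F.Ratio")])
```

`Private` uses the levels `No` and `Yes`, so `table()` reports public institutions under `No`.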

⚙️ Methods

  1. Data exploration and summary statistics
  2. Simple and multiple linear regression modelling
  3. Model diagnostics and residual analysis
  4. Variable transformations to improve fit
  5. Best subset selection to identify the best-performing subset of predictors
  6. Polynomial regression, evaluated with the holdout (validation-set) approach, leave-one-out cross-validation (LOOCV) and k-fold cross-validation
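The polynomial-regression evaluation in step 6 can be sketched as follows. The formula `Grad.Rate ~ poly(Outstate, d)` is a hypothetical choice, because this README does not say which response and predictor the polynomial models use. The holdout MSE comes from a random 50/50 split, while LOOCV and 10-fold CV use `cv.glm()` from the boot package, whose `delta[1]` is the raw cross-validation estimate of test MSE:

```r
library(ISLR2)
library(boot)   # cv.glm(); boot ships with R as a recommended package

set.seed(1)
degrees <- 1:5
n <- nrow(College)

# Holdout: fit on a random half, compute MSE on the other half
train <- sample(n, floor(n / 2))
holdout_mse <- sapply(degrees, function(d) {
  fit  <- lm(Grad.Rate ~ poly(Outstate, d), data = College, subset = train)
  pred <- predict(fit, College[-train, ])
  mean((College$Grad.Rate[-train] - pred)^2)
})

# LOOCV: cv.glm's default K = n leaves out one observation at a time
loocv_mse <- sapply(degrees, function(d) {
  fit <- glm(Grad.Rate ~ poly(Outstate, d), data = College)
  cv.glm(College, fit)$delta[1]
})

# k-fold with K = 10
kfold_mse <- sapply(degrees, function(d) {
  fit <- glm(Grad.Rate ~ poly(Outstate, d), data = College)
  cv.glm(College, fit, K = 10)$delta[1]
})

cv_table <- data.frame(degree = degrees, holdout = holdout_mse,
                       loocv = loocv_mse, kfold10 = kfold_mse)
print(cv_table)

# Degree with the lowest 10-fold CV error
cv_table$degree[which.min(cv_table$kfold10)]
```

The holdout estimate depends on a single random split, so a different seed can shift it noticeably. LOOCV involves no random split and gives the same result on every run, at the cost of n model fits per degree.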

📁 Project Structure

college-regression-analysis/
│
├── figures/                           # Generated plots and diagnostic visualisations
│   ├── model_diagnostics_original.png
│   ├── studentized_residuals_original.png
│   └── ...
│
├── .gitignore                         # Files/folders excluded from Git
│
├── LICENSE                            # MIT License
│
├── README.md                          # Project overview and instructions
│
├── college-regression-analysis.Rproj  # RStudio project file for reproducibility
│
├── college_regression_analysis.R      # Main R script with full analysis
│
├── college_regression_report.pdf      # Written assignment report
│
└── install_packages.R                 # Install all required R packages
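The two diagnostic figures listed under `figures/` could be produced along these lines. This is a sketch under assumptions: regressing `Apps` on all other columns is a hypothetical model, the log transform is one illustrative remedy for the skewed response, and the image sizes are arbitrary. Running it writes into `figures/` and would overwrite files with the same names:

```r
library(ISLR2)

dir.create("figures", showWarnings = FALSE)

# Hypothetical model: Apps on every other College variable
fit_original <- lm(Apps ~ ., data = College)

# Standard lm diagnostics (residuals vs fitted, Q-Q, scale-location, leverage)
png("figures/model_diagnostics_original.png", width = 1000, height = 1000)
par(mfrow = c(2, 2))
plot(fit_original)
invisible(dev.off())

# Studentized residuals against fitted values; |r| > 3 flags possible outliers
stud_res <- rstudent(fit_original)
png("figures/studentized_residuals_original.png", width = 800, height = 600)
plot(fitted(fit_original), stud_res,
     xlab = "Fitted values", ylab = "Studentized residuals")
abline(h = c(-3, 3), col = "red", lty = 2)
invisible(dev.off())

# Colleges flagged as potential outliers
names(stud_res)[abs(stud_res) > 3]

# Illustrative transformation: log of the right-skewed response
fit_log <- lm(log(Apps) ~ ., data = College)
summary(fit_log)$adj.r.squared
```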

🔧 Reproducibility

To reproduce the analysis:

  1. Clone the repository

    git clone https://github.com/ChokZB/college-regression-analysis.git
  2. Open the R project in RStudio

    college-regression-analysis.Rproj
  3. Install all required packages

    source("install_packages.R")
  4. Run the main script

    source("college_regression_analysis.R")
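This README does not show the contents of `install_packages.R`. A typical install-if-missing pattern looks like the sketch below. The package names are assumptions (ISLR2 for the data, leaps for `regsubsets()`, boot for `cv.glm()`), so check the actual script for the real list:

```r
# Hypothetical package list; install_packages.R may differ
required <- c("ISLR2", "leaps", "boot")

# requireNamespace() returns FALSE, without attaching, when a package is absent
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required[!installed]

if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs, repos = "https://cloud.r-project.org")
}

# Record R and package versions alongside the results
sessionInfo()
```

Installing only the missing packages keeps reruns fast. Saving the `sessionInfo()` output records which versions produced a given set of results.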

📈 Results Preview

Below is a sample output from the regression and model selection analyses.

Best Subset Selection Metrics
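Metrics like these can be computed roughly as follows, assuming the leaps package and, hypothetically, `Apps` as the response (this README does not name it). `regsubsets()` finds the best model of each size. Cp, BIC and adjusted R-squared then each suggest a model size:

```r
library(ISLR2)
library(leaps)  # regsubsets()

# Best model of each size, from 1 up to all 17 predictors other than Apps
best <- regsubsets(Apps ~ ., data = College, nvmax = 17)
best_summary <- summary(best)

# Model size preferred by each criterion
best_size <- c(which.min(best_summary$cp),
               which.min(best_summary$bic),
               which.max(best_summary$adjr2))
names(best_size) <- c("cp", "bic", "adjr2")
print(best_size)

# One panel per criterion, with the preferred size marked in red
metrics <- list(cp = best_summary$cp, bic = best_summary$bic,
                adjr2 = best_summary$adjr2)
labels <- c(cp = "Cp", bic = "BIC", adjr2 = "Adjusted R-squared")
par(mfrow = c(1, 3))
for (m in names(metrics)) {
  plot(metrics[[m]], type = "b",
       xlab = "Number of predictors", ylab = labels[[m]])
  points(best_size[[m]], metrics[[m]][best_size[[m]]], col = "red", pch = 19)
}

# Coefficients of the model chosen by BIC
coef(best, best_size[["bic"]])
```

BIC penalises model size more heavily than Cp, so it tends to pick a smaller model than Cp or adjusted R-squared.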


🧑‍💻 Author

Chok Zu Bing

GitHub: @ChokZB


🪪 License

This project is released under the MIT License.
