This repository contains a Jupyter Notebook, Math Assessment.ipynb, that provides a comprehensive psychometric analysis of a mathematics diagnostic assessment for incoming undergraduate students. The project uses Item Response Theory (IRT) models to evaluate item characteristics and student abilities, leveraging the py-irt library and the Pyro probabilistic programming framework.
This IRT analysis also serves as a foundational step for investigating Differential Item Functioning (DIF). The ability estimates (θ) produced by these models provide the basis for that follow-up analysis.
Comprehensive IRT Modeling: Defines, trains, and evaluates a suite of IRT models, progressing from the baseline Rasch (1PL) model to the more complex 2PL model with multiple covariates.
Covariate Analysis: Incorporates item domain (e.g., geometry, statistics) and pre-determined difficulty levels ("EASY," "MEDIUM," "HARD") as covariates to build more robust and interpretable models.
Bayesian Inference: Models are trained using Stochastic Variational Inference (SVI), a powerful method for approximating posterior distributions in complex probabilistic models.
Model Evaluation: Model fit is assessed visually using Posterior Predictive Checks (PPC), which compare the distribution of observed test scores to scores generated from the fitted models.
Detailed Parameter Extraction: After training, the notebook extracts and analyzes key item parameters, including discrimination (a) and difficulty (b) estimates for each item.
Rich Visualization: The notebook generates both theoretical Item Characteristic Curves (ICCs) and empirical ICCs that overlay the model's predictions on actual student response data.
The analysis uses two primary data files:
math.items_AnSamp1.csv: Contains the wide-format student response data.
mapping_unique_math.items_umgc1ua2.csv: Contains item metadata, including the domain and pre-assigned difficulty level for each item.
The final processed dataset used for modeling consists of responses from 4,460 individuals to 174 distinct items across 6 mathematical domains.
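The reshaping described next might look like the following minimal pandas sketch. The column names (student_id, item_id, response) and the shared item identifier are assumptions for illustration, not the notebook's actual schema.

```python
import pandas as pd

# Wide-format responses: one row per student, one column per item (assumed layout).
responses_wide = pd.read_csv("math.items_AnSamp1.csv")

# Item metadata: domain and pre-assigned difficulty level for each item (assumed columns).
item_meta = pd.read_csv("mapping_unique_math.items_umgc1ua2.csv")

# Melt to long format: one row per (student, item) response, as expected by IRT tooling.
responses_long = responses_wide.melt(
    id_vars=["student_id"],     # assumed identifier column
    var_name="item_id",
    value_name="response",      # 1 = correct, 0 = incorrect
)

# Attach the domain and pre-assigned difficulty level to each response row.
responses_long = responses_long.merge(item_meta, on="item_id", how="left")
```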
The analysis begins by reshaping the raw data into a long format suitable for IRT modeling. A series of progressively complex IRT models are then specified and trained:
Rasch Model: A 1PL model estimating a single difficulty parameter (b) for each item.
2PL Model: An extension that adds a discrimination parameter (a) for each item.
Rasch + Covariates: Incorporates the effects of item domain and pre-assigned difficulty level on item difficulty.
2PL + Covariates: The most comprehensive model, accounting for item discrimination, base difficulty, and the effects of both domain and pre-assigned difficulty.
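The notebook builds these models with py-irt; as a self-contained illustration of the underlying idea, the sketch below defines a simplified 2PL model in Pyro and trains it with SVI. Variable names, priors, and hyperparameters are assumptions for illustration, not the notebook's actual code, and the covariate effects of domain and difficulty level are omitted for brevity.

```python
import torch
import pyro
import pyro.distributions as dist
from pyro.infer import SVI, Trace_ELBO
from pyro.infer.autoguide import AutoNormal
from pyro.optim import Adam

def two_pl_model(subject_idx, item_idx, responses, n_subjects, n_items):
    # Student abilities (theta) and item difficulties (b) get standard normal priors.
    with pyro.plate("subjects", n_subjects):
        theta = pyro.sample("theta", dist.Normal(0.0, 1.0))
    with pyro.plate("items", n_items):
        b = pyro.sample("b", dist.Normal(0.0, 1.0))
        log_a = pyro.sample("log_a", dist.Normal(0.0, 0.5))  # log keeps discrimination positive

    # 2PL response probability: sigmoid(a_i * (theta_j - b_i)).
    logits = torch.exp(log_a)[item_idx] * (theta[subject_idx] - b[item_idx])
    with pyro.plate("data", len(responses)):
        pyro.sample("obs", dist.Bernoulli(logits=logits), obs=responses)

# Stochastic Variational Inference with an automatic mean-field guide.
guide = AutoNormal(two_pl_model)
svi = SVI(two_pl_model, guide, Adam({"lr": 0.01}), loss=Trace_ELBO())

# subject_idx, item_idx, and responses would come from the long-format data built above,
# encoded as integer index tensors and a float tensor of 0/1 outcomes, e.g.:
# for step in range(2000):
#     loss = svi.step(subject_idx, item_idx, responses, n_subjects, n_items)
```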
A notable finding from the model comparison was that adding the pre-determined item difficulty ("EASY," "MEDIUM," "HARD") as a covariate worsened the model fit. This suggests that the empirically derived difficulty parameters from the IRT models were more effective at explaining student response patterns than the pre-assigned labels.
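A comparison like this rests on the Posterior Predictive Checks mentioned above: simulate test scores from a fitted model and compare their distribution to the observed score distribution. The sketch below is a simplified, point-estimate version of such a check; a_hat, b_hat, and theta_hat are assumed arrays of estimated item and ability parameters, and a full PPC would instead draw from the posterior.

```python
import numpy as np
import matplotlib.pyplot as plt

def simulate_total_scores(theta_hat, a_hat, b_hat, n_reps=200, seed=0):
    """Replicate total test scores from a fitted 2PL model (point-estimate approximation)."""
    rng = np.random.default_rng(seed)
    # P(correct) for every (student, item) pair under the 2PL model.
    p = 1.0 / (1.0 + np.exp(-a_hat[None, :] * (theta_hat[:, None] - b_hat[None, :])))
    sims = rng.binomial(1, p, size=(n_reps, *p.shape))  # shape: (reps, students, items)
    return sims.sum(axis=2)                             # total score per student per replicate

# observed_scores = responses_wide[item_columns].sum(axis=1)   # assumed column selection
# simulated_scores = simulate_total_scores(theta_hat, a_hat, b_hat)
# plt.hist(observed_scores, bins=30, density=True, alpha=0.6, label="observed")
# plt.hist(simulated_scores.ravel(), bins=30, density=True, alpha=0.6, label="simulated")
# plt.legend(); plt.xlabel("Total score"); plt.show()
```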
The primary outputs of this project are:
Item Parameter Estimates: A final CSV file, 2pl_domain_difficulty_item_parameters.csv, containing the estimated parameters for all 174 items.
ICC_Plots_math Folder: Contains the theoretical Item Characteristic Curve (ICC) for each item. These curves show the relationship between a student's ability (θ) and the probability of answering the item correctly.
Empirical_Plots_math Folder: Contains empirical plots that show the actual student responses (0 for incorrect, 1 for correct) against their estimated abilities, with the model-predicted ICC overlaid for visual fit assessment.
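For reference, the theoretical 2PL ICC plotted for each item is the logistic curve P(correct | theta) = 1 / (1 + exp(-a * (theta - b))). A minimal sketch of drawing one such curve (parameter values are placeholders):

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_icc(a, b, item_label):
    """Plot the theoretical 2PL Item Characteristic Curve for a single item."""
    ability = np.linspace(-4, 4, 200)
    p_correct = 1.0 / (1.0 + np.exp(-a * (ability - b)))
    plt.plot(ability, p_correct)
    plt.xlabel("Ability (theta)")
    plt.ylabel("P(correct)")
    plt.ylim(0, 1)
    plt.title(f"ICC: {item_label} (a={a:.2f}, b={b:.2f})")
    plt.show()

# Example: a moderately discriminating item of average difficulty.
plot_icc(a=1.2, b=0.0, item_label="example_item")
```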
Installation & Usage: To run the analysis, first install the required dependencies:
pip install --quiet py-irt torch pyro-ppl pandas jsonlines

Then, ensure the input CSV files are accessible in your environment (e.g., in your Google Drive) and run the cells sequentially in the Math Assessment.ipynb notebook.