MathValidityPy: IRT Analysis of a Math Diagnostic Assessment

This repository contains a Jupyter Notebook, Math Assessment.ipynb, that provides a comprehensive psychometric analysis of a mathematics diagnostic assessment for incoming undergraduate students. The project uses Item Response Theory (IRT) models to evaluate item characteristics and student abilities, leveraging the py-irt library and the Pyro probabilistic programming framework.

This IRT analysis also serves as a foundational step for investigating Differential Item Functioning (DIF). The ability estimates ($\theta$) generated by these models can be used in subsequent analyses to determine if items function differently across various demographic subgroups.

Key Features

Comprehensive IRT Modeling: Defines, trains, and evaluates a suite of IRT models, progressing from the baseline Rasch (1PL) model to the more complex 2PL model with multiple covariates.

Covariate Analysis: Incorporates item domain (e.g., geometry, statistics) and pre-determined difficulty levels ("EASY," "MEDIUM," "HARD") as covariates to build more robust and interpretable models.

Bayesian Inference: Models are trained using Stochastic Variational Inference (SVI), a powerful method for approximating posterior distributions in complex probabilistic models (a minimal training sketch follows this list).

Model Evaluation: Model fit is assessed visually using Posterior Predictive Checks (PPC), which compare the distribution of observed test scores to scores generated from the fitted models.

Detailed Parameter Extraction: After training, the notebook extracts and analyzes key item parameters, including discrimination ($\alpha$), base difficulty ($\beta_{base}$), domain shifts ($\delta$), and difficulty shifts ($\gamma$).

Rich Visualization: The notebook generates both theoretical Item Characteristic Curves (ICCs) and empirical ICCs that overlay the model's predictions against actual student response data.
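
The sketch below shows, in outline, how a 2PL model can be trained with SVI in Pyro. It is illustrative only: the notebook itself builds on py-irt, and the tensor names (student_idx, item_idx, y) are assumptions rather than the notebook's own variables.

```python
# Minimal 2PL IRT model trained with SVI in Pyro (illustrative sketch only;
# the actual notebook uses the py-irt library on top of Pyro).
import torch
import pyro
import pyro.distributions as dist
from pyro.infer import SVI, Trace_ELBO
from pyro.infer.autoguide import AutoNormal
from pyro.optim import Adam

def two_pl_model(student_idx, item_idx, obs=None):
    n_students = int(student_idx.max()) + 1
    n_items = int(item_idx.max()) + 1
    with pyro.plate("students", n_students):
        theta = pyro.sample("theta", dist.Normal(0.0, 1.0))   # person ability
    with pyro.plate("items", n_items):
        log_a = pyro.sample("log_a", dist.Normal(0.0, 0.5))   # log-discrimination
        b = pyro.sample("b", dist.Normal(0.0, 1.0))           # item difficulty
    logits = torch.exp(log_a)[item_idx] * (theta[student_idx] - b[item_idx])
    with pyro.plate("data", student_idx.shape[0]):
        pyro.sample("y", dist.Bernoulli(logits=logits), obs=obs)

# Stochastic Variational Inference with a mean-field normal guide
guide = AutoNormal(two_pl_model)
svi = SVI(two_pl_model, guide, Adam({"lr": 0.01}), loss=Trace_ELBO())

# student_idx, item_idx (LongTensors) and y (FloatTensor of 0/1) would be
# built from the long-format response table; a training loop might look like:
# for step in range(2000):
#     loss = svi.step(student_idx, item_idx, obs=y)
```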

Data

The analysis uses two primary data files:

math.items_AnSamp1.csv: Contains the wide-format student response data.

mapping_unique_math.items_umgc1ua2.csv: Contains item metadata, including the domain and pre-assigned difficulty level for each item.

The final processed dataset used for modeling consists of responses from 4,460 individuals to 174 distinct items across 6 mathematical domains.
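
As described in the Methodology section below, the wide-format responses are reshaped into a long format before modeling. The following is a minimal sketch of that reshape with pandas; the column names ("student_id", "item_id") are hypothetical and may differ from the notebook's actual schema.

```python
# Illustrative wide-to-long reshape and metadata merge (column names are
# assumptions, not the notebook's actual schema).
import pandas as pd

wide = pd.read_csv("math.items_AnSamp1.csv")
item_cols = [c for c in wide.columns if c != "student_id"]

long = wide.melt(
    id_vars="student_id",
    value_vars=item_cols,
    var_name="item_id",
    value_name="response",
).dropna(subset=["response"])

# Attach domain and pre-assigned difficulty metadata for the covariate models
# (assumes the mapping file shares an "item_id" column).
meta = pd.read_csv("mapping_unique_math.items_umgc1ua2.csv")
long = long.merge(meta, on="item_id", how="left")
```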

Methodology & Models

The analysis begins by reshaping the raw data into a long format suitable for IRT modeling. A series of progressively more complex IRT models is then specified and trained:

Baseline Models

Rasch Model: A 1PL model estimating a single difficulty parameter (b) for each item.

2PL Model: An extension that adds a discrimination parameter (a) for each item.

Models with Covariates

Rasch + Covariates: Incorporates the effects of item domain and pre-assigned difficulty level on item difficulty.

2PL + Covariates: The most comprehensive model, accounting for item discrimination, base difficulty, and the effects of both domain and pre-assigned difficulty (see the formula sketch below).
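
Under the assumption (not stated explicitly in this README) that the domain and pre-assigned difficulty covariates enter as additive shifts to item difficulty, the 2PL + Covariates model can be written as

$$P(y_{ij} = 1 \mid \theta_j) = \sigma\!\left(\alpha_i\left[\theta_j - \left(\beta_{base,i} + \delta_{d(i)} + \gamma_{g(i)}\right)\right]\right)$$

where $\theta_j$ is the ability of student $j$, $d(i)$ and $g(i)$ are the domain and pre-assigned difficulty level of item $i$, and $\sigma$ is the logistic function. The exact parameterization is defined in the notebook.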

Key Findings

A notable finding from the model comparison was that adding the pre-determined item difficulty ("EASY," "MEDIUM," "HARD") as a covariate worsened the model fit. This suggests that the empirically derived difficulty parameters from the IRT models were more effective at explaining student response patterns than the pre-assigned labels.

Outputs & Visualizations

The primary outputs of this project are:

Item Parameter Estimates: A final CSV file, 2pl_domain_difficulty_item_parameters.csv, containing the estimated parameters for all 174 items.

ICC_Plots_math Folder: Contains the theoretical Item Characteristic Curve (ICC) for each item. These curves show the relationship between a student's ability ($\theta$) and their probability of answering correctly based on the model's estimated parameters (a and b); a plotting sketch follows this list.

Empirical_Plots_math Folder: Contains empirical plots that show the actual student responses (0 for incorrect, 1 for correct) against their estimated abilities, with the model-predicted ICC overlaid for visual fit assessment.
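
A minimal sketch of how a theoretical 2PL ICC can be plotted from estimated item parameters; the example values of a and b are hypothetical, and the notebook's own plotting code may differ.

```python
# Illustrative 2PL Item Characteristic Curve for a single item.
import numpy as np
import matplotlib.pyplot as plt

def icc(theta, a, b):
    """2PL probability of a correct response given ability theta."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

theta_grid = np.linspace(-4, 4, 200)
a_hat, b_hat = 1.2, -0.5  # example estimates for one item (hypothetical)

plt.plot(theta_grid, icc(theta_grid, a_hat, b_hat))
plt.xlabel("Ability (theta)")
plt.ylabel("P(correct)")
plt.title("Theoretical ICC for one item")
plt.show()
```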

Installation & Usage

To run the analysis, first install the required dependencies:

pip install --quiet py-irt torch pyro-ppl pandas jsonlines

Then, ensure the input CSV files are accessible in your environment (e.g., in your Google Drive) and run the cells sequentially in the Math Assessment.ipynb notebook.
