A collection of data-driven projects developed for INF2178H: Experimental Design and Analysis for Human-Centred Data Science. Each project covers a different stage of the data science lifecycle (from problem formulation and data wrangling to experimental design, statistical analysis, and interpretation of results), with an emphasis on human-centred insights and real-world applicability.
This repository brings together multiple applied projects that highlight:
- The use of experimental and statistical design principles in data science.
- The integration of quantitative and qualitative analytical methods for social insights.
- A reproducible workflow using Python-based data analysis and visualization tools.
- The application of human-centred design thinking to ensure ethical and meaningful data interpretation.
Each project explores a distinct social or behavioural dataset, selected to reflect diverse human and community contexts ranging from housing and mobility to education and cognition.
This project explores occupancy trends and patterns in the City of Toronto’s Shelter Program, examining program models, capacity types, and the impact of COVID-19 response programs. Non-graphical and graphical EDA (boxplots, winsorized analyses, time series, and barplots) revealed substantial variation in occupancy across program models, capacity types, and organizations. Welch’s t-tests then confirmed statistically significant differences in occupancy rates between emergency and transitional programs, between COVID-19 response programs and all others, and between room-based and bed-based capacity. The findings give stakeholders actionable evidence for allocating resources, planning programs, and addressing the distinct challenges of shelter management.
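To make the Welch's t-test step concrete, here is a minimal, standard-library-only Python sketch of how the test statistic and its Welch-Satterthwaite degrees of freedom are computed. It is not the project's analysis code: the function name `welch_t` is hypothetical, and the occupancy rates are invented toy values used for illustration only.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's unequal-variance t statistic and Welch-Satterthwaite df.

    Hypothetical helper for illustration. Returns (t, df); a two-sided
    p-value would then come from a Student t distribution with `df`
    degrees of freedom.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each sample needs at least two observations")
    # Squared standard error of each mean (sample variance uses n - 1).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    # Unlike Student's t-test, the two variances are NOT pooled.
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, df


# Toy occupancy rates (invented, NOT the Toronto shelter data).
emergency = [0.97, 0.99, 1.00, 0.95, 0.98]
transitional = [0.88, 0.93, 0.81, 0.90]
t_stat, dof = welch_t(emergency, transitional)
print(f"t = {t_stat:.3f}, df = {dof:.2f}")
```

The design choice that matters is the unpooled variance, which is what makes Welch's test suitable when groups such as emergency and transitional programs may differ in spread. In practice, `scipy.stats.ttest_ind(a, b, equal_var=False)` returns the statistic and p-value in one call.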
This project investigates how child care spaces are distributed across centre types, focusing on the effects of CWELCC (Canada-Wide Early Learning and Child Care) participation and subsidy status. Using exploratory data analysis and ANOVA, the study identifies significant differences in child care capacity among centres, as well as interaction effects between CWELCC participation and subsidy status. The findings shed light on how child care resources are allocated and how policy and funding mechanisms shape service availability across age groups.
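An interaction effect between two factors such as CWELCC participation and subsidy status can be illustrated with a two-way ANOVA. The sketch below is an illustration under stated assumptions, not the project's code: `two_way_anova` is a hypothetical helper, the capacities are invented, and it assumes a complete, balanced design (the same number of centres in every participation-by-subsidy cell), which real centre data rarely satisfy.

```python
from statistics import mean


def two_way_anova(cells):
    """Balanced two-way ANOVA with an A x B interaction term (hypothetical helper).

    `cells` maps (level_of_a, level_of_b) -> list of observations. Every
    combination of levels must be present with the same number n >= 2 of
    observations. Returns {effect: (F, df_effect, df_error)}.
    """
    levels_a = sorted({a for a, _ in cells})
    levels_b = sorted({b for _, b in cells})
    a, b = len(levels_a), len(levels_b)
    n = len(next(iter(cells.values())))
    if len(cells) != a * b or n < 2 or any(len(v) != n for v in cells.values()):
        raise ValueError("need a complete, balanced design with n >= 2 per cell")

    grand = mean(y for obs in cells.values() for y in obs)
    cell = {key: mean(obs) for key, obs in cells.items()}
    # With equal cell sizes, a marginal mean is the mean of its cell means.
    m_a = {i: mean(cell[(i, j)] for j in levels_b) for i in levels_a}
    m_b = {j: mean(cell[(i, j)] for i in levels_a) for j in levels_b}

    ss_a = b * n * sum((m - grand) ** 2 for m in m_a.values())
    ss_b = a * n * sum((m - grand) ** 2 for m in m_b.values())
    # Interaction: what the cell means show beyond the two main effects.
    ss_ab = n * sum(
        (cell[(i, j)] - m_a[i] - m_b[j] + grand) ** 2
        for i in levels_a for j in levels_b
    )
    # Error: spread of observations around their own cell mean.
    ss_err = sum((y - cell[key]) ** 2 for key, obs in cells.items() for y in obs)

    df_a, df_b, df_ab, df_err = a - 1, b - 1, (a - 1) * (b - 1), a * b * (n - 1)
    ms_err = ss_err / df_err
    return {
        "A": (ss_a / df_a / ms_err, df_a, df_err),
        "B": (ss_b / df_b / ms_err, df_b, df_err),
        "A x B": (ss_ab / df_ab / ms_err, df_ab, df_err),
    }


# Invented capacities: factor A = CWELCC participation, factor B = subsidy status.
toy = {
    ("cwelcc", "subsidy"): [62, 70, 66],
    ("cwelcc", "no subsidy"): [48, 55, 51],
    ("non-cwelcc", "subsidy"): [40, 46, 43],
    ("non-cwelcc", "no subsidy"): [39, 44, 41],
}
for effect, (f, df1, df2) in two_way_anova(toy).items():
    print(f"{effect}: F({df1}, {df2}) = {f:.2f}")
```

The balanced formulas are used here only because they make the sums-of-squares decomposition visible. For unbalanced data, a regression-based approach such as a statsmodels `ols` fit passed to `anova_lm(model, typ=2)` is the usual route.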
This project investigates how family income relates to kindergarten students’ academic performance in reading and math across the Fall and Spring terms, controlling for general knowledge scores as a covariate. One-way ANCOVA with Tukey’s HSD post hoc tests revealed significant differences in academic scores among income groups, with higher income levels associated with better performance. General knowledge scores were also strong predictors of academic outcomes. The findings highlight the role of socioeconomic factors in early educational achievement and suggest avenues for targeted interventions to reduce disparities in learning outcomes.
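ANCOVA compares groups after removing the covariate's pooled within-group linear effect from the outcome. A minimal standard-library sketch of that adjustment follows, assuming a single covariate (general knowledge) and homogeneous regression slopes across income groups. The helper `one_way_ancova` and all scores are hypothetical, and the code is not the project's implementation.

```python
from statistics import mean


def one_way_ancova(groups):
    """One-way ANCOVA with one covariate (hypothetical helper).

    `groups` maps a group label -> list of (covariate, outcome) pairs.
    Assumes equal regression slopes in every group.
    Returns (F, df_group, df_error, pooled_slope, adjusted_means).
    """
    pairs_all = [p for pairs in groups.values() for p in pairs]
    n_total, k = len(pairs_all), len(groups)
    gx = mean(x for x, _ in pairs_all)
    gy = mean(y for _, y in pairs_all)

    # Total sums of squares and cross-products around the grand means.
    sxx_t = sum((x - gx) ** 2 for x, _ in pairs_all)
    sxy_t = sum((x - gx) * (y - gy) for x, y in pairs_all)
    syy_t = sum((y - gy) ** 2 for _, y in pairs_all)

    # Within-group (pooled) sums around each group's own means.
    sxx_w = sxy_w = syy_w = 0.0
    group_means = {}
    for label, pairs in groups.items():
        mx = mean(x for x, _ in pairs)
        my = mean(y for _, y in pairs)
        group_means[label] = (mx, my)
        sxx_w += sum((x - mx) ** 2 for x, _ in pairs)
        sxy_w += sum((x - mx) * (y - my) for x, y in pairs)
        syy_w += sum((y - my) ** 2 for _, y in pairs)

    df_group, df_error = k - 1, n_total - k - 1
    if sxx_w == 0 or df_error <= 0:
        raise ValueError("covariate must vary within groups; need more observations")

    # Residual SS after regressing out the covariate within groups.
    ss_error = syy_w - sxy_w ** 2 / sxx_w
    # Same adjustment applied to the total, ignoring group membership.
    ss_total_adj = syy_t - sxy_t ** 2 / sxx_t
    ss_group = ss_total_adj - ss_error
    f_stat = (ss_group / df_group) / (ss_error / df_error)

    # Adjusted means: each group's mean moved to the grand covariate mean.
    slope = sxy_w / sxx_w
    adjusted = {g: my - slope * (mx - gx) for g, (mx, my) in group_means.items()}
    return f_stat, df_group, df_error, slope, adjusted


# Invented (general_knowledge, reading_score) pairs, NOT the project data.
toy = {
    "lower income": [(10, 21), (12, 24), (15, 27), (11, 22)],
    "middle income": [(12, 25), (14, 28), (16, 30), (13, 27)],
    "higher income": [(15, 30), (17, 33), (18, 35), (14, 29)],
}
f, df1, df2, b, adj = one_way_ancova(toy)
print(f"F({df1}, {df2}) = {f:.2f}, pooled slope = {b:.2f}, adjusted means = {adj}")
```

The Tukey HSD post hoc step is not reproduced because it needs the studentized range distribution, which the standard library lacks. `statsmodels.stats.multicomp.pairwise_tukeyhsd` provides it, although applied to raw scores it compares unadjusted group means.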
This project explores variation in cognitive performance, measured by Mini-Mental State Examination (MMSE) scores, across groups (Demented, Non-demented, Converted), genders, and repeated visits. A mixed-effects ANOVA identified significant differences in MMSE scores by group classification and gender, as well as an effect of repeated visits. The findings highlight patterns of cognitive change over time and variability in performance across demographic groups, with relevance to psychological research and cognitive health monitoring.
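A mixed design like this splits variance into a between-subjects part, tested against variation among subjects within the same group, and a within-subjects part, tested against the visit-by-subject residual. The sketch below shows that split-plot decomposition for one between factor (group) and one within factor (visit) only; gender is omitted for brevity. It assumes equal group sizes and complete visit data, and the helper `mixed_anova` and the MMSE values are hypothetical, not the project's code or data.

```python
from statistics import mean


def mixed_anova(data):
    """Balanced split-plot ANOVA: one between factor, one within factor.

    Hypothetical helper. `data` maps group label -> {subject_id: [score at
    visit 1, ..., visit T]}. Returns {effect: (F, df_effect, df_error)}.
    """
    groups = list(data)
    k = len(groups)
    n = len(data[groups[0]])                        # subjects per group
    t = len(next(iter(data[groups[0]].values())))   # visits per subject
    if k < 2 or n < 2 or t < 2:
        raise ValueError("need >= 2 groups, >= 2 subjects per group, >= 2 visits")
    if any(len(subs) != n for subs in data.values()):
        raise ValueError("groups must contain the same number of subjects")
    if any(len(s) != t for subs in data.values() for s in subs.values()):
        raise ValueError("every subject needs a score at every visit")

    subjects = [s for g in groups for s in data[g].values()]
    n_subj = k * n
    grand = mean(y for s in subjects for y in s)

    ss_total = sum((y - grand) ** 2 for s in subjects for y in s)
    # Between-subjects part, split into group and subjects-within-groups.
    ss_subjects = t * sum((mean(s) - grand) ** 2 for s in subjects)
    group_mean = {g: mean(y for s in data[g].values() for y in s) for g in groups}
    ss_group = n * t * sum((m - grand) ** 2 for m in group_mean.values())
    ss_subj_within = ss_subjects - ss_group          # error term for group

    # Within-subjects part: visit, group x visit, and residual.
    visit_mean = [mean(s[v] for s in subjects) for v in range(t)]
    ss_visit = n_subj * sum((m - grand) ** 2 for m in visit_mean)
    cell_mean = [mean(s[v] for s in data[g].values()) for g in groups for v in range(t)]
    ss_cells = n * sum((m - grand) ** 2 for m in cell_mean)
    ss_gxv = ss_cells - ss_group - ss_visit
    ss_resid = ss_total - ss_subjects - ss_visit - ss_gxv  # error for within effects

    df_group, df_sw = k - 1, n_subj - k
    df_visit, df_gxv, df_resid = t - 1, (k - 1) * (t - 1), (n_subj - k) * (t - 1)
    ms_sw = ss_subj_within / df_sw
    ms_resid = ss_resid / df_resid
    return {
        "group": (ss_group / df_group / ms_sw, df_group, df_sw),
        "visit": (ss_visit / df_visit / ms_resid, df_visit, df_resid),
        "group x visit": (ss_gxv / df_gxv / ms_resid, df_gxv, df_resid),
    }


# Invented MMSE scores over three visits (illustrative only).
toy = {
    "Nondemented": {"s1": [29, 29, 28], "s2": [30, 29, 29], "s3": [28, 28, 27]},
    "Demented": {"s4": [24, 22, 20], "s5": [26, 25, 22], "s6": [23, 21, 19]},
}
for effect, (f, df1, df2) in mixed_anova(toy).items():
    print(f"{effect}: F({df1}, {df2}) = {f:.2f}")
```

Real longitudinal MMSE data usually have unequal group sizes and missed visits, in which case a linear mixed model (for example `statsmodels.formula.api.mixedlm` with a per-subject random intercept) is a common alternative to these balanced formulas.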