Simple project to wrangle with geospatial data and practice using data science tools.

antonio-RN/air_quality

Air quality analysis

Introduction

This is an exploratory analysis of air quality reports for the Catalonia region, updated as of 14/05/2025. The data is downloaded from the open data portal of the Catalan Government (Generalitat de Catalunya) and includes hourly air quality metrics as well as geospatial information about the capture points.

The objective of this analysis is to find the biggest contributors to air pollution in Catalonia over the time period covered by the data. To do so, both the evolution over time and the geographic position of the capture points are taken into account.

Project workflow

The analysis is structured around a Prefect pipeline written in Python, and uses the following tools and packages:

  • uv (package dependency manager)
  • quarto (report renderer)
  • pandas via dask (data wrangling)
  • altair (data visualization)
  • geopandas (geospatial analysis)
  • darts (time series analysis)

The processing steps applied to the data are listed below:

  • Data ingestion and type checking: reading the raw CSV file, converting to appropriate data types, saving data to a parquet file.
  • Missing data handling: analyzing missing data and handling it properly (removing or imputing), saving "bronze" data to a parquet file.
  • General exploration: checking data overview to spot inconsistencies and get a general grasp of the data "shape".
  • Feature creation: adding new features or characteristics based on geographic knowledge or specific thematic knowledge (air pollution).
  • Time evolution: checking trends through time in the general dataset.
  • Geographic inequalities: checking differences between geographic points and how those points relate to one another.
  • Combined analysis: mixing time evolution and geographic inequalities to identify broader patterns.
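
As a rough illustration of the first two steps (ingestion with type checking, then missing data handling), here is a minimal sketch in plain pandas. It is not the project's actual pipeline code: the column names (station, pollutant, timestamp, value), the inline sample data, and the helper names ingest and handle_missing are all assumptions made for the example, and linear interpolation is just one possible imputation choice. The sketch also skips dask and Prefect to stay self-contained.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the raw CSV file; the column names
# are assumptions, not the open data portal's actual schema.
RAW_CSV = """station,pollutant,timestamp,value
A,NO2,2025-05-01 00:00,21.0
A,NO2,2025-05-01 01:00,
A,NO2,2025-05-01 02:00,25.0
B,NO2,2025-05-01 00:00,40.0
B,NO2,2025-05-01 01:00,not_a_number
B,NO2,2025-05-01 02:00,44.0
"""


def ingest(source) -> pd.DataFrame:
    """Read the raw CSV as text, then convert each column to a proper dtype."""
    df = pd.read_csv(source, dtype=str)
    df["station"] = df["station"].astype("category")
    df["pollutant"] = df["pollutant"].astype("category")
    df["timestamp"] = pd.to_datetime(df["timestamp"], format="%Y-%m-%d %H:%M")
    # Readings that cannot be parsed become NaN instead of raising an error.
    df["value"] = pd.to_numeric(df["value"], errors="coerce")
    return df


def handle_missing(df: pd.DataFrame, max_gap: int = 1) -> pd.DataFrame:
    """Interpolate short gaps within each station, then drop what is left."""
    df = df.sort_values(["station", "timestamp"]).copy()
    df["value"] = df.groupby("station", observed=True)["value"].transform(
        # Fill at most `max_gap` consecutive NaNs, and only between valid readings.
        lambda s: s.interpolate(limit=max_gap, limit_area="inside")
    )
    return df.dropna(subset=["value"]).reset_index(drop=True)


raw = ingest(io.StringIO(RAW_CSV))
bronze = handle_missing(raw)
print(bronze)
```

In a real run, each step would end by persisting its result, for example with bronze.to_parquet(...) on a path of your choosing (this requires pyarrow or fastparquet to be installed).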

WIP --> to be updated as the analysis progresses.

Head over to notebooks/air_quality_exploration.html to see the step-by-step analysis of the data.
