normet: Normalisation, Decomposition, and Counterfactual Modelling for Environmental Time-series

normet (Normalisation, Decomposition, and Counterfactual Modelling for Environmental Time-series) is a Python package designed for environmental time-series analysis. It provides a powerful and user-friendly suite of tools for air quality research, causal inference, and policy evaluation.


✨ Core Strengths

  • Automated & Intelligent: Powered by a FLAML or H2O AutoML backend, it automatically searches for the best-performing model, eliminating tedious manual tuning.
  • All-in-One Solution: Offers high-level functions that cover the entire workflow, from data preprocessing and model training to weather normalisation and counterfactual modelling.
  • Robust Causal Inference: Integrates both classic and machine-learning-based Synthetic Control Methods (SCM) and provides multiple uncertainty quantification tools (Bootstrap, Jackknife, Placebo Tests) to ensure reliable conclusions.
  • Designed for Environmental Science: Its features are built to address core challenges in air quality research, such as isolating meteorological impacts and evaluating policy effectiveness.

🚀 Workflow

The core workflow of normet (import normet as nm) is designed to simplify complex analytical steps:

  1. Data Preparation (nm.prepare_data): Automatically processes time-series data, including imputation, feature engineering (e.g., time-based variables), and dataset splitting.
  2. Model Training (nm.train_model): Trains high-performance machine learning models using the FLAML or H2O AutoML backend.
  3. Analysis & Application:
    • Weather Normalisation (nm.normalise): Removes the influence of meteorological conditions on pollutant concentrations.
    • Time-Series Decomposition (nm.decompose): Decomposes the series into meteorology-driven and emission-driven components.
    • Counterfactual Modelling (nm.run_scm): Estimates the causal effect of an intervention (e.g., a new policy).
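
The snippet below sketches how these steps chain together; it mirrors the detailed examples in the later sections, and the file name and variable list are placeholders rather than part of the package.

import normet as nm
import pandas as pd

# Placeholder input: an hourly site dataset with a datetime index
df = pd.read_csv('my_site_data.csv', parse_dates=['date'], index_col='date')
met_vars = ["u10", "v10", "t2m", "blh", "sp"]  # example meteorological predictors

# 1. Prepare: impute gaps, add time features, split into train/test
df_prep = nm.prepare_data(df=df, value="PM2.5", feature_names=met_vars,
                          split_method='random', fraction=0.75)

# 2. Train: AutoML model via FLAML (or backend="h2o")
model = nm.train_model(df=df_prep, value="value", backend="flaml", variables=met_vars)

# 3. Apply: weather normalisation (decomposition and SCM work analogously)
df_norm = nm.normalise(df=df_prep, model=model, feature_names=met_vars,
                       variables_resample=met_vars, n_samples=100)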

🔧 Installation

You can install the stable version of normet from PyPI:

pip install normet

Install the latest development version from GitHub:

pip install git+https://github.com/normet-dev/normet-py.git

Backend Setup

normet relies on either FLAML or H2O as its machine learning backend. Install FLAML if you choose it:

conda install flaml -c conda-forge

Or install H2O if you choose it instead:

# Install the h2o package from PyPI
pip install h2o

💡 Quick Start: One-Shot Weather Normalisation

With the nm.do_all function, you can perform a complete weather normalisation workflow in just a few lines of code.

import normet as nm
import pandas as pd # For data manipulation

# Load the example data
my1 = pd.read_csv('data_MY1_data.csv', parse_dates=['date'], index_col='date')

# Define the feature variables for the model
predictors = [
    "u10", "v10", "d2m", "t2m", "blh", "sp", "ssrd", "tcc", "tp", "rh2m",
    "date_unix", "day_julian", "weekday", "hour"
]

features_to_use = [
    "u10", "v10", "d2m", "t2m", "blh", "sp", "ssrd", "tcc", "tp", "rh2m"
]

# Run the end-to-end pipeline
# nm.do_all automatically handles data prep, model training, and normalisation
results = nm.do_all(
    df=my1,
    value="PM2.5",
    backend="flaml",  # or "h2o"
    feature_names=predictors,
    variables_resample=features_to_use, # Specify met variables to resample to remove their effect
    n_samples=100 # Use a small sample size for a quick demo
)

# View the normalised (deweathered) time-series results
print("Normalised (deweathered) time-series:")
print(results['out'].head())

# Inspect the trained AutoML model object
print("\nTrained AutoML Model:")
print(results['model'])

# Evaluate the model's performance
stats = nm.modStats(results['df_prep'], results['model'])
print("\nModel Performance Metrics:")
print(stats)

The nm.do_all function returns a dictionary containing three key elements:

  1. out: A pandas DataFrame with the normalised (deweathered) time-series.
  2. df_prep: The preprocessed data, including training/testing splits.
  3. model: The trained AutoML model object.
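
For example, the deweathered series can be written out with ordinary pandas calls (the file name below is only an illustration):

# Save the deweathered series for downstream analysis (plain pandas, no normet-specific API)
results['out'].to_csv('pm25_deweathered.csv')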

Step-by-Step Execution

For more control over the process, you can execute each step manually.

1. Prepare the Data (nm.prepare_data)

This function handles missing value imputation, adds time-based features, and splits the data into training and testing sets.

df_prep = nm.prepare_data(
    df=my1,
    value='PM2.5',
    feature_names=features_to_use,
    split_method='random',
    fraction=0.75
)

2. Train the Model (nm.train_model)

Train a machine learning model using FLAML or H2O AutoML. The configuration allows you to control the training process.

# The prepared data stores the target in the 'value' column
target = 'value'

# Configure H2O AutoML
h2o_config = {
    'max_models': 10,
    'include_algos': ["GBM"],
    'sort_metric': "RMSE",
    'max_mem_size': "8G"
}



# Or Configure FLAML AutoML
flaml_config = {
    "time_budget": 90,          # seconds for the search
    "metric": "r2",             # optimize R^2 (use "mae"/"mse" if preferred)
    "estimator_list": ["lgbm"], # single estimator keeps things fast
}

# Train the model
model = nm.train_model(
    df=df_prep,
    value=target,
    backend="flaml",           # or "h2o"
    variables=predictors,
    model_config=flaml_config  # or h2o_config
)

# Evaluate model performance
nm.modStats(df_prep, model)

3. Perform Normalisation (nm.normalise)

Use the trained model to generate the weather-normalised time-series.

df_normalised = nm.normalise(
    df=df_prep,
    model=model,
    feature_names=predictors,
    variables_resample=features_to_use,
    n_samples=100
)

print(df_normalised.head())

4. Using a Custom Weather Dataset

You can also provide a specific weather dataset via the weather_df argument. This is useful for answering questions like, "What would concentrations have been under the average weather conditions of a different year?"

# For demonstration, create a custom weather dataset using the first 100 rows
custom_weather = df_prep.iloc[0:100][features_to_use].copy()

# Perform normalisation using the custom weather conditions
df_norm_custom = nm.normalise(
    df=df_prep,
    model=model,
    weather_df=custom_weather,
    feature_names=predictors,
    variables_resample=features_to_use,
    n_samples=100 # n_samples will now sample from `custom_weather`
)

print(df_norm_custom.head())
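
If df_prep keeps the DatetimeIndex from my1 (an assumption; nm.prepare_data may instead expose the dates as a column), the same call can draw weather from a specific year to answer the "different year" question above:

# Sketch: use one year's weather as the counterfactual conditions
# (assumes a DatetimeIndex on df_prep; filter on a date column instead if needed)
weather_2016 = df_prep.loc[df_prep.index.year == 2016, features_to_use]

df_norm_2016 = nm.normalise(
    df=df_prep,
    model=model,
    weather_df=weather_2016,
    feature_names=predictors,
    variables_resample=features_to_use,
    n_samples=100
)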

📊 Core Features Showcase

In addition to the high-level pipeline, normet offers flexible, modular functions for custom, step-by-step analyses.

1. Weather Normalisation & Time-Series Decomposition

Rolling Weather Normalisation (nm.rolling)

Ideal for short-term trend analysis, this function performs normalisation within a moving time window to capture dynamic changes.

# Assuming you have `df_prep` and `model` from the quick start
df_norm_rolling = nm.rolling(
    df=df_prep,
    value='value',
    model=model,
    feature_names=predictors,
    variables_resample=features_to_use,
    n_samples=100,
    window_days=14,      # Window size in days
    rolling_every=7      # Step size in days
)
print(df_norm_rolling.head())

Time-Series Decomposition (nm.decompose)

Decomposes the original time series into its emission-driven and meteorology-driven components.

# Decompose to get the emission-driven component
df_emi = nm.decompose(
    method="emission",
    df=df_prep,
    value="value",
    model=model,
    feature_names=predictors,
    n_samples=100
)
print(df_emi.head())

# Decompose to get the meteorology-driven component
df_met = nm.decompose(
    method="meteorology",
    df=df_prep,
    value="value",
    model=model,
    feature_names=predictors,
    n_samples=100
)
print(df_met.head())

2. Counterfactual Modelling & Causal Inference

normet includes a powerful toolkit for Synthetic Control Methods (SCM) to evaluate the causal impact of policies or events.

Data Preparation

import pandas as pd

# Load the SCM example data
scm_data = pd.read_csv('data_AQ_Weekly.csv', parse_dates=['date'])

# Filter to the analysis window (dates are already parsed above)
df = scm_data.query("date >= '2015-05-01' and date < '2016-04-30'")

# Define the treated unit, donor pool, and intervention date
treated_unit = "2+26 cities"
donor_pool = [
    "Dongguan", "Zhongshan", "Foshan", "Beihai", "Nanning", "Nanchang", "Xiamen",
    "Taizhou", "Ningbo", "Guangzhou", "Huizhou", "Hangzhou", "Liuzhou",
    "Shantou", "Jiangmen", "Heyuan", "Quanzhou", "Haikou", "Shenzhen",
    "Wenzhou", "Huzhou", "Zhuhai", "Fuzhou", "Shaoxing", "Zhaoqing",
    "Zhoushan", "Quzhou", "Jinhua", "Shaoguan", "Sanya", "Jieyang",
    "Meizhou", "Shanwei", "Zhanjiang", "Chaozhou", "Maoming", "Yangjiang"
]
df = df[df['ID'].isin(donor_pool + [treated_unit])]
cutoff_date = "2015-10-23" # Define the intervention start date

Running a Synthetic Control Analysis (nm.run_scm)

# Run classic SCM or the machine learning-based MLSCM
scm_result = nm.run_scm(
    df=df,
    date_col="date",
    outcome_col="SO2wn",
    unit_col="ID",
    treated_unit=treated_unit,
    donors=donor_pool,
    cutoff_date=cutoff_date,
    scm_backend="scm" # Options: 'scm' or 'mlscm'
)
print(scm_result.tail())

Placebo Tests (nm.placebo_in_space)

Check the significance of the main effect by iteratively treating each control unit as the "treated" unit and running a "fake" intervention.

placebo_results = nm.placebo_in_space(
    df=df,
    date_col="date",
    outcome_col="SO2wn",
    unit_col="ID",
    treated_unit=treated_unit,
    donors=donor_pool,
    cutoff_date=cutoff_date,
    scm_backend="scm", # Options: 'scm' or 'mlscm'
    verbose=False
)

# Calculate confidence bands from the placebo effects
bands = nm.effect_bands_space(placebo_results, level=0.95, method="quantile")

# Plot the main effect with the placebo bands
nm.plot_effect_with_bands(bands, cutoff_date=cutoff_date, title="SCM Effect (95% placebo bands)")

Uncertainty Quantification (nm.uncertainty_bands)

Generate confidence intervals for the causal effect using Bootstrap or Jackknife methods.

# Bootstrap method
boot_bands = nm.uncertainty_bands(
    df=df,
    date_col="date",
    outcome_col="SO2wn",
    unit_col="ID",
    treated_unit=treated_unit,
    donors=donor_pool,
    cutoff_date=cutoff_date,
    scm_backend="scm", # Options: 'scm' or 'mlscm'
    method="bootstrap",
    B=50 # Use a small number of replications for a quick demo
)
nm.plot_uncertainty_bands(boot_bands, cutoff_date=cutoff_date)

# Jackknife (leave-one-out) method
jack_bands = nm.uncertainty_bands(
    df=df,
    date_col="date",
    outcome_col="SO2wn",
    unit_col="ID",
    treated_unit=treated_unit,
    donors=donor_pool,
    cutoff_date=cutoff_date,
    scm_backend="scm",
    method="jackknife"
)
nm.plot_uncertainty_bands(jack_bands, cutoff_date=cutoff_date)

📦 Dependencies

  • Python (>= 3.8)
  • Core Dependencies: flaml or h2o, pandas, numpy
  • SCM Features: scikit-learn, statsmodels
  • Suggested: logging (Python stdlib)

📜 How to Cite

If you use normet in your research, please cite it as follows:

@Manual{normet-pkg,
  title = {normet: Normalisation, Decomposition, and Counterfactual Modelling for Environmental Time-series},
  author = {Congbo Song and Other Contributors},
  year = {2025},
  note = {Python package version 0.0.1},
  organization = {University of Manchester},
  url = {https://github.com/normet-dev/normet-py},
}

📄 License

This project is licensed under the MIT License.


🤝 How to Contribute

Contributions are welcome! This project is released with a Contributor Code of Conduct. By participating, you agree to abide by its terms.

Please submit bug reports and feature requests via GitHub Issues.
