normet (Normalisation, Decomposition, and Counterfactual Modelling for Environmental Time-series) is a Python package designed for environmental time-series analysis. It provides a powerful and user-friendly suite of tools for air quality research, causal inference, and policy evaluation.
- Automated & Intelligent: Powered by FLAML and H2O AutoML backends, it automatically searches for the best-performing model, eliminating tedious manual tuning.
- All-in-One Solution: Offers high-level functions that cover the entire workflow, from data preprocessing and model training to weather normalisation and counterfactual modelling.
- Robust Causal Inference: Integrates both classic and machine-learning-based Synthetic Control Methods (SCM) and provides multiple uncertainty quantification tools (Bootstrap, Jackknife, Placebo Tests) to ensure reliable conclusions.
- Designed for Environmental Science: Its features are built to address core challenges in air quality research, such as isolating meteorological impacts and evaluating policy effectiveness.
The core workflow of normet (import normet as nm) is designed to simplify complex analytical steps:
- Data Preparation (nm.prepare_data): Automatically processes time-series data, including imputation, feature engineering (e.g., time-based variables), and dataset splitting.
- Model Training (nm.train_model): Trains high-performance machine learning models using FLAML or H2O AutoML.
- Analysis & Application:
  - Weather Normalisation (nm.normalise): Removes the influence of meteorological conditions on pollutant concentrations.
  - Time-Series Decomposition (nm.decompose): Decomposes the series into meteorology-driven and emission-driven components.
  - Counterfactual Modelling (nm.run_scm): Estimates the causal effect of an intervention (e.g., a new policy).
You can install the stable version of normet from PyPI:
pip install normet
Install the latest development version from GitHub:
pip install git+https://github.com/normet-dev/normet-py.git
normet relies on FLAML or H2O as its machine learning backend.
Install FLAML if you choose it as the backend:
conda install flaml -c conda-forge
Or install H2O if you choose it as the backend:
# Install the h2o package from PyPI
pip install h2o
With the nm.do_all function, you can perform a complete weather normalisation workflow in just a few lines of code.
import normet as nm
import pandas as pd # For data manipulation
# Load the example data
my1 = pd.read_csv('data_MY1_data.csv', parse_dates=['date'], index_col='date')
# Define the feature variables for the model
predictors = [
"u10", "v10", "d2m", "t2m", "blh", "sp", "ssrd", "tcc", "tp", "rh2m",
"date_unix", "day_julian", "weekday", "hour"
]
features_to_use = [
"u10", "v10", "d2m", "t2m", "blh", "sp", "ssrd", "tcc", "tp", "rh2m"
]
# Run the end-to-end pipeline
# nm.do_all automatically handles data prep, model training, and normalisation
results = nm.do_all(
df=my1,
value="PM2.5",
backend = "flaml", # Or "h2o"
feature_names=predictors,
variables_resample=features_to_use, # Specify met variables to resample to remove their effect
n_samples=100 # Use a small sample size for a quick demo
)
# View the normalised (deweathered) time-series results
print("Normalised (deweathered) time-series:")
print(results['out'].head())
# Inspect the trained AutoML model object
print("\nTrained H2O AutoML Model:")
print(results['model'])
# Evaluate the model's performance
stats = nm.modStats(results['df_prep'], results['model'])
print("\nModel Performance Metrics:")
print(stats)
The nm.do_all function returns a dictionary containing three key elements:
- out: A pandas DataFrame with the normalised (deweathered) time-series.
- df_prep: The preprocessed data, including the training/testing split.
- model: The trained AutoML model object.
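As a quick visual check, you can overlay the deweathered series on the observed concentrations. The snippet below is an illustrative pandas/matplotlib sketch rather than part of normet: it assumes results['out'] is indexed by date and simply plots whatever columns it contains, so adjust it to your version's output.
import matplotlib.pyplot as plt
# Illustrative only: compare observed PM2.5 with the deweathered output.
# Column names in results['out'] may differ between normet versions.
fig, ax = plt.subplots(figsize=(10, 4))
my1['PM2.5'].plot(ax=ax, alpha=0.3, label='Observed PM2.5')
results['out'].plot(ax=ax)  # deweathered series from nm.do_all
ax.set_ylabel('PM2.5')
ax.legend()
plt.show()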
For more control over the process, you can execute each step manually.
The nm.prepare_data function handles missing-value imputation, adds time-based features, and splits the data into training and testing sets.
df_prep = nm.prepare_data(
df=my1,
value='PM2.5',
feature_names=features_to_use,
split_method='random',
fraction=0.75
)
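Before training, it can be useful to inspect what prepare_data produced. This is a plain pandas check, not a normet function; it lists the generated columns (including any added time-based features) and previews the first rows.
# Inspect the prepared data: generated columns and a preview of the rows
print(df_prep.columns.tolist())
print(df_prep.head())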
Train a machine learning model with FLAML or H2O AutoML. The model configuration dictionary gives you control over the training process.
# Define the target variable
target = 'value'
# Configure H2O AutoML
h2o_config = {
'max_models': 10,
'include_algos': ["GBM"],
'sort_metric': "RMSE",
'max_mem_size': "8G"
}
# Or Configure FLAML AutoML
flaml_config = {
"time_budget": 90, # seconds for the search
"metric": "r2", # optimize R^2 (use "mae"/"mse" if preferred)
"estimator_list": ["lgbm"], # single estimator keeps things fast
}
# Train the model
model = nm.train_model(
df=df_prep,
value=target,
backend="flaml", #or "h2o"
variables=predictors,
model_config=flaml_config #or h2o_config
)
# Evaluate model performance
nm.modStats(df_prep, model)
Use the trained model to generate the weather-normalised time-series.
df_normalised = nm.normalise(
df=df_prep,
model=model,
feature_names=predictors,
variables_resample=features_to_use,
n_samples=100
)
print(df_normalised.head())
You can also provide a specific weather dataset via the weather_df argument. This is useful for answering questions like, "What would concentrations have been under the average weather conditions of a different year?"
# For demonstration, create a custom weather dataset using the first 100 rows
custom_weather = df_prep.iloc[0:100][features_to_use].copy()
# Perform normalisation using the custom weather conditions
df_norm_custom = nm.normalise(
df=df_prep,
model=model,
weather_df=custom_weather,
feature_names=predictors,
variables_resample=features_to_use,
n_samples=100 # n_samples will now sample from `custom_weather`
)
print(df_norm_custom.head())
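To see how the assumed weather scenario shifts the deweathered trend, you can overlay the default and custom-weather normalised series. This is a generic pandas/matplotlib sketch, not a normet function, and the column names in the normalise() output may vary between versions.
import matplotlib.pyplot as plt
# Illustrative comparison of the two normalised series (column names may vary)
ax = df_normalised.plot(figsize=(10, 4), alpha=0.7)
df_norm_custom.plot(ax=ax, linestyle='--', alpha=0.7)
ax.set_ylabel('PM2.5 (normalised)')
plt.show()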
In addition to the high-level pipeline, normet offers flexible, modular functions for custom, step-by-step analyses.
Ideal for short-term trend analysis, nm.rolling performs normalisation within a moving time window to capture dynamic changes.
# Assuming you have `df_prep` and `model` from the quick start
df_norm_rolling = nm.rolling(
df=df_prep,
value='value',
model=model,
feature_names=predictors,
variables_resample=features_to_use,
n_samples=100,
window_days=14, # Window size in days
rolling_every=7 # Step size in days
)
print(df_norm_rolling.head())
The nm.decompose function splits the original time series into its emission-driven and meteorology-driven components.
# Decompose to get the emission-driven component
df_emi = nm.decompose(
method="emission",
df=df_prep,
value="value",
model=model,
feature_names=predictors,
n_samples=100
)
print(df_emi.head())
# Decompose to get the meteorology-driven component
df_met = nm.decompose(
method="meteorology",
df=df_prep,
value="value",
model=model,
feature_names=predictors,
n_samples=100
)
print(df_met.head())
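For a quick look at the two components side by side, you can plot them together. This is a plain pandas/matplotlib sketch; the exact columns returned by nm.decompose may differ between versions.
import matplotlib.pyplot as plt
# Illustrative only: plot the emission- and meteorology-driven components
fig, axes = plt.subplots(2, 1, figsize=(10, 6), sharex=True)
df_emi.plot(ax=axes[0], title='Emission-driven component')
df_met.plot(ax=axes[1], title='Meteorology-driven component')
plt.tight_layout()
plt.show()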
normet includes a powerful toolkit for Synthetic Control Methods (SCM) to evaluate the causal impact of policies or events.
import pandas as pd
# Load the SCM example data
scm_data = pd.read_csv('data_AQ_Weekly.csv', parse_dates=['date'])
# Restrict the analysis to a one-year window
df = scm_data.query("date >= '2015-05-01' and date < '2016-04-30'")
# Define the treated unit, donor pool, and intervention date
treated_unit = "2+26 cities"
donor_pool = [
"Dongguan", "Zhongshan", "Foshan", "Beihai", "Nanning", "Nanchang", "Xiamen",
"Taizhou", "Ningbo", "Guangzhou", "Huizhou", "Hangzhou", "Liuzhou",
"Shantou", "Jiangmen", "Heyuan", "Quanzhou", "Haikou", "Shenzhen",
"Wenzhou", "Huzhou", "Zhuhai", "Fuzhou", "Shaoxing", "Zhaoqing",
"Zhoushan", "Quzhou", "Jinhua", "Shaoguan", "Sanya", "Jieyang",
"Meizhou", "Shanwei", "Zhanjiang", "Chaozhou", "Maoming", "Yangjiang"
]
df = df[df['ID'].isin(donor_pool + [treated_unit])]
cutoff_date = "2015-10-23"  # Define the intervention start date
# Run the classic SCM or the machine-learning-based MLSCM
scm_result = nm.run_scm(
df=df,
date_col="date",
outcome_col="SO2wn",
unit_col="ID",
treated_unit=treated_unit,
donors=donor_pool,
cutoff_date=cutoff_date,
scm_backend="scm" # Options: 'scm' or 'mlscm'
)
print(scm_result.tail())
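The same call can be re-run with the machine-learning-based backend by switching scm_backend to "mlscm"; everything else stays the same, which makes it easy to compare the two estimates.
# Re-run the analysis with the machine-learning-based backend for comparison
mlscm_result = nm.run_scm(
    df=df,
    date_col="date",
    outcome_col="SO2wn",
    unit_col="ID",
    treated_unit=treated_unit,
    donors=donor_pool,
    cutoff_date=cutoff_date,
    scm_backend="mlscm"
)
print(mlscm_result.tail())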
Check the significance of the main effect by iteratively treating each control unit as the "treated" unit and running a "fake" intervention.
placebo_results = nm.placebo_in_space(
df=df,
date_col="date",
outcome_col="SO2wn",
unit_col="ID",
treated_unit=treated_unit,
donors=donor_pool,
cutoff_date=cutoff_date,
scm_backend="scm", # Options: 'scm' or 'mlscm'
verbose=False
)
# Calculate confidence bands from the placebo effects
bands = nm.effect_bands_space(placebo_results, level=0.95, method="quantile")
# Plot the main effect with the placebo bands
nm.plot_effect_with_bands(bands, cutoff_date=cutoff_date, title="SCM Effect (95% placebo bands)")
Generate confidence intervals for the causal effect using Bootstrap or Jackknife methods.
# Bootstrap method
boot_bands = nm.uncertainty_bands(
df=df,
date_col="date",
outcome_col="SO2wn",
unit_col="ID",
treated_unit=treated_unit,
donors=donor_pool,
cutoff_date=cutoff_date,
scm_backend="scm", # Options: 'scm' or 'mlscm'
method="bootstrap",
B=50 # Use a small number of replications for a quick demo
)
nm.plot_uncertainty_bands(boot_bands, cutoff_date=cutoff_date)
# Jackknife (leave-one-out) method
jack_bands = nm.uncertainty_bands(
df=df,
date_col="date",
outcome_col="SO2wn",
unit_col="ID",
treated_unit=treated_unit,
donors=donor_pool,
cutoff_date=cutoff_date,
scm_backend="scm",
method="jackknife"
)
nm.plot_uncertainty_bands(jack_bands, cutoff_date=cutoff_date)
- Python (>= 3.8)
- Core Dependencies: flaml or h2o, pandas, numpy
- SCM Features: scikit-learn, statsmodels
- Suggested: logging (Python stdlib)
If you use normet in your research, please cite it as follows:
@Manual{normet-pkg,
title = {normet: Normalisation, Decomposition, and Counterfactual Modelling for Environmental Time-series},
author = {Congbo Song and Other Contributors},
year = {2025},
note = {Python package version 0.0.1},
organization = {University of Manchester},
url = {https://github.com/normet-dev/normet-py},
}
This project is licensed under the MIT License.
Contributions are welcome! This project is released with a Contributor Code of Conduct. By participating, you agree to abide by its terms.
Please submit bug reports and feature requests via GitHub Issues.