normet: Normalisation, Decomposition, and Counterfactual Modelling for Environmental Time-series

normet (Normalisation, Decomposition, and Counterfactual Modelling for Environmental Time-series) is a Python package designed for environmental time-series analysis. It provides a powerful and user-friendly suite of tools for air quality research, causal inference, and policy evaluation.


✨ Core Strengths

  • Automated & Intelligent: Powered by a FLAML or H2O AutoML backend, it automatically searches for the best-performing model, eliminating tedious manual tuning.
  • All-in-One Solution: Offers high-level functions that cover the entire workflow, from data preprocessing and model training to weather normalisation and counterfactual modelling.
  • Robust Causal Inference: Integrates both classic and machine-learning-based Synthetic Control Methods (SCM) and provides multiple uncertainty quantification tools (Bootstrap, Jackknife, Placebo Tests) to ensure reliable conclusions.
  • Designed for Environmental Science: Its features are built to address core challenges in air quality research, such as isolating meteorological impacts and evaluating policy effectiveness.

🚀 Workflow

The core workflow of normet (import normet as nm) is designed to simplify complex analytical steps:

  1. Data Preparation (nm.prepare_data): Automatically processes time-series data, including imputation, feature engineering (e.g., time-based variables), and dataset splitting.
  2. Model Training (nm.train_model): Trains high-performance machine learning models using the FLAML or H2O AutoML backend.
  3. Analysis & Application:
    • Weather Normalisation (nm.normalise): Removes the influence of meteorological conditions on pollutant concentrations.
    • Time-Series Decomposition (nm.decompose): Decomposes the series into meteorology-driven and emission-driven components.
    • Counterfactual Modelling (nm.run_scm): Estimates the causal effect of an intervention (e.g., a new policy).
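
The snippet below sketches how these steps chain together; it mirrors the detailed examples in the later sections, and the file name and variable list are placeholders rather than part of the package.

import normet as nm
import pandas as pd

# Placeholder input: an hourly site dataset with a datetime index
df = pd.read_csv('my_site_data.csv', parse_dates=['date'], index_col='date')
met_vars = ["u10", "v10", "t2m", "blh", "sp"]  # example meteorological predictors

# 1. Prepare: impute gaps, add time features, split into train/test
df_prep = nm.prepare_data(df=df, value="PM2.5", feature_names=met_vars,
                          split_method='random', fraction=0.75)

# 2. Train: AutoML model via FLAML (or backend="h2o")
model = nm.train_model(df=df_prep, value="value", backend="flaml", variables=met_vars)

# 3. Apply: weather normalisation (decomposition and SCM work analogously)
df_norm = nm.normalise(df=df_prep, model=model, feature_names=met_vars,
                       variables_resample=met_vars, n_samples=100)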

🔧 Installation

You can install the stable version of normet from PyPI:

pip install normet

Install the latest development version from GitHub:

pip install git+https://github.com/normet-dev/normet-py.git

Backend Setup

normet relies on either FLAML or H2O as its machine learning backend. Install FLAML if you choose it:

conda install flaml -c conda-forge

Or install H2O if you choose it instead:

# Install the h2o package from PyPI
pip install h2o

💡 Quick Start: One-Shot Weather Normalisation

With the nm.do_all function, you can perform a complete weather normalisation workflow in just a few lines of code.

import normet as nm
import pandas as pd # For data manipulation

# Load the example data
my1 = pd.read_csv('data_MY1_data.csv', parse_dates=['date'], index_col='date')

# Define the feature variables for the model
predictors = [
    "u10", "v10", "d2m", "t2m", "blh", "sp", "ssrd", "tcc", "tp", "rh2m",
    "date_unix", "day_julian", "weekday", "hour"
]

features_to_use = [
    "u10", "v10", "d2m", "t2m", "blh", "sp", "ssrd", "tcc", "tp", "rh2m"
]

# Run the end-to-end pipeline
# nm.do_all automatically handles data prep, model training, and normalisation
results = nm.do_all(
    df=my1,
    value="PM2.5",
    backend="flaml",  # or "h2o"
    feature_names=predictors,
    variables_resample=features_to_use, # Specify met variables to resample to remove their effect
    n_samples=100 # Use a small sample size for a quick demo
)

# View the normalised (deweathered) time-series results
print("Normalised (deweathered) time-series:")
print(results['out'].head())

# Inspect the trained AutoML model object
print("\nTrained AutoML Model:")
print(results['model'])

# Evaluate the model's performance
stats = nm.modStats(results['df_prep'], results['model'])
print("\nModel Performance Metrics:")
print(stats)

The nm.do_all function returns a dictionary containing three key elements:

  1. out: A pandas DataFrame with the normalised (deweathered) time-series.
  2. df_prep: The preprocessed data, including training/testing splits.
  3. model: The trained AutoML model object.
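
For example, the deweathered series can be written out with ordinary pandas calls (the file name below is only an illustration):

# Save the deweathered series for downstream analysis (plain pandas, no normet-specific API)
results['out'].to_csv('pm25_deweathered.csv')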

Step-by-Step Execution

For more control over the process, you can execute each step manually.

1. Prepare the Data (nm.prepare_data)

This function handles missing value imputation, adds time-based features, and splits the data into training and testing sets.

df_prep = nm.prepare_data(
    df=my1,
    value='PM2.5',
    feature_names=features_to_use,
    split_method='random',
    fraction=0.75
)

2. Train the Model (nm.train_model)

Train a machine learning model using FLAML or H2O AutoML. The configuration allows you to control the training process.

# The prepared data stores the target in the 'value' column
target = 'value'

# Configure H2O AutoML
h2o_config = {
    'max_models': 10,
    'include_algos': ["GBM"],
    'sort_metric': "RMSE",
    'max_mem_size': "8G"
}



# Or Configure FLAML AutoML
flaml_config = {
    "time_budget": 90,          # seconds for the search
    "metric": "r2",             # optimize R^2 (use "mae"/"mse" if preferred)
    "estimator_list": ["lgbm"], # single estimator keeps things fast
}

# Train the model
model = nm.train_model(
    df=df_prep,
    value=target,
    backend="flaml",           # or "h2o"
    variables=predictors,
    model_config=flaml_config  # or h2o_config
)

# Evaluate model performance
nm.modStats(df_prep, model)

3. Perform Normalisation (nm.normalise)

Use the trained model to generate the weather-normalised time-series.

df_normalised = nm.normalise(
    df=df_prep,
    model=model,
    feature_names=predictors,
    variables_resample=features_to_use,
    n_samples=100
)

print(df_normalised.head())

4. Using a Custom Weather Dataset

You can also provide a specific weather dataset via the weather_df argument. This is useful for answering questions like, "What would concentrations have been under the average weather conditions of a different year?"

# For demonstration, create a custom weather dataset using the first 100 rows
custom_weather = df_prep.iloc[0:100][features_to_use].copy()

# Perform normalisation using the custom weather conditions
df_norm_custom = nm.normalise(
    df=df_prep,
    model=model,
    weather_df=custom_weather,
    feature_names=predictors,
    variables_resample=features_to_use,
    n_samples=100 # n_samples will now sample from `custom_weather`
)

print(df_norm_custom.head())
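
If df_prep keeps the DatetimeIndex from my1 (an assumption; nm.prepare_data may instead expose the dates as a column), the same call can draw weather from a specific year to answer the "different year" question above:

# Sketch: use one year's weather as the counterfactual conditions
# (assumes a DatetimeIndex on df_prep; filter on a date column instead if needed)
weather_2016 = df_prep.loc[df_prep.index.year == 2016, features_to_use]

df_norm_2016 = nm.normalise(
    df=df_prep,
    model=model,
    weather_df=weather_2016,
    feature_names=predictors,
    variables_resample=features_to_use,
    n_samples=100
)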

📊 Core Features Showcase

In addition to the high-level pipeline, normet offers flexible, modular functions for custom, step-by-step analyses.

1. Weather Normalisation & Time-Series Decomposition

Rolling Weather Normalisation (nm.rolling)

Ideal for short-term trend analysis, this function performs normalisation within a moving time window to capture dynamic changes.

# Assuming you have `df_prep` and `model` from the quick start
df_norm_rolling = nm.rolling(
    df=df_prep,
    value='value',
    model=model,
    feature_names=predictors,
    variables_resample=features_to_use,
    n_samples=100,
    window_days=14,      # Window size in days
    rolling_every=7      # Step size in days
)
print(df_norm_rolling.head())

Time-Series Decomposition (nm.decompose)

Decomposes the original time series into its emission-driven and meteorology-driven components.

# Decompose to get the emission-driven component
df_emi = nm.decompose(
    method="emission",
    df=df_prep,
    value="value",
    model=model,
    feature_names=predictors,
    n_samples=100
)
print(df_emi.head())

# Decompose to get the meteorology-driven component
df_met = nm.decompose(
    method="meteorology",
    df=df_prep,
    value="value",
    model=model,
    feature_names=predictors,
    n_samples=100
)
print(df_met.head())

2. Counterfactual Modelling & Causal Inference

normet includes a powerful toolkit for Synthetic Control Methods (SCM) to evaluate the causal impact of policies or events.

Data Preparation

import pandas as pd

# Load the SCM example data
scm_data = pd.read_csv('data_AQ_Weekly.csv', parse_dates=['date'])

# Filter to the analysis window (dates are already parsed above)
df = scm_data.query("date >= '2015-05-01' and date < '2016-04-30'")

# Define the treated unit, donor pool, and intervention date
treated_unit = "2+26 cities"
donor_pool = [
    "Dongguan", "Zhongshan", "Foshan", "Beihai", "Nanning", "Nanchang", "Xiamen",
    "Taizhou", "Ningbo", "Guangzhou", "Huizhou", "Hangzhou", "Liuzhou",
    "Shantou", "Jiangmen", "Heyuan", "Quanzhou", "Haikou", "Shenzhen",
    "Wenzhou", "Huzhou", "Zhuhai", "Fuzhou", "Shaoxing", "Zhaoqing",
    "Zhoushan", "Quzhou", "Jinhua", "Shaoguan", "Sanya", "Jieyang",
    "Meizhou", "Shanwei", "Zhanjiang", "Chaozhou", "Maoming", "Yangjiang"
]
df = df[df['ID'].isin(donor_pool + [treated_unit])]
cutoff_date = "2015-10-23" # Define the intervention start date

Running a Synthetic Control Analysis (nm.run_scm)

# Run classic SCM or the machine learning-based MLSCM
scm_result = nm.run_scm(
    df=df,
    date_col="date",
    outcome_col="SO2wn",
    unit_col="ID",
    treated_unit=treated_unit,
    donors=donor_pool,
    cutoff_date=cutoff_date,
    scm_backend="scm" # Options: 'scm' or 'mlscm'
)
print(scm_result.tail())

Placebo Tests (nm.placebo_in_space)

Check the significance of the main effect by iteratively treating each control unit as the "treated" unit and running a "fake" intervention.

placebo_results = nm.placebo_in_space(
    df=df,
    date_col="date",
    outcome_col="SO2wn",
    unit_col="ID",
    treated_unit=treated_unit,
    donors=donor_pool,
    cutoff_date=cutoff_date,
    scm_backend="scm", # Options: 'scm' or 'mlscm'
    verbose=False
)

# Calculate confidence bands from the placebo effects
bands = nm.effect_bands_space(placebo_results, level=0.95, method="quantile")

# Plot the main effect with the placebo bands
nm.plot_effect_with_bands(bands, cutoff_date=cutoff_date, title="SCM Effect (95% placebo bands)")

Uncertainty Quantification (nm.uncertainty_bands)

Generate confidence intervals for the causal effect using Bootstrap or Jackknife methods.

# Bootstrap method
boot_bands = nm.uncertainty_bands(
    df=df,
    date_col="date",
    outcome_col="SO2wn",
    unit_col="ID",
    treated_unit=treated_unit,
    donors=donor_pool,
    cutoff_date=cutoff_date,
    scm_backend="scm", # Options: 'scm' or 'mlscm'
    method="bootstrap",
    B=50 # Use a small number of replications for a quick demo
)
nm.plot_uncertainty_bands(boot_bands, cutoff_date=cutoff_date)

# Jackknife (leave-one-out) method
jack_bands = nm.uncertainty_bands(
    df=df,
    date_col="date",
    outcome_col="SO2wn",
    unit_col="ID",
    treated_unit=treated_unit,
    donors=donor_pool,
    cutoff_date=cutoff_date,
    scm_backend="scm",
    method="jackknife"
)
nm.plot_uncertainty_bands(jack_bands, cutoff_date=cutoff_date)

📦 Dependencies

  • Python (>= 3.8)
  • Core Dependencies: flaml or h2o, pandas, numpy
  • SCM Features: scikit-learn, statsmodels
  • Suggested: logging (Python stdlib)

📜 How to Cite

If you use normet in your research, please cite it as follows:

@Manual{normet-pkg,
  title = {normet: Normalisation, Decomposition, and Counterfactual Modelling for Environmental Time-series},
  author = {Congbo Song and Other Contributors},
  year = {2025},
  note = {Python package version 0.0.1},
  organization = {University of Manchester},
  url = {https://github.com/normet-dev/normet-py},
}

📄 License

This project is licensed under the MIT License.


🤝 How to Contribute

Contributions are welcome! This project is released with a Contributor Code of Conduct. By participating, you agree to abide by its terms.

Please submit bug reports and feature requests via GitHub Issues.
