pnnl/GreenButton_Privacy

Differential Privacy Protection Mechanism Code (MATLAB)

Introduction

This repository provides MATLAB scripts to apply differential privacy (DP) to 15-minute interval residential smart meter data. The implementation is designed to be compatible with data structures from standards like Green Button.

The primary objective is to enable the use of high-resolution energy data for grid analytics (such as load forecasting, energy benchmarking, and monitoring of customer-owned assets) while providing formal, quantifiable privacy guarantees for individual households. The method preserves the statistical utility of aggregate load profiles.

High-level overview

The proposed differential privacy mechanisms operate in the frequency domain to separate low-frequency components, which reveal overall load patterns, from high-frequency components, which may leak customer-level behaviors. This allows for targeted noise injection that preserves the utility of the data for power systems analysis.

The process is as follows:

  • Transform: Time-series load data is converted to the frequency domain using the Discrete Cosine Transform (DCT).

  • Filter: Low-energy, high-frequency coefficients are zeroed out via a variance threshold. This focuses the privacy budget on the coefficients that define the base load shape.

  • Anonymize: Calibrated Gaussian noise is added to the remaining significant DCT coefficients. The noise magnitude is determined by the privacy budget (ε, δ) and a pre-calculated sensitivity.

  • Reconstruct: The data is returned to the time domain using the Inverse DCT.
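The four steps above can be sketched in a few lines. This is an illustrative Python sketch, not the repository's MATLAB code: the function names `dct`, `idct`, and `privatize`, the `keep` index list, and the value of `sigma` are all assumptions made for the example, and the DCT is written out by hand so the snippet needs no external library.

```python
import math
import random

def dct(x):
    """Orthonormal DCT-II, written out directly so no external library is needed."""
    n = len(x)
    return [
        math.sqrt((1.0 if k == 0 else 2.0) / n)
        * sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
        for k in range(n)
    ]

def idct(c):
    """Inverse of dct() above (orthonormal DCT-III)."""
    n = len(c)
    return [
        sum(
            math.sqrt((1.0 if k == 0 else 2.0) / n)
            * c[k] * math.cos(math.pi * (t + 0.5) * k / n)
            for k in range(n)
        )
        for t in range(n)
    ]

def privatize(load, keep, sigma, rng):
    """Transform, filter, anonymize, reconstruct (hypothetical helper)."""
    coeffs = dct(load)                       # Transform
    keep = set(keep)                         # Filter: indices of retained coefficients
    noisy = [
        coeffs[k] + rng.gauss(0.0, sigma) if k in keep else 0.0
        for k in range(len(coeffs))          # Anonymize: noise only on retained coefficients
    ]
    return idct(noisy)                       # Reconstruct

# One day of 15-minute readings (96 samples) with a smooth daily shape.
load = [1.0 + 0.5 * math.sin(2 * math.pi * t / 96) for t in range(96)]
private = privatize(load, keep=range(8), sigma=0.05, rng=random.Random(0))
```

In this sketch the retained coefficients are passed in explicitly. In the repository, the choice is described as being made by a variance threshold, and `sigma` follows from the privacy budget and the calibrated sensitivity.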

About Differential Privacy

Differential privacy provides formal guarantees that protect individual-level data while keeping population-level trends usable for decision-making. It achieves this by adding a calibrated amount of noise to sensitive data, obscuring individual contributions without distorting broader patterns. This approach is valuable in energy analytics, where preserving structural trends (e.g., load peaks) is critical but granular behavioral details (e.g., when a customer wakes up) must remain private.

Mechanism Details:

Gaussian Noise Addition

Gaussian noise is added selectively to Discrete Cosine Transform (DCT) coefficients, minimizing distortion of base load patterns while masking sensitive behaviors.

Advantages: Predictable error scaling and compatibility with real-valued, high-dimensional datasets.

Supports (ε, δ)-DP guarantees, enabling favorable privacy–utility tradeoffs.
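For reference, the standard textbook definition behind this guarantee is given here. The symbols $\mathcal{M}$ (the mechanism), $D$ and $D'$ (adjacent datasets), $S$ (a set of outputs), and $f$ (the map from a dataset to its retained DCT coefficients) are notation introduced for this note and do not come from the repository. A mechanism is (ε, δ)-differentially private if, for all adjacent $D$, $D'$ and all output sets $S$:

$$
\Pr[\mathcal{M}(D) \in S] \le e^{\varepsilon} \, \Pr[\mathcal{M}(D') \in S] + \delta
$$

A Gaussian mechanism releases the true output plus independent Gaussian noise, with the noise scale $\sigma$ tied to the $L_2$ sensitivity $S_2$:

$$
\mathcal{M}(D) = f(D) + \mathcal{N}(0, \sigma^2 I), \qquad S_2 = \max_{D \sim D'} \lVert f(D) - f(D') \rVert_2
$$

What counts as "adjacent" ($D \sim D'$) is exactly what an adjacency model specifies.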

This repository implements two distinct adjacency models, each corresponding to a different data-sharing scenario and threat model.

Adjacency Models

  • Household-Level Adjacency: Designed for aggregate data sharing, where the inclusion or removal of an entire household is protected. Suitable for virtual building profiles and similar aggregated metrics.

  • Behavioral Adjacency: Protects intra-household energy usage patterns, including cyclic routines or appliance schedules. Focuses on safeguarding granular time-sensitive behaviors.

Sensitivity Calibration:

Sensitivity quantifies how much the output can change between adjacent datasets, and it determines the scale of noise required to meet the DP guarantee. In this work, sensitivity is calibrated using the Euclidean distance between DCT coefficients, that is, in the frequency domain. Distinct calibration strategies are employed for the two adjacency models:

  • Household-Level Adjacency: Sensitivity derived from aggregated contributions of individual households to dataset-wide metrics.

  • Behavioral Adjacency: Intra-household sensitivity calibrated for dominant cyclic patterns (e.g., daily routines).
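As an illustration of the household-level case, the sketch below measures the Euclidean distance in the DCT domain between an aggregate profile and the same aggregate with one household removed, and takes the largest such distance. It is a Python sketch under stated assumptions, not the repository's MATLAB code: `household_sensitivity` and the toy data are hypothetical, and an empirical maximum over one dataset is only an estimate of sensitivity for that dataset.

```python
import math

def dct(x):
    """Orthonormal DCT-II, written out directly so no external library is needed."""
    n = len(x)
    return [
        math.sqrt((1.0 if k == 0 else 2.0) / n)
        * sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
        for k in range(n)
    ]

def household_sensitivity(homes):
    """Largest DCT-domain L2 distance caused by removing any one home (hypothetical helper)."""
    n = len(homes[0])
    total = [sum(h[t] for h in homes) for t in range(n)]
    full = dct(total)
    worst = 0.0
    for h in homes:
        without = dct([total[t] - h[t] for t in range(n)])
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(full, without)))
        worst = max(worst, dist)
    return worst

# Two toy "homes" with four readings each.
homes = [[1.0, 1.0, 1.0, 1.0], [3.0, 0.0, 0.0, 4.0]]
s2 = household_sensitivity(homes)
```

Because the orthonormal DCT is linear and preserves Euclidean length, this distance equals the $L_2$ norm of the removed home's own series, so the result here is driven by the second home.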

Dataset

The algorithms are tested on the Home Energy Metering Study (HEMS) dataset developed by the Northwest Energy Efficiency Alliance (NEEA). The dataset contains anonymized residential energy consumption time series with a 15-minute resolution, spanning one year across 93 households. The dataset has been extracted using the PETSAFE tool.

Repository Structure

The codebase consists of calibration scripts, DP application scripts, utility functions, and a data file. The scripts have been tailored to process the NEEA HEMS dataset records.

Calibration Scripts

These scripts are used to determine the sensitivity parameters ($S_2$) required for the DP mechanism.

  • householdSensitivity.m:

      • Purpose: Calibrates sensitivity for the Household-Level Adjacency model.

      • Method: Calculates the Euclidean distance in the DCT domain when individual households are added/removed. Evaluates sensitivity across various cycle lengths (e.g., 8 hours, 1 day, 7 days) and group sizes.

      • Key Output: Norm values ($S_2$) used to scale noise in the application phase.

  • behavioralSensitivity.m:

      • Purpose: Calibrates sensitivity for the Behavioral Adjacency model.

      • Method: Analyzes intra-household perturbations by overlaying cyclic consumption periods (e.g., daily cycles). It computes the maximum difference at each time step and its DCT norm to determine how much a single household's behavior varies.

      • Key Output: Sensitivity parameters for specific cycle lengths (e.g., 6h, 12h, 24h).
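The behavioral calibration idea can be sketched as follows. This is one possible reading of "maximum difference at each time step", taken here as the max-minus-min spread across overlaid cycles, and it is not confirmed by the repository. The function name `behavioral_sensitivity` and the toy series are hypothetical, and the code is Python rather than the repository's MATLAB.

```python
import math

def dct(x):
    """Orthonormal DCT-II, written out directly so no external library is needed."""
    n = len(x)
    return [
        math.sqrt((1.0 if k == 0 else 2.0) / n)
        * sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
        for k in range(n)
    ]

def behavioral_sensitivity(series, cycle_len):
    """L2 norm, in the DCT domain, of the per-step spread across cycles (assumed reading)."""
    # Cut the series into complete cycles and overlay them.
    cycles = [
        series[i:i + cycle_len]
        for i in range(0, len(series) - cycle_len + 1, cycle_len)
    ]
    # Largest difference between any two cycles at each time step.
    spread = [
        max(c[t] for c in cycles) - min(c[t] for c in cycles)
        for t in range(cycle_len)
    ]
    return math.sqrt(sum(v * v for v in dct(spread)))

# Two 4-step cycles that differ only at the last step.
series = [0.0, 1.0, 2.0, 3.0, 0.0, 1.0, 2.0, 7.0]
s2 = behavioral_sensitivity(series, cycle_len=4)
```

With 15-minute data, a daily cycle would correspond to `cycle_len=96`.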

DP Application Scripts

These scripts apply the full DP mechanism (Noise Addition) using the calibrated parameters.

  • appliedMechanism_Household.m:

      • Purpose: Implements the full DP mechanism for aggregated groups of households.

      • Process: Aggregates time series data $\rightarrow$ Applies DCT $\rightarrow$ Thresholds low-variance coefficients $\rightarrow$ Adds Gaussian noise based on privacy budget $\epsilon$ and sensitivity $S_2$ $\rightarrow$ Reconstructs via Inverse DCT.

      • Metrics: Calculates RMSE, Correlation Coefficient, and peak demand differences to assess utility.

  • appliedMechanism_Behavioral.m:

      • Purpose: Implements the full DP mechanism for individual household data.

      • Process: Similar to the household mechanism but applied to single-home time series to protect behavioral patterns.

      • Metrics: Compares original vs. private consumption profiles using histograms and error metrics.
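The utility metrics named above (RMSE, correlation coefficient, and peak demand difference) can be computed as in the following sketch. The function names are hypothetical, the correlation is assumed to be the Pearson coefficient, and the code is Python rather than the repository's MATLAB.

```python
import math

def rmse(original, private):
    """Root-mean-square error between two equal-length profiles."""
    n = len(original)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(original, private)) / n)

def pearson(original, private):
    """Pearson correlation coefficient (assumes neither profile is constant)."""
    n = len(original)
    mean_a = sum(original) / n
    mean_b = sum(private) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(original, private))
    var_a = sum((a - mean_a) ** 2 for a in original)
    var_b = sum((b - mean_b) ** 2 for b in private)
    return cov / math.sqrt(var_a * var_b)

def peak_difference(original, private):
    """Change in peak demand introduced by the mechanism."""
    return max(private) - max(original)

original = [1.0, 2.0, 3.0, 4.0]
private = [1.0, 2.0, 3.0, 6.0]
print(rmse(original, private), pearson(original, private), peak_difference(original, private))
```

A low RMSE with a high correlation suggests the load shape survived the noise, while the peak difference shows whether the quantity that matters most for capacity planning was distorted.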

Utility Functions

  • f_normalize.m:

      • Normalizes input data to a standard range (typically $[0, 1]$ or $[-1, 1]$) to ensure consistent processing.

  • f_varianceThreshold.m:

      • Determines which DCT coefficients to preserve based on a cumulative energy threshold (e.g., 90%). Sets less significant coefficients to zero to reduce the dimensionality of noise addition and improve utility.
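The two helpers can be illustrated with a short sketch. It is a Python approximation of the behavior described above, not a port of `f_normalize.m` or `f_varianceThreshold.m`: min-max scaling to $[0, 1]$ and ranking coefficients by squared magnitude are assumptions, as are the function names.

```python
def normalize(x):
    """Min-max scale a series to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(x), max(x)
    if hi == lo:
        return [0.0 for _ in x]
    return [(v - lo) / (hi - lo) for v in x]

def variance_threshold(coeffs, keep=0.9):
    """Zero every coefficient outside the smallest set holding `keep` of the total energy."""
    energy = [c * c for c in coeffs]
    total = sum(energy)
    # Visit coefficients from most to least energetic.
    order = sorted(range(len(coeffs)), key=lambda k: energy[k], reverse=True)
    kept, running = set(), 0.0
    for k in order:
        if running >= keep * total:
            break
        kept.add(k)
        running += energy[k]
    return [c if k in kept else 0.0 for k, c in enumerate(coeffs)]

print(normalize([2.0, 4.0, 6.0]))
print(variance_threshold([3.0, 0.1, -4.0, 0.2], keep=0.9))
```

In the second call the two large coefficients carry more than 90% of the energy, so the two small ones are set to zero and would receive no noise.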

Data

  • dataOneYear93homes.mat:

      • Contains synthetic demand data for 93 homes from the Home Energy Metering Study (HEMS) developed by the Northwest Energy Efficiency Alliance (NEEA).

      • Resolution: 15-minute intervals.

      • Duration: One year.

Technical Details and Parameters

Privacy Parameters

epsilon (ε): The privacy budget. Controls the privacy-utility tradeoff. Lower ε provides stronger privacy guarantees at the cost of data utility.

delta (δ): The probability that the privacy guarantee fails. A small, fixed value is standard (e.g., $10^{-2}$ or smaller, often related to the inverse of the dataset size).

Variance Threshold: A parameter (e.g., 0.9) in f_varianceThreshold.m that determines the percentage of signal energy to retain. Higher values retain more detail but may require a larger privacy budget for the same level of noise.
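To make the role of ε and δ concrete, the sketch below computes a noise standard deviation from a sensitivity $S_2$ using the classical Gaussian-mechanism bound $\sigma = S_2 \sqrt{2 \ln(1.25/\delta)} / \varepsilon$. Whether the repository uses this exact calibration is not stated, so treat the formula choice, the function name `gaussian_sigma`, and the example values as assumptions. The code is Python rather than the repository's MATLAB.

```python
import math

def gaussian_sigma(s2, epsilon, delta):
    """Noise scale from the classical Gaussian-mechanism bound (assumed calibration).

    The bound is proven for 0 < epsilon < 1; larger budgets need a different analysis.
    """
    if epsilon <= 0 or not (0 < delta < 1):
        raise ValueError("epsilon must be positive and delta must lie in (0, 1)")
    return s2 * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon

# Halving epsilon doubles the noise; shrinking delta raises it only slowly.
for eps in (0.9, 0.5, 0.25):
    print(eps, gaussian_sigma(1.0, eps, 1e-2), gaussian_sigma(1.0, eps, 1e-5))
```

The noise scale is inversely proportional to ε but depends on δ only through a logarithm, which is why a much smaller δ costs relatively little utility.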

Frequency-Domain Noise Targeting

Noise is applied selectively to the DCT coefficients that dominate energy trends. Coefficients capturing short-term fluctuations or background noise are excluded (via f_varianceThreshold.m), so the privacy budget is spent only where it matters most for utility.
