# Battery-Aided Load Shaping Strategies Based on Reinforcement Learning for Optimal Privacy-Cost Trade-off in Smart Electricity Metering
The rising popularity of smart meters has raised privacy concerns about the fine-grained electricity consumption data they collect. Sharing such data with utility providers may expose household members' private information about their usage habits, potentially leading to unwanted surveillance and profiling. One promising approach to reducing private information leakage from smart meter data is to use load shaping strategies. For example, one may create an artificial grid load by using an energy management unit (EMU) with a rechargeable battery (RB) to mask the household's load. Previous studies have shown that the EMU policy can be learnt using reinforcement learning (RL) with a mutual information (MI)-based reward signal. However, these approaches are limited to quantized household load and charging/discharging power, and to low sample rates. To address this limitation, we extend EMU policy learning with an MI-based reward signal to support continuous household load and charging/discharging power at a relatively high sample rate. The approach is implemented with a policy gradient algorithm, namely proximal policy optimization (PPO). The performance of the new algorithm (PPO-MI) is evaluated on a real smart meter dataset and compared with its state-of-the-art quantized counterpart (DDQL-MI). Our results show significant improvements over the quantized counterpart in both privacy and cost metrics: PPO-MI achieves a 69.24% reduction in average MI compared to DDQL-MI under a balanced privacy-cost trade-off, while reducing the incurred extra electricity cost by 18.36%. This work will be submitted to IEEE Transactions on Smart Grid; the draft paper is available in Appendix G.
We first prepare the dataset and then execute the training scripts. A command-line sketch of the dataset-preparation steps is given after the list below.
- Download the UK-DALE dataset (disaggregated (6 s) appliance power and aggregated (1 s) whole-house power) from the UK-DALE dataset download page. Put the downloaded `UK-DALE-2017.tgz` under `./datasets` by creating a `datasets` folder.
- Unzip the `UK-DALE-2017.tgz` archive, and then unzip the `ukdale.zip` file. The resulting folder structure should look like this:

  ```
  datasets
  |-- UK-DALE-FULL-disaggregated
  |   |-- ukdale
  |   |   |-- house_1
  ```
- Create a conda virtual environment using `environment.yml` and activate it.
- Run `01_data_cleaning.ipynb`.
- Run `02_build_load_signature.ipynb`.
- Run `03_data_split.ipynb`. For more information, please see `dataset/README`.
- Run `04_downsampling.ipynb`.
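For convenience, the dataset-preparation steps above can be scripted roughly as follows. This is a sketch only: the exact archive layout, the conda environment name, the location of the notebooks (assumed to be the repository root), and headless execution via `jupyter nbconvert` are all assumptions; adjust them to match the repository and see `dataset/README` for details.

```bash
# Sketch of the dataset-preparation pipeline (paths and env name are placeholders).
mkdir -p datasets
mv UK-DALE-2017.tgz datasets/                   # place the downloaded archive under ./datasets
tar -xzf datasets/UK-DALE-2017.tgz -C datasets/ # extract the .tgz archive
# Unzip the inner ukdale.zip so the tree matches datasets/UK-DALE-FULL-disaggregated/ukdale/house_1
unzip <path-to>/ukdale.zip -d datasets/UK-DALE-FULL-disaggregated/

conda env create -f environment.yml             # create the environment from environment.yml
conda activate <env-name>                       # placeholder: use the name defined in environment.yml

# Run the preprocessing notebooks in order (headless; they can also be run interactively).
for nb in 01_data_cleaning 02_build_load_signature 03_data_split 04_downsampling; do
  jupyter nbconvert --to notebook --execute --inplace "${nb}.ipynb"
done
```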
- To train the PPO-MI models, open `rl_training_bashscript.sh`. Comment line 5 and uncomment line 6, i.e. `rl_training_bashscript.sh` should read:

```bash
#!/bin/bash
reward_lambda_array=("0" "0.5" "1")
# reward_lambda="0.5"
# action_type_array=("discrete")
action_type_array=("continuous")
n_episodes=800
seed=42
```
Then activate the virtual environment and execute this script from the project root directory.
- To train the DDQL-MI models, open `rl_training_bashscript.sh`. Comment line 6 and uncomment line 5, i.e. `rl_training_bashscript.sh` should read:

```bash
#!/bin/bash
reward_lambda_array=("0" "0.5" "1")
# reward_lambda="0.5"
action_type_array=("discrete")
# action_type_array=("continuous")
n_episodes=800
seed=42
```
Then activate the virtual environment and execute this script from the project root directory, as shown in the sketch below.
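In either case, the run then looks roughly like the following. The environment name and project path are placeholders, not names taken from the repository.

```bash
conda activate <env-name>        # placeholder: use the name defined in environment.yml
cd <project-root>                # the script is expected to be run from the project root
bash rl_training_bashscript.sh   # trains for each reward_lambda in {0, 0.5, 1}, 800 episodes, seed 42
```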
- To run hyperparameter optimization for PPO-MI, execute `rl_training_script_PPO_finetuning.py` for 30 trials (see the command sketch after this list). This should take around 2 days on a single RTX 4090.
- Training results can be analyzed using the following notebooks:
  - `expt_results_cross_models.ipynb`
  - `expt_results_cross_models_privacy_protection.ipynb`
  - `expt_results_cross_models_privacy_protection_logtest.ipynb`
  - `expt_results_cross_models_multi_episodes.ipynb`
  - `expt_results_cross_models_hnetwork.ipynb`
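A hedged sketch of these last two steps follows. Whether the 30-trial budget is hard-coded in `rl_training_script_PPO_finetuning.py` or passed as an argument is not specified here, so the script is shown without flags; the analysis notebooks are assumed to be opened and run interactively.

```bash
# Hyperparameter optimization for PPO-MI (30 trials; about 2 days on a single RTX 4090).
python rl_training_script_PPO_finetuning.py

# Open the analysis notebooks (expt_results_cross_models*.ipynb) interactively, e.g. via JupyterLab.
jupyter lab
```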