IbexRL is a control algorithm for optimizing HVAC systems in buildings. Its core agent is built on Differentiable Model Predictive Control (dMPC), which allows it to learn and adapt its control strategy online.
The framework follows an Imitation Learning (IL) + Online Learning paradigm: a two-stage process in which the agent is first pre-trained to mimic a baseline controller and then refined through online reinforcement learning.
Here are the key files and directories in this project:
- `IbexAgent.py`: The core of the project. Contains the differentiable MPC agent logic.
- `online_learning_main.py`: The main script to execute the online reinforcement learning experiments. This is the primary file you will run.
- `online_learning.py`: Contains the main simulation loop that interacts with the BOPTEST environment. Modify this file to adapt to new environments.
- `train_imit.py`: The script used to run the initial imitation learning phase from scratch.
- `Gnu-RL/`: Contains the Gnu-RL implementation and the baseline data generated by the default BOPTEST controller, which is used both for imitation learning and as a source of disturbances during online learning.
- `imit_environment.yaml`: Conda environment file for imitation learning.
- `online_environment.yaml`: Conda environment file for online learning.
This project was developed using the following key packages. We extend our gratitude to their developers.
- BOPTEST-GYM: A simulation and testing framework for building performance assessments. The expert controller data used for imitation learning in this research was generated using a BOPTEST environment. For more details, please visit the official BOPTEST website. 🏢
- diff-mpc: A fast and differentiable model predictive control (MPC) solver for PyTorch. The required library files are already included in the `./diff_mpc` directory, so no separate installation is necessary. For more information on the solver, please see the original diff-mpc repository; a hedged usage sketch is shown after this list. ⚙️
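The sketch below illustrates how a differentiable MPC solve is typically invoked through the upstream mpc.pytorch interface that diff-mpc is based on. The import path of the vendored copy in `./diff_mpc` may differ, and all dimensions, values, and bounds are placeholders rather than the settings used by `IbexAgent.py`:

```python
# Hedged sketch following the upstream mpc.pytorch interface (MPC, QuadCost, LinDx).
# The vendored copy in ./diff_mpc may expose these under a different module path,
# and all dimensions and values below are placeholders.
import torch
from mpc import mpc
from mpc.mpc import QuadCost, LinDx

n_batch, n_state, n_ctrl, T = 1, 2, 1, 10
n_sc = n_state + n_ctrl

# Time-varying quadratic cost 0.5 * z^T C z + c^T z, with z = [state; control].
C = torch.eye(n_sc).repeat(T, n_batch, 1, 1)
c = torch.zeros(T, n_batch, n_sc)

# Linear dynamics x_{t+1} = F_t [x_t; u_t] (placeholder A and B matrices).
A = 0.9 * torch.eye(n_state)
B = 0.1 * torch.ones(n_state, n_ctrl)
F = torch.cat((A, B), dim=1).repeat(T, n_batch, 1, 1)

# Box constraints on the control input and the initial state.
u_lower = torch.zeros(T, n_batch, n_ctrl)
u_upper = torch.ones(T, n_batch, n_ctrl)
x_init = torch.zeros(n_batch, n_state)

states, actions, objs = mpc.MPC(
    n_state=n_state, n_ctrl=n_ctrl, T=T,
    u_lower=u_lower, u_upper=u_upper,
    lqr_iter=20, exit_unconverged=False,
)(x_init, QuadCost(C, c), LinDx(F))

# Because the solver is differentiable, a loss defined on `actions` backpropagates
# into the cost (C, c) and dynamics (F) parameters, which is what enables both the
# imitation learning and the online refinement of the dMPC agent.
```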
Before running any experiments, you must set up the appropriate Conda environment.
If you plan to run the imitation learning phase from scratch, create and activate the `rl-imit` environment:

```bash
# Create the environment from the .yaml file
conda env create -f imit_environment.yaml

# Activate the environment
conda activate rl-imit
```

To run the main online learning experiments, create and activate the `rl-online` environment:

```bash
# Create the environment from the .yaml file
conda env create -f online_environment.yaml

# Activate the environment
conda activate rl-online
```

There are two ways to use this framework.
Path A is the quickest way to get started: it skips the imitation learning step and uses pre-computed model parameters to begin online learning directly.
- Activate the Environment: Make sure you are in the `rl-online` Conda environment.

  ```bash
  conda activate rl-online
  ```

- Configure the Main Script: Open `online_learning_main.py`. You will need to configure two sections:
  - File Paths: At the top of the file, update the following path to match your local machine's directory structure:

    ```python
    path_for_your_repo = "/path/to/your/repo"
    ```

  - Hyperparameters: Inside the `Args` class, you can adjust various parameters for the experiment. See the Configuration section below for details on key parameters like learning rates (`state_lr`, `action_lr`), cost calibration (`cost_calibration`), and exploration (`explore`).
- Run the Experiment: Execute the script from your terminal. It will loop through the number of runs specified in the code.

  ```bash
  python online_learning_main.py
  ```
Path B: follow this path if you want to generate your own imitation learning model before starting the online learning phase.
- Run Imitation Learning:
  - Activate the `rl-imit` environment.
  - Run `train_imit.py`. This will train the agent with various hyperparameter configurations to mimic the baseline controller data located in the `Gnu-RL` folder and save the initial model parameters. (A command-line sketch of the full Path B workflow follows this list.)
- Run Online Learning:
  - Follow all the steps in Path A.
  - Crucially, in the `Args` class within `online_learning_main.py`, make sure the `IL_...` parameters (`IL_ETA_FOR_PATH`, `IL_FHAT_EPOCH`, etc.) point to the results of your imitation learning run.
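Put together, a typical Path B session looks roughly like the following. The assumption here is that both scripts are run without command-line arguments and are configured by editing the files themselves, as described above:

```bash
# 1) Imitation learning: fit the initial dMPC parameters to the baseline data.
conda activate rl-imit
python train_imit.py

# 2) Online learning: point the IL_... fields of the Args class in
#    online_learning_main.py at your imitation-learning results, then run.
conda activate rl-online
python online_learning_main.py
```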
All major hyperparameters are located in the `Args` class inside `online_learning_main.py`. Some of them exist only for the ablation study and do not need to be changed for regular use. The key fields are listed below; a hedged sketch of how they might appear in the class follows the list.
- `state_lr`: Learning rate for the state-space model parameters (e.g., thermal resistance).
- `action_lr`: Learning rate for the cost function parameters (`O_hat`, `R_hat`).
- `cost_calibration`: Set to `True` if you want to learn the cost parameters online. This is necessary if you are using a custom, non-quadratic reward function.
- `IL_...` parameters: These variables tell the script which pre-trained imitation learning model to load as the starting point for the online phase.
- `explore`: Set to `True` to enable exploration by adding noise to the agent's actions; `sigma_init` controls the initial amount of noise.
- `folder_name`: The name of the directory where the experiment results will be saved.
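For orientation only, the fields above might be laid out along these lines. The dataclass structure and every value are illustrative assumptions; only the field names come from the list above:

```python
# Hypothetical sketch of the Args hyperparameter container in online_learning_main.py.
# Field names follow the documentation above; the dataclass layout and all values
# are placeholders, not the project's actual defaults.
from dataclasses import dataclass

@dataclass
class Args:
    # Learning rates for the dMPC model and cost parameters
    state_lr: float = 1e-3        # state-space model parameters (e.g., thermal resistance)
    action_lr: float = 1e-3       # cost parameters O_hat, R_hat

    # Online cost calibration (needed for a custom, non-quadratic reward)
    cost_calibration: bool = False

    # Which pre-trained imitation-learning model to load for the online phase
    IL_ETA_FOR_PATH: float = 0.1  # placeholder value
    IL_FHAT_EPOCH: int = 100      # placeholder value

    # Exploration noise on the agent's actions
    explore: bool = True
    sigma_init: float = 0.1

    # Directory where experiment results are written
    folder_name: str = "results"
```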
We test with two different reward functions, both defined in `utils.py` (a hedged sketch follows the list):
- `R_func()`: The negative of the quadratic cost function. This is the reward used when cost calibration is NOT enabled.
- `non_quadratic_cumulative_reward()`: The non-quadratic reward used for cost calibration. The `O_occ` and `R` parameters are updated to maximize this reward, so the quadratic cost is tuned to approximate the non-quadratic objective. Modify this function to test different rewards.
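The sketch below illustrates the convention. The signatures and the specific comfort/energy terms are assumptions for illustration, not the actual code in `utils.py`:

```python
import numpy as np

# Hedged sketch of the two reward conventions described above. Signatures and the
# specific comfort/energy terms are illustrative assumptions.

def r_func_sketch(temp, setpoint, power, comfort_weight=1.0, energy_weight=1.0):
    """Negative quadratic cost: the reward used when cost calibration is disabled."""
    comfort_cost = comfort_weight * (temp - setpoint) ** 2
    energy_cost = energy_weight * power ** 2
    return -(comfort_cost + energy_cost)

def non_quadratic_reward_sketch(temps, setpoints, powers, tol=0.5):
    """A non-quadratic cumulative reward, e.g. penalizing only comfort-band
    violations plus total energy use. Cost calibration then tunes the quadratic
    parameters (O_occ, R) so the quadratic cost approximates this objective."""
    temps, setpoints, powers = map(np.asarray, (temps, setpoints, powers))
    violations = np.maximum(np.abs(temps - setpoints) - tol, 0.0)
    return -(violations.sum() + powers.sum())
```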
The training data is derived from two primary files:
- `observations.csv`: Contains 75 days of observation data sampled at 30-minute intervals from a baseline BOPTEST controller.
- `results_tests_test_constant_original/results_sim_0.csv`: Provides higher-granularity simulation data from the same controller. The action data is extracted from this file and resampled to 30-minute intervals by the `agent/train_imit.py` script (see the resampling sketch below).
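For reference, downsampling the high-frequency action data to 30-minute intervals can be done along the following lines. The column names (`time` in seconds, `oveHeaPumY_u` as the control signal) and the mean aggregation are assumptions, not necessarily what the training script does:

```python
import pandas as pd

# Hedged sketch: resample high-frequency controller actions to 30-minute intervals.
# The CSV path comes from the documentation above; the column names and the choice
# of mean aggregation are illustrative assumptions.
df = pd.read_csv("results_tests_test_constant_original/results_sim_0.csv")
df.index = pd.to_datetime(df["time"], unit="s")   # simulation time in seconds -> timestamps

# Average the control signal over each 30-minute window.
actions_30min = df["oveHeaPumY_u"].resample("30min").mean()
print(actions_30min.head())
```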
To apply this agent to a different BOPTEST environment, you need to modify `online_learning.py`:
- Open `online_learning.py`.
- Navigate to the `test_loop` function.
- Update the following variable definitions to match the observation and control signals of your new environment (a hedged example follows this list):
  - `state_name`
  - `dist_name`
  - `ctrl_name`
  - `target_name`
  - The `boptest_obs_config_for_env` dictionary, which defines the names and ranges of all variables to be pulled from the environment.
- Update the physics-informed model structure in `IbexAgent.py`. It depends on your inputs (state, disturbances, and control variables), so it needs to be changed for a new environment.
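The following sketch shows what these definitions might look like for a single-zone BOPTEST test case. Every signal name, bound, and the tuple-based format of the dictionary are assumptions chosen for illustration, not the project's actual configuration:

```python
# Hedged example of the environment-specific definitions inside test_loop.
# All signal names and ranges below are illustrative; replace them with the
# observation/control points of your own BOPTEST test case.
state_name  = ["reaTZon_y"]                    # zone temperature (controlled state)
dist_name   = ["weaSta_reaWeaTDryBul_y",       # outdoor dry-bulb temperature
               "weaSta_reaWeaHGloHor_y"]       # global horizontal irradiation
ctrl_name   = ["oveHeaPumY_u"]                 # heat pump modulation signal
target_name = ["reaTSetHea_y"]                 # heating setpoint / comfort target

# Names and assumed (lower, upper) ranges of every variable pulled from the
# environment, e.g. for observation normalization.
boptest_obs_config_for_env = {
    "reaTZon_y":              (273.15 + 10.0, 273.15 + 35.0),  # K
    "weaSta_reaWeaTDryBul_y": (273.15 - 20.0, 273.15 + 40.0),  # K
    "weaSta_reaWeaHGloHor_y": (0.0, 1000.0),                   # W/m2
    "oveHeaPumY_u":           (0.0, 1.0),                      # normalized
    "reaTSetHea_y":           (273.15 + 10.0, 273.15 + 30.0),  # K
}
```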
