Ibex-RL: Interpretable and Scalable Control via Physics-Informed Reinforcement Learning

IbexRL is an advanced control algorithm designed for optimizing HVAC systems in building environments. It leverages a core agent built on Differentiable Model Predictive Control (dMPC), enabling it to learn and adapt its control strategy online.

The framework follows an Imitation Learning (IL) + Online Learning paradigm: a two-stage process in which the agent is first pre-trained to mimic a baseline controller and then refined through online reinforcement learning.

(Figure: Architecture of the IbexRL controller)
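
A minimal, self-contained sketch of this two-stage loop is shown below. It is illustrative only: the stand-in agent_action function, the random data, and the simple quadratic reward are placeholders; the real implementation lives in train_imit.py, online_learning.py, and IbexAgent.py.

# Illustrative sketch of the IL + Online Learning paradigm (not the actual
# implementation; see train_imit.py, online_learning.py and IbexAgent.py).
import torch

# Stand-ins for the agent's learnable pieces: state-space model parameters
# (e.g. thermal resistances) and quadratic cost parameters (O_hat, R_hat).
theta_model = torch.zeros(4, requires_grad=True)
theta_cost = torch.zeros(2, requires_grad=True)
optimizer = torch.optim.Adam([theta_model, theta_cost], lr=1e-3)

def agent_action(obs):
    # Placeholder for the differentiable-MPC forward pass of the real agent.
    return torch.tanh(obs * theta_model.sum() + theta_cost.sum())

# Stage 1: imitation learning -- fit the agent to (observation, action)
# pairs logged from the baseline BOPTEST controller.
baseline_obs = torch.randn(64, 1)   # stand-in for the logged observations
baseline_act = torch.rand(64, 1)    # stand-in for the baseline actions
for _ in range(200):
    loss = ((agent_action(baseline_obs) - baseline_act) ** 2).mean()
    optimizer.zero_grad(); loss.backward(); optimizer.step()

# Stage 2: online learning -- keep refining the same parameters from the
# reward signal while interacting with the (simulated) building.
for step in range(10):
    obs = torch.randn(1, 1)          # stand-in for a BOPTEST observation
    action = agent_action(obs)
    reward = -(action - 0.5) ** 2    # stand-in for the comfort/energy reward
    optimizer.zero_grad(); (-reward).mean().backward(); optimizer.step()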

🏛️ Repository Structure

Here are the key files and directories in this project:

  • IbexAgent.py: The core of the project. Contains the differentiable MPC agent logic.
  • online_learning_main.py: The main script to execute the online reinforcement learning experiments. This is the primary file you will run.
  • online_learning.py: Contains the main simulation loop that interacts with the BOPTEST environment. Modify this file to adapt to new environments.
  • train_imit.py: The script used to run the initial imitation learning phase from scratch.
  • Gnu-RL/: This directory contains the Gnu-RL implementation and the baseline data generated by the default BOPTEST controller, which is used for both imitation learning and as a source of disturbances during online learning.
  • imit_environment.yaml: Conda environment file for imitation learning.
  • online_environment.yaml: Conda environment file for online learning.

Core Dependencies & Acknowledgements

This project was developed using the following key packages. We extend our gratitude to their developers.

  • BOPTEST-GYM: A simulation and testing framework for building performance assessment. The expert controller data used for imitation learning in this research was generated with a BOPTEST environment. For more details, please visit the official BOPTEST website. 🏢

  • diff-mpc: A fast, differentiable model predictive control (MPC) solver for PyTorch. The required library files are already included in the ./diff_mpc directory, so no separate installation is necessary. For more information on the solver, please see the original diff-mpc repository. ⚙️
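
For orientation, the snippet below shows what a call to the solver looks like, assuming the interface of the upstream mpc.pytorch package (box-constrained LQR over random linear dynamics and a random quadratic cost, adapted from its documentation). The import path for the bundled copy in ./diff_mpc may differ, and IbexRL builds its cost and dynamics from the learned building model rather than from random tensors.

# Sketch of a diff-mpc call, following the upstream mpc.pytorch interface.
# The bundled ./diff_mpc copy may expose a slightly different import path.
import torch
from mpc import mpc

n_batch, n_state, n_ctrl, T = 1, 3, 1, 12
n_sc = n_state + n_ctrl

# Random positive semi-definite quadratic cost over [state; control].
C = torch.randn(T * n_batch, n_sc, n_sc)
C = torch.bmm(C, C.transpose(1, 2)).view(T, n_batch, n_sc, n_sc)
c = torch.randn(T, n_batch, n_sc)

# Random near-identity linear dynamics x_{t+1} = F [x_t; u_t].
alpha = 0.2
A = (torch.eye(n_state) + alpha * torch.randn(n_state, n_state)).repeat(T, n_batch, 1, 1)
B = torch.randn(T, n_batch, n_state, n_ctrl)
F = torch.cat((A, B), dim=3)

x_init = torch.randn(n_batch, n_state)
u_lower = torch.zeros(T, n_batch, n_ctrl)   # e.g. valve fully closed
u_upper = torch.ones(T, n_batch, n_ctrl)    # e.g. valve fully open

x_pred, u_pred, objs = mpc.MPC(
    n_state=n_state,
    n_ctrl=n_ctrl,
    T=T,
    u_lower=u_lower,
    u_upper=u_upper,
    lqr_iter=20,
    exit_unconverged=False,
    backprop=True,          # gradients flow back through the solver
    verbose=0,
)(x_init, mpc.QuadCost(C, c), mpc.LinDx(F))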


⚙️ Setup and Installation

Before running any experiments, you must set up the appropriate Conda environment.

1. For Imitation Learning (Optional)

If you plan to run the imitation learning phase from scratch, create and activate the rl-imit environment:

# Create the environment from the .yaml file
conda env create -f imit_environment.yaml

# Activate the environment
conda activate rl-imit

2. For Online Learning

To run the main online learning experiments, create and activate the rl-online environment:

# Create the environment from the .yaml file
conda env create -f online_environment.yaml

# Activate the environment
conda activate rl-online

🚀 Usage

There are two ways to use this framework.

Path A: Using Pre-Trained Models

This is the quickest way to get started. It skips the imitation learning step and uses pre-computed model parameters to begin online learning directly.

  1. Activate the Environment: Make sure you are in the rl-online Conda environment.
    conda activate rl-online
    
  2. Configure the Main Script: Open online_learning_main.py. You will need to configure two sections:
    • File Paths: At the top of the file, update the following path to match your local machine's directory structure:
      path_for_your_repo = "/path/to/your/repo"
      
    • Hyperparameters: Inside the Args class, you can adjust various parameters for the experiment. See the Configuration Details section below for details on key parameters such as the learning rates (state_lr, action_lr), cost calibration (cost_calibration), and exploration (explore).
  3. Run the Experiment: Execute the script from your terminal. It will loop through the number of runs specified in the code.
    python online_learning_main.py
    

Path B: Running Everything from Scratch

Follow this path if you want to generate your own imitation learning model before starting the online learning phase.

  1. Run Imitation Learning:
    • Activate the rl-imit environment.
    • Run train_imit.py. This will train the agent with various hyperparameter configurations to mimic the baseline controller data located in the Gnu-RL folder and save the initial model parameters.
  2. Run Online Learning:
    • Follow all the steps in Path A.
    • Crucially, in the Args class within online_learning_main.py, make sure the IL_... parameters (IL_ETA_FOR_PATH, IL_FHAT_EPOCH, etc.) point to the results of your imitation learning run.

🔧 Configuration Details

All major hyperparameters are located in the Args class inside online_learning_main.py. Some of them exist only for the ablation study and do not need to be changed for regular use. A hypothetical sketch of these settings follows the list below.

  • state_lr: Learning rate for the state-space model parameters (e.g., thermal resistance).
  • action_lr: Learning rate for the cost function parameters (O_hat, R_hat).
  • cost_calibration: Set to True if you want to learn the cost parameters online. This is necessary if you are using a custom, non-quadratic reward function.
  • IL_... parameters: These variables tell the script which pre-trained imitation learning model to load as the starting point for the online phase.
  • explore: Set to True to enable exploration by adding noise to the agent's actions. sigma_init controls the initial amount of noise.
  • folder_name: The name of the directory where the experiment results will be saved.
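
The field names in the sketch below come from this README, but the defaults, types, and the full set of fields in the actual Args class may differ; treat every value here as a placeholder.

# Hypothetical sketch of the Args settings described above; the real Args
# class in online_learning_main.py may differ in fields, types and defaults.
class Args:
    # Learning rates
    state_lr = 1e-3            # state-space model parameters (e.g. thermal resistance)
    action_lr = 1e-3           # cost-function parameters (O_hat, R_hat)

    # Cost calibration: learn the quadratic cost parameters online
    # (needed when using the non-quadratic reward).
    cost_calibration = True

    # Exploration: add noise to the agent's actions.
    explore = True
    sigma_init = 0.1           # placeholder initial noise scale

    # Pre-trained imitation-learning checkpoint to start from
    # (point these at your own IL run when following Path B).
    IL_ETA_FOR_PATH = ...      # left unspecified: depends on your IL results
    IL_FHAT_EPOCH = ...

    # Output directory for experiment results.
    folder_name = "ibexrl_online_run"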

Reward Function:

We test with two different reward functions, both defined in utils.py:

  • R_func(): the negative of the quadratic cost function. This is the reward used when cost calibration is NOT enabled (a minimal illustrative sketch is shown after this list).
  • non_quadratic_cumulative_reward(): the non-quadratic reward used for cost calibration. The O_occ and R values are updated to maximize this reward, so the quadratic cost is fitted in a way that maximizes the non-quadratic objective. Change this function to test different rewards.
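
The following is a minimal, hypothetical sketch of a negative-quadratic reward in the spirit of R_func(); the signature, variable names, and weights are placeholders, not the ones used in utils.py.

# Hypothetical negative-quadratic reward in the spirit of R_func() in utils.py;
# the actual signature, variables and weights in the repo may differ.
def quadratic_reward(zone_temp, setpoint, action,
                     comfort_weight=1.0, energy_weight=0.1):
    comfort_cost = comfort_weight * (zone_temp - setpoint) ** 2   # tracking error
    energy_cost = energy_weight * action ** 2                     # control effort
    return -(comfort_cost + energy_cost)

# Example: 21.5 C zone temperature against a 21.0 C setpoint, 40% heating signal.
print(quadratic_reward(21.5, 21.0, 0.4))   # -> -0.266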

Data Sources

The training data is derived from two primary files:

  • observations.csv: Contains 75 days of observation data sampled at 30-minute intervals from a baseline BOPTEST controller.
  • results_tests_test_constant_original/results_sim_0.csv: Provides higher-granularity simulation data from the same controller. The action data is extracted from this file and resampled to 30-minute intervals by the agent/train_imit.py script (a rough sketch of this step is shown below).
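
A rough sketch of that resampling step is shown below; the column names "time" and "action" are assumptions about the CSV layout, and the real logic lives in train_imit.py.

# Illustrative resampling of the high-granularity action data to 30-minute
# intervals; column names ("time", "action") are assumptions about the CSV.
import pandas as pd

sim = pd.read_csv("results_tests_test_constant_original/results_sim_0.csv")
sim.index = pd.to_datetime(sim["time"], unit="s")        # assumes seconds since simulation start
actions_30min = sim["action"].resample("30min").mean()   # mean action per 30-minute window
print(actions_30min.head())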


🌍 Adapting to a New Environment

To apply this agent to a different BOPTEST environment, you need to modify online_learning.py.

  1. Open online_learning.py.
  2. Navigate to the test_loop function.
  3. Update the following variable definitions to match the observation and control signals of your new environment:
    • state_name
    • dist_name
    • ctrl_name
    • target_name
    • The boptest_obs_config_for_env dictionary, which defines the names and ranges of all variables to be pulled from the environment.
  4. Update the physics-informed model structure in IbexAgent.py. It depends on your inputs (states, disturbances, and control variables), so it must be adapted for each new environment. An illustrative sketch of the step-3 variable definitions is shown below.
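
For illustration, the block below shows what those definitions might look like. The signal names follow the style of BOPTEST point names but are only examples, and the exact structure expected by test_loop (lists vs. strings, the shape of boptest_obs_config_for_env) should be taken from the existing code.

# Illustrative values for the variables edited in test_loop(); the signal
# names below are examples in the style of BOPTEST point names, and the
# structure of boptest_obs_config_for_env here is an assumption -- mirror
# whatever the existing code in online_learning.py uses.
state_name = ["reaTZon_y"]                 # zone temperature measurement
dist_name = ["weaSta_reaWeaTDryBul_y"]     # disturbances, e.g. outdoor dry-bulb temperature
ctrl_name = ["oveHeaPumY_u"]               # control signal the agent overwrites
target_name = ["reaTSetHea_y"]             # comfort setpoint

# Names and (lower, upper) ranges of every variable pulled from the environment.
boptest_obs_config_for_env = {
    "reaTZon_y": (280.0, 310.0),               # Kelvin
    "weaSta_reaWeaTDryBul_y": (250.0, 320.0),  # Kelvin
    "reaTSetHea_y": (280.0, 310.0),            # Kelvin
}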
