A library for generating synthetic electronic health records in FHIR v4 format using agent-based modeling to simulate patient pathways.
See the redacted final report from the March 2021 project development for an overview of the main components as well as suggested future developments - "REDACTED_C245 ABM Patient Pathways_Final Report_V3_28042021.cleaned.pdf"
The simulation models a single patient interacting with environments (hospitals, GPs, etc.), which can prompt updates to the patient's record.
Patients and Environments are modelled as agents. They are Python class objects
of type PatientAgent and EnvironmentAgent respectively, and are located
in src/patient_abm/agent.
The simulation is configured by a configuration script, and the details of
the patient-environment interactions must be implemented in the intelligence
layer (see the relevant sections below for more details). We have provided
templates for these elements.
- The main code is found in the src and template folders of the repository (see Usage below for more information).
- The accompanying report is also available in the reports folder.
This repository has been tested using and
Use your terminal to cd into the directory containing this README
(the project root directory) and run:
pip install .
Alternatively, if you want to develop and edit the library, then run
pip install -e ".[dev]"
In the project root directory, run:
export PATIENT_ABM_DIR="$(pwd)"
After installing patient_abm, to run a patient pathway simulation, you must:
(1) Set up the simulation configuration script config.json
(2) Implement the intelligence layer, which:
- governs how the Patient and Environment agents interact;
- generates new Patient record entries;
- decides which Environment the patient should visit next and at what time;
- optionally applies custom updates to the Patient and Environment agents.
In the folder template we provide templates for the config.json and the
intelligence layer. The subfolder, also called template, contains empty
template files, whereas the subfolder example contains example files.
Your config.json and intelligence_dir can be located anywhere; they do not
need to be inside this repo.
After completing this, in the terminal, run:
patient_abm simulation run --config_path </path/to/config.json>
to run the simulation. Angle brackets <...> here and in the following
indicate places where the user needs to supply their own values,
or where values are automatically generated by the simulation. For instance,
if you want to run the config.json in template/example, this is
the command:
patient_abm simulation run --config_path template/example/config.json
Its outputs can be found in template/example/outputs.
This will load and validate the config.json, load the variables from the
config, and then run the simulation one patient at a time, saving the outputs
after each simulation.
The following folder structure and outputs are created in the save_dir
defined in the config.json:
<simulation_id> /
agents /
patient_<patient_id>.tar
environment_<environment_0_id>.tar
environment_<environment_1_id>.tar
...
fhir /
bundle.json
main.log
patient.log
where a unique simulation_id is automatically generated for every patient
in the config.json.
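To inspect results after a run, the outputs can be located by walking save_dir. The sketch below assumes only the folder layout shown above (<save_dir>/<simulation_id>/fhir/bundle.json); the helper names find_bundles and count_entries are hypothetical and not part of patient_abm.

```python
import json
from pathlib import Path


def find_bundles(save_dir):
    """Map each <simulation_id> folder under save_dir to its FHIR bundle path.

    Assumes the layout shown above: <save_dir>/<simulation_id>/fhir/bundle.json
    """
    bundles = {}
    for bundle_path in sorted(Path(save_dir).glob("*/fhir/bundle.json")):
        # bundle.json sits two levels below the <simulation_id> folder.
        simulation_id = bundle_path.parent.parent.name
        bundles[simulation_id] = bundle_path
    return bundles


def count_entries(bundle_path):
    """Return how many resources a FHIR Bundle JSON file contains."""
    with open(bundle_path) as f:
        bundle = json.load(f)
    # A FHIR Bundle lists its resources under "entry"; the key may be absent.
    return len(bundle.get("entry", []))


# Example usage (replace with the save_dir from your config.json):
# for simulation_id, path in find_bundles("template/example/outputs").items():
#     print(simulation_id, count_entries(path))
```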
The configuration file config.json contains all the information needed to
initialize:
- All the simulation Patients
- All the simulation Environments (each Patient's 'universe'), along with the names of the interactions that the intelligence layer can apply when the Patient is present at an Environment
- Path to the intelligence layer directory, intelligence_dir
- Path to the save_dir directory in which the simulation outputs will be written
- Any other simulation parameter, such as stopping conditions, logging frequency, etc.
The config.json is a file with key-value pairs:
{
key_0: <value_0>,
key_1: <value_1>,
...
}
Below we provide the definition for each key and what the user is expected to provide as the corresponding value.
The key patients refers to data that should be used to initialize patient
agent objects. You can enter its value in one of two ways:
- Write the patient data directly as a list of dictionaries. Each dictionary contains the patient class initialization arguments as key-value pairs.
- Give a path to a JSON (strongly preferred) or a CSV file that contains the
same data as the list of dictionaries. JSON is preferred because it preserves
the correct datatypes, which is particularly important when a Patient
attribute is a nested object (such as the Patient's conditions attribute).
Note that each patient must have the following required attributes:
- patient_id : Union[str, int]: Unique ID for the patient.
- gender : str: Patient gender, either "male" or "female".
There are many other optional attributes; see the documentation for
the PatientAgent class in patient_abm.agent.patient.
Two patients can have the same patient_id.
Even though multiple patients can be listed here, the simulation only runs for one patient at a time, and the patients do not interact.
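The sketch below illustrates the two points above: it checks the required patient attributes before writing a config, and shows why JSON is preferred over CSV (a CSV round trip turns every value into a string). The helper names are hypothetical; the real validation is performed by patient_abm when it loads config.json.

```python
import csv
import io
import json

# Hypothetical patients: patient_id and gender are the only required keys.
patients = [
    {"patient_id": 0, "gender": "female"},
    {"patient_id": "p-001", "gender": "male"},
]


def check_patients(patients):
    """Raise ValueError if a patient is missing a required attribute or has
    a gender other than "male" or "female"."""
    for i, patient in enumerate(patients):
        missing = [key for key in ("patient_id", "gender") if key not in patient]
        if missing:
            raise ValueError(f"patient {i} is missing {missing}")
        if patient["gender"] not in ("male", "female"):
            raise ValueError(f"patient {i} has gender {patient['gender']!r}")


def json_round_trip(records):
    """Serialize to JSON text and back: ints, strings and nesting survive."""
    return json.loads(json.dumps(records))


def csv_round_trip(records):
    """Serialize to CSV text and back: every value comes back as a string."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0]))
    writer.writeheader()
    writer.writerows(records)
    buffer.seek(0)
    return list(csv.DictReader(buffer))


check_patients(patients)
print(repr(json_round_trip(patients)[0]["patient_id"]))  # 0
print(repr(csv_round_trip(patients)[0]["patient_id"]))   # '0'
```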
The key environments refers to data that should be used to initialize
Environment objects. You can enter its value in one of two ways:
- Write the environment data directly as a list of dictionaries. Each dictionary contains the environment class initialization arguments as key-value pairs.
- Give a path to a JSON (strongly preferred) or a CSV file that contains the
same data as the list of dictionaries. JSON is preferred because it preserves
the correct datatypes, which is particularly important when an Environment
attribute is a nested object (such as the Environment's interactions attribute).
Note that each environment must have the following required attribute:
- environment_id : Union[str, int]: Unique ID for the environment.
There are many other optional attributes; see the documentation for
the EnvironmentAgent class in patient_abm.agent.environment.
Each environment in the list must have a unique environment_id.
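A quick pre-flight check for the environments list can catch a missing or repeated environment_id before a run. This is a hypothetical helper for illustration only; patient_abm performs its own validation when it loads config.json.

```python
from collections import Counter

# Hypothetical environments; only environment_id is required.
environments = [
    {"environment_id": 0},
    {"environment_id": "gp_1", "interactions": ["gp.measure_bmi"]},
]


def check_environments(environments):
    """Raise ValueError if any environment lacks an environment_id or if two
    environments share the same environment_id."""
    missing = [i for i, env in enumerate(environments) if "environment_id" not in env]
    if missing:
        raise ValueError(f"environments at positions {missing} have no environment_id")
    counts = Counter(env["environment_id"] for env in environments)
    duplicates = [env_id for env_id, n in counts.items() if n > 1]
    if duplicates:
        raise ValueError(f"duplicate environment_id values: {duplicates}")


check_environments(environments)  # passes silently
```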
Each environment's
interactions attribute is a list of strings referring to functions in the
intelligence layer with a specific structure. For example, if the
intelligence layer directory looks like
<intelligence_dir> /
interactions /
general.py
gp.py
intelligence.py
and there are two functions in general.py called inpatient_encounter and
outpatient_encounter, and two functions in gp.py called measure_bmi
and diagnose_fever, then the interactions list for a GP environment
might be:
interactions = [
"general.inpatient_encounter",
"gp.measure_bmi",
"gp.diagnose_fever"
]
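To make the "<module>.<function>" naming concrete, the sketch below resolves an interaction string to a function in <intelligence_dir>/interactions/<module>.py. The resolve_interaction helper is hypothetical and is not patient_abm's own loading code; it only illustrates the mapping described above.

```python
import importlib.util
from pathlib import Path


def resolve_interaction(intelligence_dir, name):
    """Return the function named by an interaction string such as
    "gp.measure_bmi", read from <intelligence_dir>/interactions/gp.py.

    Hypothetical helper for illustration, not patient_abm's own loader.
    """
    module_name, func_name = name.split(".", 1)
    path = Path(intelligence_dir) / "interactions" / f"{module_name}.py"
    if not path.is_file():
        raise FileNotFoundError(f"no interactions module at {path}")
    # Load the module directly from its file path.
    spec = importlib.util.spec_from_file_location(module_name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return getattr(module, func_name)
```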
Note that the default interactions located in
src/patient_abm/intelligence/interactions/default
are also added to every environment. This currently happens automatically,
but could be amended in future.
The key intelligence_dir refers to the directory that contains the
intelligence layer. Its value is the path string to that directory.
The key save_dir refers to the directory in which the simulation outputs
should be saved. Its value is the path string to that directory.
The key initial_environment_ids refers to the initial Environment that each
patient should visit, given by the Environment's environment_id. Its value
is a dictionary, which can take several formats:
- {from_id: <environment_id>}: all Patients start from the Environment with that <environment_id>.
- {from_id: [<environment_id_0>, <environment_id_1>, ...]}: the list of environment IDs must be as long as the number of Patients; each Patient starts from the Environment given in the corresponding position in the list.
- {from_probability: [<p_0>, <p_1>, ...]}: the list of probabilities p_i must be as long as the number of Environments. The distribution is sampled for each patient.
- {from_json: '</path/to/ids.json>'}: a JSON file containing initial environment IDs, one for each patient.
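The sketch below shows one way the formats above could be turned into a list of initial environment IDs, one per patient. It is an illustration, not patient_abm's implementation: the function name is hypothetical, from_probability is sampled independently per patient with random.choices, and the from_json file is assumed to hold a JSON list of IDs.

```python
import json
import random


def initial_environment_ids(spec, num_patients, environment_ids, rng=random):
    """Return one initial environment_id per patient from an
    initial_environment_ids config value (a single-key dictionary)."""
    if len(spec) != 1:
        raise ValueError("expected exactly one of from_id, from_probability, from_json")
    kind, value = next(iter(spec.items()))
    if kind == "from_id":
        # A single ID is shared by all patients; a list gives one per patient.
        ids = value if isinstance(value, list) else [value] * num_patients
    elif kind == "from_probability":
        if len(value) != len(environment_ids):
            raise ValueError("need one probability per environment")
        # Sample independently for each patient.
        ids = rng.choices(environment_ids, weights=value, k=num_patients)
    elif kind == "from_json":
        # Assumption: the file holds a JSON list of IDs.
        with open(value) as f:
            ids = json.load(f)
    else:
        raise ValueError(f"unknown format {kind!r}")
    if len(ids) != num_patients:
        raise ValueError("need exactly one initial environment_id per patient")
    return ids
```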
The key stopping_condition refers to the condition that should cause
the simulation while loop to terminate. The simulation can always terminate
early if a death interaction is applied.
Its value is a dictionary, which can take several formats:
- {max_num_steps: <max_num_steps>}: the maximum number of steps (an integer) in the simulation.
- {max_real_time: {<unit>: <value>}}: the maximum real time the simulation should run for. The subdictionary {<unit>: <value>} is passed into Python's datetime.timedelta, so its keys must be valid timedelta arguments.
- {max_patient_time: {<unit>: <value>}}: the maximum patient time the simulation should run for. The subdictionary {<unit>: <value>} is passed into Python's datetime.timedelta, so its keys must be valid timedelta arguments.
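Since the {<unit>: <value>} subdictionary is passed to datetime.timedelta, the valid units are its keyword arguments: days, seconds, microseconds, milliseconds, minutes, hours and weeks. The example values below are arbitrary, and to_timedelta is a hypothetical helper that mirrors the keyword-argument unpacking.

```python
from datetime import timedelta


def to_timedelta(unit_value):
    """Convert a {<unit>: <value>} subdictionary into a datetime.timedelta."""
    return timedelta(**unit_value)


# Example stopping_condition values (numbers chosen arbitrarily):
max_real_time = {"max_real_time": {"minutes": 5}}
max_patient_time = {"max_patient_time": {"weeks": 52}}

print(to_timedelta(max_real_time["max_real_time"]))        # 0:05:00
print(to_timedelta(max_patient_time["max_patient_time"]))  # 364 days, 0:00:00

# timedelta has no "months" or "years" argument, so this raises TypeError:
# to_timedelta({"years": 1})
```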
The key hard_stop refers to a hard upper bound on the number of simulation
steps. It is a safeguard to prevent the loop from running indefinitely for
any reason. An integer value is expected.
The key log_every refers to the number of simulation steps that should
execute between logging. Its value is an integer, i.e. if it is n, then
logging will happen every n-th simulation step.
If log_every > 1, then logging of simulation information between log_every
steps may be lost. log_intermediate is a boolean which, if set to true,
ensures intermediate log information is collected and then written to the
logger at the next log_every-th step. If false, only the information at
every log_every-th simulation step is written.
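The interplay between log_every and log_intermediate can be modelled as a small buffer. This is a hypothetical illustration of the behaviour described above, not patient_abm's logging code; it assumes steps are numbered from 1, so with log_every = 3 the writes happen at steps 3, 6, and so on.

```python
def logging_schedule(num_steps, log_every, log_intermediate):
    """Return, for each logger write, the list of steps whose information it
    contains. Steps are numbered from 1 (an assumption)."""
    writes = []
    buffer = []
    for step in range(1, num_steps + 1):
        if step % log_every == 0:
            # Write this step plus anything buffered since the last write.
            writes.append(buffer + [step])
            buffer = []
        elif log_intermediate:
            # Collect intermediate information instead of dropping it.
            buffer.append(step)
    return writes


print(logging_schedule(6, 3, log_intermediate=False))  # [[3], [6]]
print(logging_schedule(6, 3, log_intermediate=True))   # [[1, 2, 3], [4, 5, 6]]
```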
A boolean value which, if set to true, adds the latest patient record entry
to the logger. This should be used mainly for debugging. The full patient
record is always stored in the saved patient agent tar file, and it can be
recovered from there.
At the end of the simulation, the patient record is converted into a FHIR
Bundle resource and validated. Validation can be done using an "offline" method
via the python fhir.resources library, or "online" by sending the bundle
to the HAPI FHIR server (http://hapi.fhir.org/baseR4). If
fhir_server_validate is true, the online method is used.
When new patient entries are added to the patient record, a validation step
checks whether the entry already exists, to prevent duplication.
patient_record_duplicate_action decides the action to take if a duplicate is
found: if it is set to "add", the new entry is added, whereas if it is set to
"skip", the entry is not added.
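The two values of patient_record_duplicate_action can be sketched as follows. This is a hypothetical helper, not patient_abm's code: it treats the record as a plain list and detects duplicates by equality, which is an assumption about how the library's check works, and the entry shown is made up.

```python
def add_entry(record, entry, duplicate_action="add"):
    """Append entry to record, applying patient_record_duplicate_action.

    Returns True if the entry was added. Duplicates are detected by equality
    here, which is an assumption about how the check works.
    """
    if duplicate_action not in ("add", "skip"):
        raise ValueError(f"unknown duplicate_action {duplicate_action!r}")
    if duplicate_action == "skip" and entry in record:
        return False
    record.append(entry)
    return True


record = []
entry = {"name": "bmi", "value": 22.5}  # hypothetical entry
add_entry(record, entry, "skip")  # added
add_entry(record, entry, "skip")  # duplicate, skipped
add_entry(record, entry, "add")   # duplicate, added anyway
print(len(record))  # 2
```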
As an illustration of how a breast cancer pathway might look, we have provided a config.json for it in template/breast_cancer. This is simply an initial version of how the configuration could be set up for such a pathway; this template and the intelligence layer can be extended to model more complex dynamics.
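For orientation, the sketch below assembles a minimal config.json from the keys described above and serializes it with Python's json module (which writes booleans as lowercase true/false, as JSON requires). All paths, IDs and numbers are placeholders, and the set of keys may not be exhaustive; the files in template/example remain the reference configuration.

```python
import json

# Hypothetical minimal configuration using the keys described above.
# Paths, IDs and numbers are placeholders, not repository defaults.
config = {
    "patients": [{"patient_id": 0, "gender": "female"}],
    "environments": [
        {"environment_id": "gp_0", "interactions": ["gp.measure_bmi"]},
    ],
    "intelligence_dir": "/path/to/intelligence_dir",
    "save_dir": "/path/to/save_dir",
    "initial_environment_ids": {"from_id": "gp_0"},
    "stopping_condition": {"max_patient_time": {"days": 365}},
    "hard_stop": 1000,
    "log_every": 1,
    "log_intermediate": False,
    "fhir_server_validate": False,
    "patient_record_duplicate_action": "skip",
}

config_text = json.dumps(config, indent=2)
# Save config_text as config.json, then run:
#   patient_abm simulation run --config_path /path/to/config.json
```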
The intelligence layer is a directory of Python scripts. The location of the
directory is given by the field intelligence_dir in the config.json. The
structure of intelligence_dir is as follows:
<intelligence_dir> /
interactions /
<interactions_0>.py
<interactions_1>.py
...
intelligence.py
The intelligence.py script must contain a function called intelligence.
More information about the intelligence layer and how it should be
structured is provided inside the respective files in
template/template/<intelligence_dir>.
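Before running a simulation, it can help to check that an intelligence_dir has the required shape: an interactions folder and an intelligence.py that defines a function called intelligence. The load_intelligence helper below is hypothetical; it deliberately does not call the function, because its expected signature is documented in the template files rather than here.

```python
import importlib.util
from pathlib import Path


def load_intelligence(intelligence_dir):
    """Load <intelligence_dir>/intelligence.py and return its `intelligence`
    function, raising if the required structure is missing."""
    root = Path(intelligence_dir)
    if not (root / "interactions").is_dir():
        raise FileNotFoundError(f"{root} has no interactions folder")
    path = root / "intelligence.py"
    if not path.is_file():
        raise FileNotFoundError(f"{path} not found")
    spec = importlib.util.spec_from_file_location("intelligence", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    func = getattr(module, "intelligence", None)
    if not callable(func):
        raise AttributeError(f"{path} does not define a function called intelligence")
    return func
```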
nox is used to check that the code is correctly formatted and to run the test suite.
To use nox, cd into the project root directory and run:
nox
Tests can also be run from this directory via
pytest tests
There are two demo notebooks in the notebooks folder.
In this notebook we introduce the patient agent and its methods including:
- initializing with comorbidities
- adding properties to conditions, such as severity
- updating the patient record
- the patient record internal representation and converting to FHIR
In this notebook we walk through how to run a simulation with a very simple intelligence layer and interactions. Please see above for more information about the simulation configuration script and the intelligence layer (we will not go into detail about the intelligence layer in the notebook). Here we will be using the files in template/example, and going through the main processes that are called when patient_abm.simulation.run.simulate is executed (this is the function called by the CLI command patient_abm simulation run).
See the open issues for a list of proposed features (and known issues).
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
- Fork the Project
- Create your Feature Branch (git checkout -b feature/AmazingFeature)
- Commit your Changes (git commit -m 'Add some AmazingFeature')
- Push to the Branch (git push origin feature/AmazingFeature)
- Open a Pull Request
See CONTRIBUTING.md for detailed guidance.
Distributed under the MIT License. See LICENSE for more information.
To find out more about the Analytics Unit visit our project website or get in touch at england.tdau@nhs.net.