Docker image for configuring, building, and running Synthea to generate test clinical data
Synthea™ is a Synthetic Patient Population Simulator.
The goal is to output synthetic, realistic (but not real),
patient data and associated health records in a variety of formats.
Synthea Source code: https://github.com/synthetichealth/synthea
Synthea Wiki: https://github.com/synthetichealth/synthea/wiki
Also includes SQL Definitions for several dialects: ./synthea-database/README.md
(For Windows, run the .ps1 file with the same name as each shell script.)
- optionally configure environment variables (optional because defaults are set in env-setup.sh)

    # copy .env.sample to .env, then make your changes to .env
    cp .env.sample .env

    # .env.sample
    REPO_URL=https://github.com/synthetichealth/synthea.git
    SYNTHEA_BRANCH=v3.2.0
    IMAGE_NAME=synthea-docker-v3.2.0
    MINAGE=18
    MAXAGE=64

- configure include/synthea.config
- add custom modules to include/modules and custom resources to include/resources
- build image:

    # on linux, you may need to run with sudo if your user is not in the 'docker' group
    # on macos (docker desktop), sudo is typically not required
    sh ./build.sh

- run the image to generate patient data (writes to output/ folder):

    sh ./run.sh

- for debugging, drop into a shell of the image

    sh ./shell.sh

- optionally remove containers and image

    sh ./rm-image.sh

Dockerfile
- compiles synthea java source code into an image named synthea-build
- copies include/ files to their appropriate location
- includes a runtime environment with some shell scripts

Environment variable management:
- .env.sample: sample .env file to use as a basis for your own .env
- env-setup.sh: set environment variables from .env, for use in shell scripts
  - defines fallback values if needed
  - automatically sourced at the top of each shell script
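
The load-then-fall-back behavior described for env-setup.sh can be sketched roughly as follows. This is a minimal illustration, not the repository's actual script: the function name `load_env` is hypothetical, and the fallback values are assumed to match the .env.sample values listed earlier.

```shell
#!/bin/sh
# Hypothetical sketch of an env-setup.sh-style loader (not the real script).

load_env() {
    # Path to the env file; "./.env" is an assumed default.
    env_file="${1:-./.env}"

    # While "set -a" is active, every variable assigned by the sourced
    # file is exported automatically.
    if [ -f "$env_file" ]; then
        set -a
        . "$env_file"
        set +a
    fi

    # Fallback values, applied only when a variable is unset or empty.
    : "${REPO_URL:=https://github.com/synthetichealth/synthea.git}"
    : "${SYNTHEA_BRANCH:=v3.2.0}"
    : "${IMAGE_NAME:=synthea-docker-v3.2.0}"
    : "${MINAGE:=18}"
    : "${MAXAGE:=64}"
    export REPO_URL SYNTHEA_BRANCH IMAGE_NAME MINAGE MAXAGE
}
```

A script would call `load_env` (or source a file containing this logic) before using `$IMAGE_NAME`, `$MINAGE`, or `$MAXAGE`. Values set in the env file take precedence over the fallbacks.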

Shell scripts for building and running (each has a .ps1 analog):
- build.sh: create a docker image with everything needed to generate clinical data
  - includes resources and configuration files in the include directory
- run.sh: run the image, with a volume attached to a local /output folder

      docker run -it -v ${PWD}/output:/synthea/output $IMAGE_NAME

  - runs ./generate_data - shell script to generate data

        java -jar synthea-with-dependencies.jar -c synthea.config -a $MINAGE-$MAXAGE

- shell.sh: for debugging, enter a shell in a running container
- rm-image.sh: for cleanup, remove existing containers and image

Configuration files and folders, automatically copied into the build
- include/modules/ - custom modules
  - include/modules/testmodule.json - example of a custom module
- include/resources/ - custom resources
  - include/resources/names.yml - language and gender-specific lists of names to use as patient given names
- include/output/csv - example (empty) output csv files with headers
- include/generate_data - main entry point when you run the image
- include/synthea.config - configuration items - see synthea's wiki for details: Common Configuration
  - NOTE: age is not a setting that can be set in synthea.config
    - instead, pass -a min-max as a parameter to the java executable
- include/synthea.properties - all synthea properties
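
Because the age range is passed on the command line rather than through synthea.config, it can help to validate the two bounds before starting the JVM. The helper below is a hypothetical sketch (the name `age_range` is not part of this repository or of synthea). It only checks that both values are non-negative integers in the right order, then prints them in the `min-max` form that `-a` expects.

```shell
#!/bin/sh
# Hypothetical helper: print "MIN-MAX" for the -a flag, or fail on bad input.

age_range() {
    # Reject empty values and anything containing a non-digit.
    for value in "$1" "$2"; do
        case "$value" in
            ''|*[!0-9]*)
                echo "age bounds must be non-negative integers" >&2
                return 1
                ;;
        esac
    done

    # The lower bound must not exceed the upper bound.
    if [ "$1" -gt "$2" ]; then
        echo "minimum age $1 exceeds maximum age $2" >&2
        return 1
    fi

    printf '%s-%s\n' "$1" "$2"
}

# Possible use inside a generate_data-style script (assumed, not verified):
#   ages=$(age_range "$MINAGE" "$MAXAGE") || exit 1
#   java -jar synthea-with-dependencies.jar -c synthea.config -a "$ages"
```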
DDL for SQL tables that match the structure of synthea output csvs: ./synthea-database/README.md
- er-diagrams
- mssql
- postgresql
- scripts
The main entry point is include/generate_data. It is copied to /synthea/synthea, which also contains
run_synthea, a script provided by synthea that runs the following:
java -jar synthea-with-dependencies.jar [-h]
    [-s seed]
    [-r referenceDate as YYYYMMDD]
    [-cs clinician seed]
    [-p populationSize]
    [-g gender]
    [-a minAge-maxAge]
    [-c localConfigFilePath]
    [-d localModulesDirPath]
    [state [city]]
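
As an illustration of how these options combine, the sketch below assembles a command line from a seed, a population size, an age range, and an optional state and city. The function name `synthea_command` is hypothetical, and it only prints the command instead of running it, so it can be tried without a built jar. The flags come from the usage listing above; the argument values in the comments are examples only.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print a synthea command line without running it.
# Arguments: seed, population size, age range (min-max), [state], [city]

synthea_command() {
    seed="$1"
    population="$2"
    ages="$3"
    state="$4"
    city="$5"

    # Rebuild the positional parameters as the command to print.
    set -- java -jar synthea-with-dependencies.jar \
        -s "$seed" -p "$population" -a "$ages"

    # A city is only meaningful when a state is given: [state [city]].
    if [ -n "$state" ]; then
        set -- "$@" "$state"
        if [ -n "$city" ]; then
            set -- "$@" "$city"
        fi
    fi

    # Printing with "$*" drops quoting, so this is for display only.
    echo "$*"
}

# Example calls:
#   synthea_command 42 100 18-64
#   synthea_command 42 100 18-64 Massachusetts Bedford
```

Fixing the seed with `-s` is what makes a run repeatable, which is useful when the generated data feeds automated tests.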