Atlas is a high-performance, open-source ETL tool built in Rust that bridges openEHR clinical data repositories with modern analytics platforms. It enables healthcare organizations to seamlessly export openEHR compositions to Azure Cosmos DB or PostgreSQL for advanced analytics, machine learning, and research.
Atlas solves the challenge of making openEHR clinical data accessible for modern analytics workflows. By exporting compositions from openEHR servers (EHRBase, Better Platform) to your choice of database backend (Azure Cosmos DB or PostgreSQL), Atlas enables:
- Clinical Research: Query patient data using familiar SQL instead of AQL (see the example query below)
- Machine Learning: Build ML models on flattened, analytics-ready data
- Operational Analytics: Power dashboards and reports with Azure-native tools
- Regulatory Reporting: Maintain audit trails with data verification
- Data Integration: Connect openEHR data to Azure Synapse, Databricks, and Power BI
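To give a flavor of the SQL-instead-of-AQL workflow, once compositions have been exported to PostgreSQL they can be queried with ordinary SQL against the JSONB payload. The table and column names below are illustrative only (Atlas writes per-template compositions_* tables; see the PostgreSQL Setup Guide for the actual layout):
-- Count recently exported vital-signs compositions per EHR
-- (compositions_vital_signs and created_at are illustrative names, not Atlas's exact schema)
SELECT ehr_id, COUNT(*) AS composition_count
FROM compositions_vital_signs
WHERE created_at >= now() - interval '7 days'
GROUP BY ehr_id
ORDER BY composition_count DESC;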
- 🚀 High Performance: Built with Rust for async/concurrent processing
- Batch processing with configurable sizes (100-5000 compositions)
- Parallel EHR processing (1-100 concurrent EHRs)
- Throughput: 1000-2000 compositions/minute
- 🔄 Incremental Sync: Smart state management with watermarks (see the sketch below)
- Track last export per {template_id, ehr_id} combination
- Export only new/changed data since last run
- Automatic checkpoint and resume from failures
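Conceptually, a watermark is just a per-template, per-EHR checkpoint. The sketch below shows the idea as illustrative DDL; it is not Atlas's actual atlas_watermarks schema:
-- Conceptual shape of an incremental-sync watermark (illustrative only)
CREATE TABLE watermark_example (
    template_id      text        NOT NULL,
    ehr_id           text        NOT NULL,
    last_exported_at timestamptz NOT NULL,  -- only compositions newer than this are fetched on the next run
    status           text        NOT NULL,  -- e.g. Completed or Interrupted (see graceful shutdown below)
    PRIMARY KEY (template_id, ehr_id)
);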
- 🎨 Flexible Transformation: Multiple composition formats (see the example below)
- Preserve Mode: Maintain exact FLAT JSON structure from openEHR
- Flatten Mode: Convert nested paths to flat field names for ML/analytics
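As a rough illustration (the exact key naming is defined by Atlas's flatten transform, so treat these as examples): preserve mode keeps openEHR FLAT keys such as vital_signs/blood_pressure/any_event:0/systolic|magnitude unchanged, while flatten mode rewrites them into analytics-friendly, column-style names along the lines of vital_signs_blood_pressure_systolic_magnitude.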
- ⚙️ Easy Configuration: TOML-based with environment variable support
- Simple, human-readable configuration files
- Secure credential management with env vars
- Comprehensive validation and error messages
- 🛡️ Reliable & Resilient: Production-ready error handling
- Automatic retry with exponential backoff
- Partial batch failure handling
- Duplicate detection and skipping
- Graceful shutdown with SIGTERM/SIGINT handling
- Automatic checkpoint on interruption for safe resume
- 📊 Database Flexibility: Multiple backend options
- Azure Cosmos DB: Core (SQL) API with automatic partitioning
- PostgreSQL: 14+ with JSONB support for flexible querying
- Azure Log Analytics integration (Logs Ingestion API)
- Kubernetes/AKS deployment support
- 🔒 Privacy & Compliance: Built-in anonymization
- Automated PII Detection: Regex-based detection of 24+ PII categories
- HIPAA Safe Harbor: 18 identifiers per 45 CFR §164.514(b)(2)
- GDPR Compliance: HIPAA identifiers + GDPR quasi-identifiers
- Flexible Strategies: Redaction or tokenization
- Dry-Run Mode: Preview PII detection without modifying data
- Audit Logging: SHA-256 hashed values, comprehensive tracking
- Low Performance Overhead: <100ms per composition, <15% throughput impact
- Vendor Abstraction: Trait-based design supports multiple openEHR vendors (EHRBase, Better Platform)
- Type Safety: Strongly-typed domain models with Rust's type system
- Observability: Structured logging with tracing, Azure integration
- Security: TLS 1.2+, credential management, least-privilege access
- Compliance: HIPAA-ready, GDPR-ready, audit logging, data verification
- Rust 1.70+ (for building from source)
- openEHR Server (choose one):
- EHRBase: Version 0.30+ with REST API v1.1.x
- Better Platform: Sandbox or production environment with OIDC authentication
- Database Backend (choose one):
- Azure Cosmos DB: Core (SQL) API account with database created
- PostgreSQL: Version 14+ with database created
- Network Access: Outbound HTTPS to openEHR server and database
# Download latest release
wget https://github.com/erikhoward/atlas/releases/download/v2.4.0/atlas-linux-x86_64.tar.gz
# Extract and install
tar -xzf atlas-linux-x86_64.tar.gz
sudo mv atlas /usr/local/bin/
sudo chmod +x /usr/local/bin/atlas
# Verify installation
atlas --version
# Install Rust (if not already installed)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
# Clone the repository
git clone https://github.com/erikhoward/atlas.git
cd atlas
# Build release binary
cargo build --release
# Install binary
sudo cp target/release/atlas /usr/local/bin/
# Verify installation
atlas --version
# Pull the latest Docker image
docker pull erikhoward/atlas:latest
# Run Atlas with configuration file
docker run --rm \
-v $(pwd)/atlas.toml:/app/config/atlas.toml \
-e ATLAS_OPENEHR_USERNAME=your_username \
-e ATLAS_OPENEHR_PASSWORD=your_password \
-e ATLAS_COSMOSDB_KEY=your_cosmos_key \
erikhoward/atlas:latest \
export --config /app/config/atlas.toml
# Or use docker-compose (see docker-compose.yml example)
docker-compose up
Docker Benefits:
- ✅ No Rust installation required
- ✅ Consistent environment across deployments
- ✅ Easy integration with Kubernetes/AKS
- ✅ Multi-platform support (amd64, arm64)
See Docker Setup Guide for detailed instructions.
# Generate sample configuration with examples
atlas init --with-examples --output atlas.toml
# Edit configuration for your environment
vi atlas.toml
# Option 1: Use .env file (recommended for development)
# Create a .env file in the project root with your credentials
cat > .env << EOF
ATLAS_OPENEHR_USERNAME=your-openehr-username
ATLAS_OPENEHR_PASSWORD=your-openehr-password
ATLAS_PG_PASSWORD=your-postgres-password
EOF
# The .env file is automatically loaded when Atlas starts
# Option 2: Set environment variables manually
export ATLAS_OPENEHR_USERNAME="your-openehr-username"
export ATLAS_OPENEHR_PASSWORD="your-openehr-password"
# For CosmosDB
export ATLAS_COSMOSDB_KEY="your-cosmos-db-key"
# For PostgreSQL
export ATLAS_PG_PASSWORD="your-postgres-password"
# Validate configuration
atlas validate-config -c atlas.toml
Minimal Configuration Example (CosmosDB):
[openehr]
base_url = "https://your-ehrbase-server.com/ehrbase"
username = "${ATLAS_OPENEHR_USERNAME}"
password = "${ATLAS_OPENEHR_PASSWORD}"
tls_verify = true
[openehr.query]
template_ids = ["IDCR - Vital Signs.v1"]
[export]
mode = "incremental"
export_composition_format = "preserve"
database_target = "cosmosdb"
[cosmosdb]
endpoint = "https://your-account.documents.azure.com:443/"
key = "${ATLAS_COSMOSDB_KEY}"
database_name = "openehr_data"
Minimal Configuration Example (PostgreSQL):
[openehr]
base_url = "https://your-ehrbase-server.com/ehrbase"
username = "${ATLAS_OPENEHR_USERNAME}"
password = "${ATLAS_OPENEHR_PASSWORD}"
tls_verify = true
[openehr.query]
template_ids = ["IDCR - Vital Signs.v1"]
[export]
mode = "incremental"
export_composition_format = "preserve"
database_target = "postgresql"
[postgresql]
connection_string = "postgresql://atlas_user:${ATLAS_PG_PASSWORD}@localhost:5432/openehr_data?sslmode=require"
max_connections = 20
See examples/atlas.example.toml for CosmosDB configuration and examples/atlas.postgresql.example.toml for PostgreSQL configuration.
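If the database and role referenced in the example connection string above do not exist yet, a minimal setup might look like the following (names match the example; adjust the password and privileges to your own policy, and see the PostgreSQL Setup Guide for the recommended procedure):
-- Minimal PostgreSQL setup matching the example connection string (illustrative)
CREATE DATABASE openehr_data;
CREATE ROLE atlas_user WITH LOGIN PASSWORD 'change-me';
GRANT CONNECT ON DATABASE openehr_data TO atlas_user;
-- then, connected to openehr_data, if Atlas's tables are to be created under this role:
GRANT USAGE, CREATE ON SCHEMA public TO atlas_user;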
Atlas supports comprehensive environment variable overrides for all configuration options, enabling containerized deployments and 12-factor app compliance:
# Override any configuration value using ATLAS_<SECTION>_<KEY> pattern
export ATLAS_DATABASE_TARGET=postgresql
export ATLAS_APPLICATION_LOG_LEVEL=debug
export ATLAS_OPENEHR_BASE_URL=https://prod-ehrbase.com
export ATLAS_OPENEHR_USERNAME=atlas_prod
export ATLAS_OPENEHR_PASSWORD=secret
export ATLAS_OPENEHR_QUERY_BATCH_SIZE=2000
export ATLAS_EXPORT_MODE=incremental
export ATLAS_POSTGRESQL_CONNECTION_STRING="postgresql://user:pass@postgres:5432/db"
export ATLAS_POSTGRESQL_MAX_CONNECTIONS=20
# Arrays support JSON or comma-separated format
export ATLAS_OPENEHR_QUERY_TEMPLATE_IDS='["IDCR - Vital Signs.v1","IDCR - Lab Report.v1"]'
export ATLAS_OPENEHR_QUERY_EHR_IDS="ehr-123,ehr-456,ehr-789"
# Run with minimal TOML file (or even no TOML file with all env vars set)
atlas export -c minimal.toml
Docker Example:
docker run -d \
-e ATLAS_DATABASE_TARGET=postgresql \
-e ATLAS_OPENEHR_BASE_URL=https://ehrbase.example.com \
-e ATLAS_OPENEHR_USERNAME=atlas \
-e ATLAS_OPENEHR_PASSWORD="${OPENEHR_PASSWORD}" \
-e ATLAS_OPENEHR_QUERY_TEMPLATE_IDS='["IDCR - Vital Signs.v1"]' \
-e ATLAS_POSTGRESQL_CONNECTION_STRING="${PG_CONNECTION_STRING}" \
-e ATLAS_EXPORT_MODE=incremental \
atlas:latest
Kubernetes Example:
apiVersion: v1
kind: ConfigMap
metadata:
  name: atlas-config
data:
  ATLAS_DATABASE_TARGET: "postgresql"
  ATLAS_OPENEHR_BASE_URL: "https://ehrbase.example.com"
  ATLAS_OPENEHR_QUERY_TEMPLATE_IDS: '["IDCR - Vital Signs.v1"]'
  ATLAS_EXPORT_MODE: "incremental"
---
apiVersion: v1
kind: Secret
metadata:
  name: atlas-secrets
type: Opaque
stringData:
  ATLAS_OPENEHR_PASSWORD: "secret"
  ATLAS_POSTGRESQL_CONNECTION_STRING: "postgresql://user:pass@postgres:5432/db"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: atlas
spec:
  selector:
    matchLabels:
      app: atlas
  template:
    metadata:
      labels:
        app: atlas
    spec:
      containers:
        - name: atlas
          image: atlas:latest
          envFrom:
            - configMapRef:
                name: atlas-config
            - secretRef:
                name: atlas-secrets
See the Configuration Guide for a complete list of supported environment variables.
# Run export
atlas export -c atlas.toml
# Dry run to preview (no data written)
atlas export -c atlas.toml --dry-run
# Check export status and watermarks
atlas status -c atlas.toml
# Override configuration options
atlas export -c atlas.toml --mode full --template-id "Your Template.v1"
Atlas supports graceful shutdown for long-running exports, ensuring data integrity and allowing safe resumption:
# Start an export
atlas export -c atlas.toml
# Press Ctrl+C or send SIGTERM to gracefully stop
# Atlas will:
# 1. Complete the current batch being processed
# 2. Save watermark state to database
# 3. Display progress summary
# 4. Exit with code 130 (SIGINT) or 143 (SIGTERM)
# Resume from where it left off
atlas export -c atlas.toml
Key Features:
- ✅ Safe Interruption: Current batch completes before shutdown (no partial data)
- ✅ Automatic Checkpoint: Watermarks saved with Interrupted status
- ✅ Resume Support: Re-run the same command to continue from checkpoint
- ✅ Configurable Timeout: Default 30s grace period (configurable via export.shutdown_timeout_secs)
- ✅ Container-Ready: Works with Docker stop, Kubernetes pod termination, systemd
Configuration:
[export]
# Graceful shutdown timeout in seconds (default: 30)
# Should align with container orchestration grace periods (see the Kubernetes example below)
shutdown_timeout_secs = 30
Exit Codes:
- 0 - Export completed successfully
- 1 - Partial success (some exports failed)
- 130 - Interrupted by SIGINT (Ctrl+C)
- 143 - Interrupted by SIGTERM (graceful termination signal)
- Other exit codes indicate configuration, authentication, or connection errors
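When running in Kubernetes, give the pod a termination grace period at least as long as export.shutdown_timeout_secs so Atlas can finish the current batch and checkpoint before the container is killed. A sketch using standard Kubernetes fields (values are illustrative):
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 45   # > shutdown_timeout_secs (30s) leaves room to checkpoint
      containers:
        - name: atlas
          image: atlas:latest
The same idea applies to plain Docker: the timeout passed to docker stop should exceed the configured shutdown timeout.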
See the examples/ directory for complete configurations:
- Clinical Research: Full export with data verification
- Daily Sync: Incremental sync for production
- ML Features: Flattened data for machine learning
Atlas includes built-in anonymization capabilities to protect PHI/PII when exporting openEHR compositions, helping organizations comply with HIPAA and GDPR regulations.
Add anonymization configuration to your atlas.toml:
[anonymization]
enabled = true
mode = "hipaa_safe_harbor" # or "gdpr"
strategy = "token" # or "redact"
dry_run = false
[anonymization.audit]
enabled = true
log_path = "./audit/anonymization.log"
json_format = true
Run export with anonymization:
# Enable anonymization
atlas export --anonymize
# Override compliance mode
atlas export --anonymize --anonymize-mode gdpr
# Dry-run to preview PII detection
atlas export --anonymize --anonymize-dry-run
- Automated PII Detection: Regex-based detection of 24+ PII categories
- HIPAA Safe Harbor: 18 identifiers per 45 CFR §164.514(b)(2)
- GDPR Compliance: HIPAA identifiers + 6 GDPR quasi-identifiers
- Flexible Strategies (see the example below):
  - Token: Replace with unique random tokens (e.g., TOKEN_NAME_a1b2c3d4)
  - Redact: Replace with category markers (e.g., [REDACTED_NAME])
- Dry-Run Mode: Preview PII detection without modifying data
- Audit Logging: SHA-256 hashed values, comprehensive tracking
- Performance: <100ms overhead per composition, <15% throughput impact
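For example, under the token strategy a detected name such as "Jane Doe" would be replaced with a value like TOKEN_NAME_a1b2c3d4, while the redact strategy would write [REDACTED_NAME] in its place (the token and marker formats are the ones shown above; the surrounding field depends on your template).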
HIPAA Safe Harbor (hipaa_safe_harbor):
- Detects 18 identifiers specified in 45 CFR §164.514(b)(2)
- Suitable for US healthcare organizations
GDPR (gdpr):
- Detects all HIPAA identifiers + 6 GDPR quasi-identifiers
- Suitable for European organizations or multi-region deployments
For complete anonymization documentation, see:
- Anonymization User Guide - Comprehensive usage guide
Atlas provides official Docker images for easy deployment and integration with container orchestration platforms.
# Pull the latest image
docker pull erikhoward/atlas:latest
# Run with configuration file and environment variables
docker run --rm \
-v $(pwd)/atlas.toml:/app/config/atlas.toml \
-v $(pwd)/logs:/app/logs \
-e ATLAS_OPENEHR_USERNAME=${OPENEHR_USER} \
-e ATLAS_OPENEHR_PASSWORD=${OPENEHR_PASS} \
-e ATLAS_COSMOSDB_KEY=${COSMOS_KEY} \
erikhoward/atlas:latest \
export --config /app/config/atlas.toml
Create a docker-compose.yml file:
version: '3.8'
services:
  atlas:
    image: erikhoward/atlas:latest
    volumes:
      - ./atlas.toml:/app/config/atlas.toml
      - ./logs:/app/logs
    environment:
      - ATLAS_OPENEHR_USERNAME=${OPENEHR_USER}
      - ATLAS_OPENEHR_PASSWORD=${OPENEHR_PASS}
      - ATLAS_COSMOSDB_KEY=${COSMOS_KEY}
      - RUST_LOG=info
    command: export --config /app/config/atlas.toml
Run with:
docker-compose up
Available image tags:
- latest - Latest stable release from main branch
- 2.4.0, 2.3, 2 - Semantic version tags
- main-<sha> - Specific commit from main branch
Images are built for multiple architectures:
- linux/amd64 - Standard x86_64 servers
- linux/arm64 - ARM64 (Apple Silicon, AWS Graviton, etc.)
# Build locally
docker build -t atlas:custom .
# Build for specific platform
docker build --platform linux/amd64 -t atlas:custom .
For detailed Docker setup, configuration, and troubleshooting, see the Docker Setup Guide.
- User Guide - Complete usage instructions, troubleshooting, and best practices
- Configuration Guide - Detailed configuration reference with all options
- Example Configurations - Ready-to-use configs for common scenarios
- Architecture Documentation - System design, components, and data flow
- Developer Guide - Development setup and contribution guidelines
- Standalone Deployment - Binary deployment on Linux/macOS/Windows
- Docker Deployment - Containerized deployment
- Kubernetes Deployment - AKS and Kubernetes deployment
Atlas follows a layered architecture with clear separation of concerns:
┌─────────────────────────────────────────────────────────────────────────┐
│ Atlas CLI │
│ (Rust Binary) │
└──────────────┬──────────────────────────────────────┬───────────────────┘
│ │
│ REST API v1.1 │ Database Adapters
│ │
▼ ▼
┌──────────────────────────┐ ┌──────────────────────────────────────────┐
│ openEHR Server │ │ Database Backends │
│ (EHRBase 0.30+) │ │ │
│ │ │ ┌────────────────────────────────────┐ │
│ ┌────────────────────┐ │ │ │ Azure Cosmos DB (NoSQL) │ │
│ │ Compositions │ │ │ │ - Control Container (watermarks) │ │
│ │ (FLAT JSON) │ │ │ │ - Data Containers (per template) │ │
│ └────────────────────┘ │ │ │ - Partitioned by /ehr_id │ │
│ │ │ └────────────────────────────────────┘ │
└──────────────────────────┘ │ │
│ ┌────────────────────────────────────┐ │
│ │ PostgreSQL 14+ (Relational) │ │
│ │ - atlas_watermarks table │ │
│ │ - compositions_* tables │ │
│ │ - JSONB columns for flexibility │ │
│ └────────────────────────────────────┘ │
└──────────────────────────────────────────┘
Key Components:
- CLI Layer: Command-line interface with clap
- Core Layer: Business logic (export, transform, state, verification)
- Adapter Layer: External integrations (openEHR, Cosmos DB, PostgreSQL)
- Domain Layer: Core types and models
See Architecture Documentation for details.
Export patient cohorts for research studies while preserving exact data structures for regulatory compliance.
Flatten compositions into analytics-ready format for training predictive models on clinical data.
Power real-time dashboards and reports by syncing openEHR data to Cosmos DB daily.
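For a daily sync like this, one option is a Kubernetes CronJob that reuses the atlas-config ConfigMap and atlas-secrets Secret shown earlier. This is a sketch: the schedule, image tag, and whether your deployment runs atlas export purely from environment variables (as described above) or with a mounted --config file are deployment-specific choices:
apiVersion: batch/v1
kind: CronJob
metadata:
  name: atlas-daily-sync
spec:
  schedule: "0 2 * * *"        # run once a day at 02:00
  concurrencyPolicy: Forbid    # never overlap two export runs
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: atlas
              image: atlas:latest
              args: ["export"]   # env-var-only configuration; add --config if you mount a TOML file
              envFrom:
                - configMapRef:
                    name: atlas-config
                - secretRef:
                    name: atlas-secrets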
Connect openEHR data to Azure Synapse Analytics, Databricks, or Power BI for advanced analytics.
Maintain comprehensive audit trails and logging for compliance requirements.
Atlas supports extensive configuration options:
| Category | Options | Description |
|---|---|---|
| Export Mode | full, incremental | Full export or incremental sync |
| Format | preserve, flatten | Maintain structure or flatten for analytics |
| Batch Size | 100-5000 | Compositions per batch |
| Parallelism | 1-100 EHRs | Concurrent EHR processing |
| Logging | Local, Azure Log Analytics | Structured logging options |
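For example, the export mode, composition format, and batch size from the table map to TOML keys along these lines (batch_size is inferred from the ATLAS_OPENEHR_QUERY_BATCH_SIZE override shown earlier via the ATLAS_<SECTION>_<KEY> pattern; the Configuration Guide has the authoritative key names, including the parallelism setting):
[export]
mode = "incremental"                      # or "full"
export_composition_format = "preserve"    # or "flatten"

[openehr.query]
template_ids = ["IDCR - Vital Signs.v1"]
batch_size = 1000                         # 100-5000 compositions per batch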
See Configuration Guide for complete reference.
Typical Performance (depends on composition size and network):
- Throughput: 1000-2000 compositions/minute
- Memory: 2-4 GB RAM (configurable with batch size)
- Cosmos DB: ~10 RU per composition write
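As a rough sizing aid: at the quoted 1000-2000 compositions/minute and ~10 RU per write, a sustained export draws roughly 10,000-20,000 RU/minute, i.e. on the order of 170-330 RU/s of provisioned (or autoscale) throughput.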
Example Scenarios:
- Daily Sync: 1,000 compositions in ~1-2 minutes
- Research Export: 50,000 compositions in ~50-100 minutes
- ML Dataset: 500,000 compositions in ~4-8 hours
Atlas implements comprehensive security measures to protect sensitive healthcare data and credentials:
- Memory Security: All credentials (passwords, keys, secrets) are automatically zeroized in memory when no longer needed
- No Credential Logging: Credentials are never written to log files or exposed in debug output
- Redacted Debug Output: Debug representations show Secret([REDACTED]) instead of actual values
- Environment Variables: Secure credential management using environment variables, never hardcoded
- Explicit Access Control: Code must explicitly call expose_secret() to access credentials, enabling easy security audits
Protected Credentials:
- openEHR passwords
- Cosmos DB keys
- PostgreSQL connection strings (including embedded passwords)
- Azure client secrets
- TLS 1.2+: All connections encrypted in transit
- Certificate Verification: TLS certificate validation enabled by default
- Least Privilege: Read-only openEHR access recommended
- Azure RBAC: Integrate with Azure role-based access control
- Audit Logging: All operations logged with timestamps
- PHI/PII Protection: Sanitized logging, compliance-ready
- HIPAA-Ready: Designed for healthcare compliance requirements
- Data Verification: Optional SHA-256 checksums for data integrity
For detailed security best practices, see the Configuration Guide.
We welcome contributions! Here's how to get started:
- Fork the repository
- Create a feature branch: git checkout -b feature/my-feature
- Make your changes following the Developer Guide
- Run tests: cargo test
- Run linter: cargo clippy --all-targets -- -D warnings
- Format code: cargo fmt
- Commit changes: git commit -m "feat: add new feature"
- Push to branch: git push origin feature/my-feature
- Open a Pull Request
See CONTRIBUTING.md for detailed guidelines.
# Clone repository
git clone https://github.com/erikhoward/atlas.git
cd atlas
# Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
# Install development tools
rustup component add clippy rustfmt
# Build and test
cargo build
cargo test
cargo clippy --all-targets -- -D warnings
cargo fmt
This project is licensed under the MIT License - see the LICENSE file for details.
- User Guide - Usage instructions and troubleshooting
- PostgreSQL Setup Guide - PostgreSQL backend configuration
- Docker Setup Guide - Docker deployment instructions
- FAQ - Frequently asked questions
- GitHub Issues: Report bugs or request features
- Discussions: Ask questions and share ideas
For enterprise support, training, or custom development, contact: erikhoward@pm.me
Atlas is built with these excellent open-source projects:
- Rust - Systems programming language
- Tokio - Async runtime
- Clap - Command-line argument parsing
- Serde - Serialization framework
- Tracing - Structured logging
- Azure SDK for Rust - Azure integration
- tokio-postgres - PostgreSQL async driver
- deadpool-postgres - PostgreSQL connection pooling
- ✅ EHRBase vendor support
- ✅ Better Platform vendor support with OIDC authentication
- ✅ Azure Cosmos DB integration
- ✅ PostgreSQL integration
- ✅ Incremental sync with watermarks
- ✅ Preserve and flatten modes
- ✅ CLI interface
- ✅ Docker and Kubernetes deployment
- ✅ HIPAA & GDPR anonymization
- 🔄 Prometheus metrics export
- 🔄 FHIR transformation
- 🔄 Bi-directional synchronization
- 🔄 Support for other cloud providers (AWS, GCP)
- EHRBase - Open-source openEHR server
- Better Platform - Enterprise openEHR platform
- Azure Cosmos DB - Globally distributed database
- openEHR - Open standard for health data
Made with ❤️ by Erik Howard & the Atlas Contributors
If you find Atlas useful, please consider giving it a ⭐ on GitHub!
