Atlas

Atlas is a high-performance, open-source ETL tool built in Rust that bridges openEHR clinical data repositories with modern analytics platforms. It enables healthcare organizations to seamlessly export openEHR compositions to Azure Cosmos DB or PostgreSQL for advanced analytics, machine learning, and research.

🎯 Overview

Atlas solves the challenge of making openEHR clinical data accessible for modern analytics workflows. By exporting compositions from openEHR servers (EHRBase, Better Platform) to your choice of database backend (Azure Cosmos DB or PostgreSQL), Atlas enables:

  • Clinical Research: Query patient data using familiar SQL instead of AQL
  • Machine Learning: Build ML models on flattened, analytics-ready data
  • Operational Analytics: Power dashboards and reports with Azure-native tools
  • Regulatory Reporting: Maintain audit trails with data verification
  • Data Integration: Connect openEHR data to Azure Synapse, Databricks, and Power BI

✨ Key Features

Core Capabilities

  • 🚀 High Performance: Built with Rust for async/concurrent processing

    • Batch processing with configurable sizes (100-5000 compositions)
    • Parallel EHR processing (1-100 concurrent EHRs)
    • Throughput: 1000-2000 compositions/minute
  • 🔄 Incremental Sync: Smart state management with watermarks

    • Track last export per {template_id, ehr_id} combination
    • Export only new/changed data since last run
    • Automatic checkpoint and resume from failures
  • 🎨 Flexible Transformation: Multiple composition formats

    • Preserve Mode: Maintain exact FLAT JSON structure from openEHR
    • Flatten Mode: Convert nested paths to flat field names for ML/analytics
  • ⚙️ Easy Configuration: TOML-based with environment variable support

    • Simple, human-readable configuration files
    • Secure credential management with env vars
    • Comprehensive validation and error messages
  • 🛡️ Reliable & Resilient: Production-ready error handling

    • Automatic retry with exponential backoff
    • Partial batch failure handling
    • Duplicate detection and skipping
    • Graceful shutdown with SIGTERM/SIGINT handling
    • Automatic checkpoint on interruption for safe resume
  • 📊 Database Flexibility: Multiple backend options

    • Azure Cosmos DB: Core (SQL) API with automatic partitioning
    • PostgreSQL: 14+ with JSONB support for flexible querying
    • Azure Log Analytics integration (Logs Ingestion API)
    • Kubernetes/AKS deployment support
  • 🔒 Privacy & Compliance: Built-in anonymization

    • Automated PII Detection: Regex-based detection of 24+ PII categories
    • HIPAA Safe Harbor: 18 identifiers per 45 CFR §164.514(b)(2)
    • GDPR Compliance: HIPAA identifiers + 6 GDPR quasi-identifiers
    • Flexible Strategies: Redaction or tokenization
    • Dry-Run Mode: Preview PII detection without modifying data
    • Audit Logging: SHA-256 hashed values, comprehensive tracking
    • Low Performance Impact: <100ms overhead per composition, <15% throughput impact
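
The "automatic retry with exponential backoff" item in the list above can be pictured with a small std-only sketch. This is not Atlas's actual retry code: the function names, the doubling factor, and the cap are assumptions chosen for illustration.

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at `max`.
/// Hypothetical helper; Atlas's real backoff parameters are not shown in this README.
fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    // checked_pow and saturating_mul keep large attempt counts from overflowing.
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    base.saturating_mul(factor).min(max)
}

/// Call `op` until it succeeds or `max_attempts` tries have been made,
/// sleeping for the backoff delay between tries.
fn retry_with_backoff<T, E>(
    max_attempts: u32,
    base: Duration,
    max: Duration,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the last error to the caller.
            Err(err) if attempt + 1 >= max_attempts => return Err(err),
            Err(_) => {
                std::thread::sleep(backoff_delay(attempt, base, max));
                attempt += 1;
            }
        }
    }
}
```

With a 100 ms base and a 5 s cap, the delays grow as 100 ms, 200 ms, 400 ms, and so on until they reach the cap.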

Technical Highlights

  • Vendor Abstraction: Trait-based design supports multiple openEHR vendors (EHRBase, Better Platform)
  • Type Safety: Strongly-typed domain models with Rust's type system
  • Observability: Structured logging with tracing, Azure integration
  • Security: TLS 1.2+, credential management, least-privilege access
  • Compliance: HIPAA-ready, GDPR-ready, audit logging, data verification
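
As a rough illustration of the trait-based vendor abstraction mentioned above, the sketch below shows how two vendors could sit behind one trait. The trait name, its methods, and the `vendor_for` helper are hypothetical, and Basic authentication for EHRBase is an assumption (the README only states that Better Platform uses OIDC).

```rust
/// How a vendor expects requests to be authenticated (illustrative only).
#[derive(Debug, PartialEq)]
enum AuthScheme {
    Basic,
    OidcBearer,
}

/// Hypothetical vendor trait; Atlas's real trait and method names may differ.
trait OpenEhrVendor {
    fn name(&self) -> &'static str;
    fn auth_scheme(&self) -> AuthScheme;
}

struct EhrBase;
struct BetterPlatform;

impl OpenEhrVendor for EhrBase {
    fn name(&self) -> &'static str {
        "EHRBase"
    }
    fn auth_scheme(&self) -> AuthScheme {
        AuthScheme::Basic // assumption: username/password sent as Basic auth
    }
}

impl OpenEhrVendor for BetterPlatform {
    fn name(&self) -> &'static str {
        "Better Platform"
    }
    fn auth_scheme(&self) -> AuthScheme {
        AuthScheme::OidcBearer
    }
}

/// Pick a vendor implementation from a configuration string.
fn vendor_for(kind: &str) -> Option<Box<dyn OpenEhrVendor>> {
    match kind.to_ascii_lowercase().as_str() {
        "ehrbase" => Some(Box::new(EhrBase)),
        "better" => Some(Box::new(BetterPlatform)),
        _ => None,
    }
}
```

The export logic can then work against `Box<dyn OpenEhrVendor>` without knowing which server it is talking to.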

🚀 Quick Start

Prerequisites

  • Rust 1.70+ (for building from source)
  • openEHR Server (choose one):
    • EHRBase: Version 0.30+ with REST API v1.1.x
    • Better Platform: Sandbox or production environment with OIDC authentication
  • Database Backend (choose one):
    • Azure Cosmos DB: Core (SQL) API account with database created
    • PostgreSQL: Version 14+ with database created
  • Network Access: Outbound HTTPS to openEHR server and database

Installation

Option 1: Pre-built Binary (Recommended)

# Download latest release
wget https://github.com/erikhoward/atlas/releases/download/v2.4.0/atlas-linux-x86_64.tar.gz

# Extract and install
tar -xzf atlas-linux-x86_64.tar.gz
sudo mv atlas /usr/local/bin/
sudo chmod +x /usr/local/bin/atlas

# Verify installation
atlas --version

Option 2: Build from Source

# Install Rust (if not already installed)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# Clone the repository
git clone https://github.com/erikhoward/atlas.git
cd atlas

# Build release binary
cargo build --release

# Install binary
sudo cp target/release/atlas /usr/local/bin/

# Verify installation
atlas --version

Option 3: Docker (Recommended for Production)

# Pull the latest Docker image
docker pull erikhoward/atlas:latest

# Run Atlas with configuration file
docker run --rm \
  -v "$(pwd)/atlas.toml:/app/config/atlas.toml" \
  -e ATLAS_OPENEHR_USERNAME=your_username \
  -e ATLAS_OPENEHR_PASSWORD=your_password \
  -e ATLAS_COSMOSDB_KEY=your_cosmos_key \
  erikhoward/atlas:latest \
  export --config /app/config/atlas.toml

# Or use docker-compose (see docker-compose.yml example)
docker-compose up

Docker Benefits:

  • ✅ No Rust installation required
  • ✅ Consistent environment across deployments
  • ✅ Easy integration with Kubernetes/AKS
  • ✅ Multi-platform support (amd64, arm64)

See Docker Setup Guide for detailed instructions.

Configuration

# Generate sample configuration with examples
atlas init --with-examples --output atlas.toml

# Edit configuration for your environment
vi atlas.toml

# Option 1: Use .env file (recommended for development)
# Create a .env file in the project root with your credentials
# (the quoted 'EOF' keeps the shell from expanding $ characters in passwords)
cat > .env << 'EOF'
ATLAS_OPENEHR_USERNAME=your-openehr-username
ATLAS_OPENEHR_PASSWORD=your-openehr-password
ATLAS_PG_PASSWORD=your-postgres-password
EOF

# The .env file is automatically loaded when Atlas starts

# Option 2: Set environment variables manually
export ATLAS_OPENEHR_USERNAME="your-openehr-username"
export ATLAS_OPENEHR_PASSWORD="your-openehr-password"

# For CosmosDB
export ATLAS_COSMOSDB_KEY="your-cosmos-db-key"

# For PostgreSQL
export ATLAS_PG_PASSWORD="your-postgres-password"

# Validate configuration
atlas validate-config -c atlas.toml

Minimal Configuration Example (CosmosDB):

[openehr]
base_url = "https://your-ehrbase-server.com/ehrbase"
username = "${ATLAS_OPENEHR_USERNAME}"
password = "${ATLAS_OPENEHR_PASSWORD}"
tls_verify = true

[openehr.query]
template_ids = ["IDCR - Vital Signs.v1"]

[export]
mode = "incremental"
export_composition_format = "preserve"
database_target = "cosmosdb"

[cosmosdb]
endpoint = "https://your-account.documents.azure.com:443/"
key = "${ATLAS_COSMOSDB_KEY}"
database_name = "openehr_data"

Minimal Configuration Example (PostgreSQL):

[openehr]
base_url = "https://your-ehrbase-server.com/ehrbase"
username = "${ATLAS_OPENEHR_USERNAME}"
password = "${ATLAS_OPENEHR_PASSWORD}"
tls_verify = true

[openehr.query]
template_ids = ["IDCR - Vital Signs.v1"]

[export]
mode = "incremental"
export_composition_format = "preserve"
database_target = "postgresql"

[postgresql]
connection_string = "postgresql://atlas_user:${ATLAS_PG_PASSWORD}@localhost:5432/openehr_data?sslmode=require"
max_connections = 20

See examples/atlas.example.toml for CosmosDB configuration and examples/atlas.postgresql.example.toml for PostgreSQL configuration.

12-Factor App Configuration

Atlas supports comprehensive environment variable overrides for all configuration options, enabling containerized deployments and 12-factor app compliance:

# Override any configuration value using ATLAS_<SECTION>_<KEY> pattern
export ATLAS_DATABASE_TARGET=postgresql
export ATLAS_APPLICATION_LOG_LEVEL=debug
export ATLAS_OPENEHR_BASE_URL=https://prod-ehrbase.com
export ATLAS_OPENEHR_USERNAME=atlas_prod
export ATLAS_OPENEHR_PASSWORD=secret
export ATLAS_OPENEHR_QUERY_BATCH_SIZE=2000
export ATLAS_EXPORT_MODE=incremental
export ATLAS_POSTGRESQL_CONNECTION_STRING="postgresql://user:pass@postgres:5432/db"
export ATLAS_POSTGRESQL_MAX_CONNECTIONS=20

# Arrays support JSON or comma-separated format
export ATLAS_OPENEHR_QUERY_TEMPLATE_IDS='["IDCR - Vital Signs.v1","IDCR - Lab Report.v1"]'
export ATLAS_OPENEHR_QUERY_EHR_IDS="ehr-123,ehr-456,ehr-789"

# Run with minimal TOML file (or even no TOML file with all env vars set)
atlas export -c minimal.toml
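
The two array formats accepted for overrides (JSON or comma-separated) could be handled along the lines of the sketch below. It is a simplified std-only illustration, not Atlas's parser: `parse_string_list` is a hypothetical name, and the JSON branch only understands arrays of plain strings.

```rust
/// Parse an array-valued override written either as a JSON array of strings
/// or as a comma-separated list. Hypothetical helper for illustration.
fn parse_string_list(raw: &str) -> Vec<String> {
    let trimmed = raw.trim();
    if !trimmed.starts_with('[') {
        // Comma-separated form: split, trim, and drop empty entries.
        return trimmed
            .split(',')
            .map(str::trim)
            .filter(|item| !item.is_empty())
            .map(String::from)
            .collect();
    }
    // JSON-array form: collect the contents of each double-quoted string.
    // A backslash makes the next character literal; escapes such as \n are not decoded.
    let mut items = Vec::new();
    let mut current = String::new();
    let mut in_string = false;
    let mut chars = trimmed.chars();
    while let Some(c) = chars.next() {
        match (in_string, c) {
            (false, '"') => in_string = true,
            (true, '"') => {
                items.push(std::mem::take(&mut current));
                in_string = false;
            }
            (true, '\\') => {
                if let Some(escaped) = chars.next() {
                    current.push(escaped);
                }
            }
            (true, other) => current.push(other),
            (false, _) => {} // brackets, commas, and whitespace between items
        }
    }
    items
}
```

Both `'["IDCR - Vital Signs.v1","IDCR - Lab Report.v1"]'` and `"ehr-123,ehr-456,ehr-789"` from the examples above yield a plain list of strings under this sketch.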

Docker Example:

docker run -d \
  -e ATLAS_DATABASE_TARGET=postgresql \
  -e ATLAS_OPENEHR_BASE_URL=https://ehrbase.example.com \
  -e ATLAS_OPENEHR_USERNAME=atlas \
  -e ATLAS_OPENEHR_PASSWORD="${OPENEHR_PASSWORD}" \
  -e ATLAS_OPENEHR_QUERY_TEMPLATE_IDS='["IDCR - Vital Signs.v1"]' \
  -e ATLAS_POSTGRESQL_CONNECTION_STRING="${PG_CONNECTION_STRING}" \
  -e ATLAS_EXPORT_MODE=incremental \
  atlas:latest

Kubernetes Example:

apiVersion: v1
kind: ConfigMap
metadata:
  name: atlas-config
data:
  ATLAS_DATABASE_TARGET: "postgresql"
  ATLAS_OPENEHR_BASE_URL: "https://ehrbase.example.com"
  ATLAS_OPENEHR_QUERY_TEMPLATE_IDS: '["IDCR - Vital Signs.v1"]'
  ATLAS_EXPORT_MODE: "incremental"
---
apiVersion: v1
kind: Secret
metadata:
  name: atlas-secrets
type: Opaque
stringData:
  ATLAS_OPENEHR_PASSWORD: "secret"
  ATLAS_POSTGRESQL_CONNECTION_STRING: "postgresql://user:pass@postgres:5432/db"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: atlas
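spec:
  # apps/v1 Deployments require a selector that matches the pod template labels;
  # the "app: atlas" label is an illustrative choice
  selector:
    matchLabels:
      app: atlas
  template:
    metadata:
      labels:
        app: atlas
    spec:
      containers: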
      - name: atlas
        image: atlas:latest
        envFrom:
        - configMapRef:
            name: atlas-config
        - secretRef:
            name: atlas-secrets

See Configuration Guide for complete list of supported environment variables.

Basic Usage

# Run export
atlas export -c atlas.toml

# Dry run to preview (no data written)
atlas export -c atlas.toml --dry-run

# Check export status and watermarks
atlas status -c atlas.toml

# Override configuration options
atlas export -c atlas.toml --mode full --template-id "Your Template.v1"

Graceful Shutdown

Atlas supports graceful shutdown for long-running exports, ensuring data integrity and allowing safe resumption:

# Start an export
atlas export -c atlas.toml

# Press Ctrl+C or send SIGTERM to gracefully stop
# Atlas will:
# 1. Complete the current batch being processed
# 2. Save watermark state to database
# 3. Display progress summary
# 4. Exit with code 130 (SIGINT) or 143 (SIGTERM)

# Resume from where it left off
atlas export -c atlas.toml

Key Features:

  • Safe Interruption: Current batch completes before shutdown (no partial data)
  • Automatic Checkpoint: Watermarks saved with Interrupted status
  • Resume Support: Re-run the same command to continue from checkpoint
  • Configurable Timeout: Default 30s grace period (configurable via export.shutdown_timeout_secs)
  • Container-Ready: Works with Docker stop, Kubernetes pod termination, systemd
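
To make the checkpoint-and-resume behaviour more concrete, here is a minimal in-memory sketch of watermarks keyed by {template_id, ehr_id}. The type names, the Unix-seconds timestamp, and the `is_pending` rule are assumptions for illustration; Atlas stores its watermarks in the target database, and its real schema is not shown here.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq)]
enum ExportStatus {
    Completed,
    Interrupted,
}

#[derive(Debug, Clone, PartialEq)]
struct Watermark {
    /// Timestamp (Unix seconds) of the newest composition already exported.
    last_exported_at: u64,
    status: ExportStatus,
}

#[derive(Default)]
struct WatermarkStore {
    // One watermark per (template_id, ehr_id) pair.
    entries: HashMap<(String, String), Watermark>,
}

impl WatermarkStore {
    /// Record progress after a batch; the watermark never moves backwards.
    fn checkpoint(&mut self, template_id: &str, ehr_id: &str, exported_at: u64, status: ExportStatus) {
        let key = (template_id.to_string(), ehr_id.to_string());
        let entry = self
            .entries
            .entry(key)
            .or_insert(Watermark { last_exported_at: 0, status });
        entry.last_exported_at = entry.last_exported_at.max(exported_at);
        entry.status = status;
    }

    /// A composition still needs exporting if it is newer than the stored
    /// watermark, or if no watermark exists for this pair yet.
    fn is_pending(&self, template_id: &str, ehr_id: &str, composition_time: u64) -> bool {
        let key = (template_id.to_string(), ehr_id.to_string());
        match self.entries.get(&key) {
            Some(mark) => composition_time > mark.last_exported_at,
            None => true,
        }
    }
}
```

In this sketch, a run stopped by SIGTERM would call `checkpoint` with `ExportStatus::Interrupted` after its last full batch, and the next run would skip everything at or below that watermark.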

Configuration:

[export]
# Graceful shutdown timeout in seconds (default: 30)
# Should align with container orchestration grace periods
shutdown_timeout_secs = 30

Exit Codes:

  • 0 - Export completed successfully
  • 1 - Partial success (some exports failed)
  • 130 - Interrupted by SIGINT (Ctrl+C)
  • 143 - Interrupted by SIGTERM (graceful termination signal)
  • Other codes indicate configuration, authentication, or connection errors
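
A scheduler or wrapper that launches Atlas might interpret these exit codes roughly as follows. The enum and function names are hypothetical; the only facts used are the codes listed above and the common shell convention that a signal exit is reported as 128 plus the signal number (SIGINT is 2, SIGTERM is 15).

```rust
/// Outcome of an Atlas run as described by its exit code (illustrative type).
#[derive(Debug, PartialEq)]
enum RunOutcome {
    Success,
    PartialSuccess,
    Interrupted { signal: i32 },
    Error(i32),
}

fn classify_exit(code: i32) -> RunOutcome {
    match code {
        0 => RunOutcome::Success,
        1 => RunOutcome::PartialSuccess,
        // 130 = 128 + 2 (SIGINT), 143 = 128 + 15 (SIGTERM)
        130 | 143 => RunOutcome::Interrupted { signal: code - 128 },
        other => RunOutcome::Error(other),
    }
}

/// Interrupted runs saved a checkpoint, so re-running the same command resumes.
fn should_rerun(outcome: &RunOutcome) -> bool {
    matches!(outcome, RunOutcome::Interrupted { .. })
}
```

Whether a partial success (code 1) should also trigger a re-run is a policy decision left out of this sketch.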

Example Use Cases

See the examples/ directory for complete configurations.

🔒 Anonymization

Atlas includes built-in anonymization capabilities to protect PHI/PII when exporting openEHR compositions, helping organizations comply with HIPAA and GDPR regulations.

Quick Start

Add anonymization configuration to your atlas.toml:

[anonymization]
enabled = true
mode = "hipaa_safe_harbor"  # or "gdpr"
strategy = "token"          # or "redact"
dry_run = false

[anonymization.audit]
enabled = true
log_path = "./audit/anonymization.log"
json_format = true

Run export with anonymization:

# Enable anonymization
atlas export --anonymize

# Override compliance mode
atlas export --anonymize --anonymize-mode gdpr

# Dry-run to preview PII detection
atlas export --anonymize --anonymize-dry-run

Features

  • Automated PII Detection: Regex-based detection of 24+ PII categories
  • HIPAA Safe Harbor: 18 identifiers per 45 CFR §164.514(b)(2)
  • GDPR Compliance: HIPAA identifiers + 6 GDPR quasi-identifiers
  • Flexible Strategies:
    • Token: Replace with unique random tokens (e.g., TOKEN_NAME_a1b2c3d4)
    • Redact: Replace with category markers (e.g., [REDACTED_NAME])
  • Dry-Run Mode: Preview PII detection without modifying data
  • Audit Logging: SHA-256 hashed values, comprehensive tracking
  • Performance: <100ms overhead per composition, <15% throughput impact
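
The difference between the two strategies can be sketched as below. This only illustrates the output shapes given in the list (`[REDACTED_NAME]` and `TOKEN_NAME_a1b2c3d4`); the function name is hypothetical, PII detection itself is not shown, and the randomness source is a std-only stand-in rather than whatever generator Atlas uses.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hasher};

#[derive(Clone, Copy)]
enum Strategy {
    Redact,
    Token,
}

/// Build the replacement text for a detected value of the given PII category.
fn replacement(category: &str, strategy: Strategy) -> String {
    let category = category.to_ascii_uppercase();
    match strategy {
        // Redaction: the same marker for every value in a category.
        Strategy::Redact => format!("[REDACTED_{}]", category),
        // Tokenization: a random 8-hex-digit suffix per call. RandomState is
        // randomly seeded, which gives a dependency-free source of random bits.
        Strategy::Token => {
            let random = RandomState::new().build_hasher().finish();
            format!("TOKEN_{}_{:08x}", category, random as u32)
        }
    }
}
```

This sketch issues a fresh token on every call; whether Atlas maps repeated occurrences of the same value to the same token is not stated here.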

Compliance Modes

HIPAA Safe Harbor (hipaa_safe_harbor):

  • Detects 18 identifiers specified in 45 CFR §164.514(b)(2)
  • Suitable for US healthcare organizations

GDPR (gdpr):

  • Detects all HIPAA identifiers + 6 GDPR quasi-identifiers
  • Suitable for European organizations or multi-region deployments

Documentation

For complete anonymization documentation, see:

🐳 Docker Deployment

Atlas provides official Docker images for easy deployment and integration with container orchestration platforms.

Quick Start with Docker

# Pull the latest image
docker pull erikhoward/atlas:latest

# Run with configuration file and environment variables
docker run --rm \
  -v "$(pwd)/atlas.toml:/app/config/atlas.toml" \
  -v "$(pwd)/logs:/app/logs" \
  -e ATLAS_OPENEHR_USERNAME=${OPENEHR_USER} \
  -e ATLAS_OPENEHR_PASSWORD=${OPENEHR_PASS} \
  -e ATLAS_COSMOSDB_KEY=${COSMOS_KEY} \
  erikhoward/atlas:latest \
  export --config /app/config/atlas.toml

Using Docker Compose

Create a docker-compose.yml file:

version: '3.8'

services:
  atlas:
    image: erikhoward/atlas:latest
    volumes:
      - ./atlas.toml:/app/config/atlas.toml
      - ./logs:/app/logs
    environment:
      - ATLAS_OPENEHR_USERNAME=${OPENEHR_USER}
      - ATLAS_OPENEHR_PASSWORD=${OPENEHR_PASS}
      - ATLAS_COSMOSDB_KEY=${COSMOS_KEY}
      - RUST_LOG=info
    command: export --config /app/config/atlas.toml

Run with:

docker-compose up

Available Tags

  • latest - Latest stable release from main branch
  • 2.4.0, 2.3, 2 - Semantic version tags
  • main-<sha> - Specific commit from main branch

Multi-Platform Support

Images are built for multiple architectures:

  • linux/amd64 - Standard x86_64 servers
  • linux/arm64 - ARM64 (Apple Silicon, AWS Graviton, etc.)

Building Custom Images

# Build locally
docker build -t atlas:custom .

# Build for specific platform
docker build --platform linux/amd64 -t atlas:custom .

For detailed Docker setup, configuration, and troubleshooting, see the Docker Setup Guide.

📖 Documentation

User Documentation

Technical Documentation

Deployment Guides

🏗️ Architecture

Atlas follows a layered architecture with clear separation of concerns:

┌─────────────────────────────────────────────────────────────────────────┐
│                            Atlas CLI                                    │
│                         (Rust Binary)                                   │
└──────────────┬──────────────────────────────────────┬───────────────────┘
               │                                      │
               │ REST API v1.1                        │ Database Adapters
               │                                      │
               ▼                                      ▼
┌──────────────────────────┐   ┌──────────────────────────────────────────┐
│   openEHR Server         │   │         Database Backends                │
│   (EHRBase 0.30+)        │   │                                          │
│                          │   │  ┌────────────────────────────────────┐  │
│  ┌────────────────────┐  │   │  │  Azure Cosmos DB (NoSQL)           │  │
│  │  Compositions      │  │   │  │  - Control Container (watermarks)  │  │
│  │  (FLAT JSON)       │  │   │  │  - Data Containers (per template)  │  │
│  └────────────────────┘  │   │  │  - Partitioned by /ehr_id          │  │
│                          │   │  └────────────────────────────────────┘  │
└──────────────────────────┘   │                                          │
                               │  ┌────────────────────────────────────┐  │
                               │  │  PostgreSQL 14+ (Relational)       │  │
                               │  │  - atlas_watermarks table          │  │
                               │  │  - compositions_* tables           │  │
                               │  │  - JSONB columns for flexibility   │  │
                               │  └────────────────────────────────────┘  │
                               └──────────────────────────────────────────┘

Key Components:

  • CLI Layer: Command-line interface with clap
  • Core Layer: Business logic (export, transform, state, verification)
  • Adapter Layer: External integrations (openEHR, Cosmos DB, PostgreSQL)
  • Domain Layer: Core types and models

See Architecture Documentation for details.

🎯 Use Cases

Clinical Research

Export patient cohorts for research studies while preserving exact data structures for regulatory compliance.

Machine Learning

Flatten compositions into analytics-ready format for training predictive models on clinical data.
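
As a simplified picture of what flattening means, the sketch below walks a nested structure and joins path segments into flat field names. The `Node` type stands in for a JSON value, and the underscore separator is an assumption; the field names Atlas actually produces in flatten mode may differ.

```rust
use std::collections::BTreeMap;

/// Minimal stand-in for a nested JSON value (illustrative only).
#[derive(Debug, Clone, PartialEq)]
enum Node {
    Leaf(String),
    Object(Vec<(String, Node)>),
}

/// Recursively copy every leaf into `out`, keyed by its path joined with `_`.
fn flatten(node: &Node, prefix: &str, out: &mut BTreeMap<String, String>) {
    match node {
        Node::Leaf(value) => {
            out.insert(prefix.to_string(), value.clone());
        }
        Node::Object(fields) => {
            for (name, child) in fields {
                let key = if prefix.is_empty() {
                    name.clone()
                } else {
                    format!("{}_{}", prefix, name)
                };
                flatten(child, &key, out);
            }
        }
    }
}
```

A tree with `vital_signs` containing `systolic` and `diastolic` would come out as the two columns `vital_signs_systolic` and `vital_signs_diastolic`, which is the shape most ML tooling expects.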

Operational Analytics

Power dashboards and reports by syncing openEHR data to Cosmos DB on a daily schedule.

Data Integration

Connect openEHR data to Azure Synapse Analytics, Databricks, or Power BI for advanced analytics.

Regulatory Reporting

Maintain comprehensive audit trails and logging for compliance requirements.

🔧 Configuration Options

Atlas supports extensive configuration options:

| Category    | Options                    | Description                                |
|-------------|----------------------------|--------------------------------------------|
| Export Mode | full, incremental          | Full export or incremental sync            |
| Format      | preserve, flatten          | Maintain structure or flatten for analytics |
| Batch Size  | 100-5000                   | Compositions per batch                     |
| Parallelism | 1-100 EHRs                 | Concurrent EHR processing                  |
| Logging     | Local, Azure Log Analytics | Structured logging options                 |

See Configuration Guide for complete reference.

📊 Performance

Typical Performance (depends on composition size and network):

  • Throughput: 1000-2000 compositions/minute
  • Memory: 2-4 GB RAM (configurable with batch size)
  • Cosmos DB: ~10 RU per composition write

Example Scenarios:

  • Daily Sync: 1,000 compositions in ~1-2 minutes
  • Research Export: 50,000 compositions in ~25-50 minutes
  • ML Dataset: 500,000 compositions in ~4-8 hours
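
For planning purposes, run time and Cosmos DB cost can be estimated directly from the typical figures above. The helper names below are hypothetical, and the calculation ignores startup time, retries, and network variation.

```rust
/// Estimated wall-clock range in minutes, returned as (fastest, slowest), for
/// exporting `compositions` at a throughput between `slow_per_min` and `fast_per_min`.
fn estimate_minutes(compositions: u64, slow_per_min: u64, fast_per_min: u64) -> (f64, f64) {
    let fastest = compositions as f64 / fast_per_min as f64;
    let slowest = compositions as f64 / slow_per_min as f64;
    (fastest, slowest)
}

/// Approximate Cosmos DB request units consumed by the writes alone.
fn estimate_request_units(compositions: u64, ru_per_write: u64) -> u64 {
    compositions * ru_per_write
}
```

At 1000-2000 compositions/minute, `estimate_minutes(500_000, 1000, 2000)` gives 250 to 500 minutes, which is roughly the 4-8 hours quoted for the ML dataset, and `estimate_request_units(500_000, 10)` gives about 5,000,000 RU.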

🔒 Security

Atlas implements comprehensive security measures to protect sensitive healthcare data and credentials:

Credential Protection

  • Memory Security: All credentials (passwords, keys, secrets) are automatically zeroized in memory when no longer needed
  • No Credential Logging: Credentials are never written to log files or exposed in debug output
  • Redacted Debug Output: Debug representations show Secret([REDACTED]) instead of actual values
  • Environment Variables: Secure credential management using environment variables, never hardcoded
  • Explicit Access Control: Code must explicitly call expose_secret() to access credentials, enabling easy security audits
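
The redacted-debug and explicit-access ideas can be illustrated with a small std-only wrapper. This is a sketch, not Atlas's implementation: the `Secret` and `OpenEhrCredentials` types here are hypothetical stand-ins, and zeroizing memory on drop is deliberately left out because it needs more than the standard library offers safely.

```rust
use std::fmt;

/// Wrapper that hides its contents from Debug output (illustrative stand-in).
struct Secret(String);

impl Secret {
    fn new(value: impl Into<String>) -> Self {
        Secret(value.into())
    }

    /// The only way to read the value, so every access is easy to find in review.
    fn expose_secret(&self) -> &str {
        &self.0
    }
}

impl fmt::Debug for Secret {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Never print the inner value, even in debug logs.
        f.write_str("Secret([REDACTED])")
    }
}

/// Hypothetical credentials struct: deriving Debug is safe because the
/// password field redacts itself.
#[derive(Debug)]
struct OpenEhrCredentials {
    username: String,
    password: Secret,
}
```

Logging such a struct with `{:?}` shows the username but prints `Secret([REDACTED])` for the password, while code that really needs the value has to call `expose_secret()`.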

Protected Credentials:

  • openEHR passwords
  • Cosmos DB keys
  • PostgreSQL connection strings (including embedded passwords)
  • Azure client secrets

Network & Access Security

  • TLS 1.2+: All connections encrypted in transit
  • Certificate Verification: TLS certificate validation enabled by default
  • Least Privilege: Read-only openEHR access recommended
  • Azure RBAC: Integrate with Azure role-based access control

Compliance & Audit

  • Audit Logging: All operations logged with timestamps
  • PHI/PII Protection: Sanitized logging, compliance-ready
  • HIPAA-Ready: Designed for healthcare compliance requirements
  • Data Verification: Optional SHA-256 checksums for data integrity

For detailed security best practices, see the Configuration Guide.

🤝 Contributing

We welcome contributions! Here's how to get started:

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/my-feature
  3. Make your changes following the Developer Guide
  4. Run tests: cargo test
  5. Run linter: cargo clippy --all-targets -- -D warnings
  6. Format code: cargo fmt
  7. Commit changes: git commit -m "feat: add new feature"
  8. Push to branch: git push origin feature/my-feature
  9. Open a Pull Request

See CONTRIBUTING.md for detailed guidelines.

Development Setup

# Clone repository
git clone https://github.com/erikhoward/atlas.git
cd atlas

# Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# Install development tools
rustup component add clippy rustfmt

# Build and test
cargo build
cargo test
cargo clippy --all-targets -- -D warnings
cargo fmt

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

🆘 Support

Documentation

Community

Commercial Support

For enterprise support, training, or custom development, contact: erikhoward@pm.me

🙏 Acknowledgments

Atlas is built with these excellent open-source projects:

🗺️ Roadmap

Current Version (v2.3)

  • ✅ EHRBase vendor support
  • ✅ Better Platform vendor support with OIDC authentication
  • ✅ Azure Cosmos DB integration
  • ✅ PostgreSQL integration
  • ✅ Incremental sync with watermarks
  • ✅ Preserve and flatten modes
  • ✅ CLI interface
  • ✅ Docker and Kubernetes deployment
  • ✅ HIPAA & GDPR anonymization

Future Enhancements

  • 🔄 Prometheus metrics export
  • 🔄 FHIR transformation
  • 🔄 Bi-directional synchronization
  • 🔄 Support for other cloud providers (AWS, GCP)

📚 Related Projects


Made with ❤️ by Erik Howard & the Atlas Contributors

If you find Atlas useful, please consider giving it a ⭐ on GitHub!
