Enso Atlas

Enso Atlas: Fully Local, FM-based Clinical Decision Support System for Pathology

Universal on-premise pathology AI platform: plug in any foundation model, dataset, or cancer task hassle free.
Built on Google Health AI Developer Foundations (HAI-DEF)

Test it out here: https://atlas.ensohealth.ai/

Competition Write-up • Documentation • Quick Start • Pipeline Scripts

Last Updated: February 24, 2026

Highlights

Multi-project platform: Project definitions are driven by config/projects.yaml; current reference projects are:
- ovarian-platinum — Ovarian Cancer - Platinum Sensitivity
- lung-stage — Lung Adenocarcinoma - Stage Classification
Strict project isolation: Slide listing, model selection, heatmaps, similar-case retrieval, report generation, batch analysis, and async tasks are scoped by project_id. Core routing stays project-bound, while level-0 compatibility checks may still reference legacy embedding layout within the selected project.
Project-scoped model visibility: 6 total classification models (5 ovarian + 1 lung), and each project only exposes assigned models.
Level-0 dense embeddings by default: Analysis and multi-model workflows default to full-resolution level-0 embeddings.
Explicit backend error behavior: Heatmap and multi-model endpoints return explicit errors for missing prerequisites:
- LEVEL0_EMBEDDINGS_REQUIRED
- COORDS_REQUIRED_FOR_HEATMAP
Heatmap rendering modes: Truthful patch-grid overlays plus optional interpolated/smoothed view.
Project-aware frontend UX: ModelPicker prunes stale model IDs on project switch; prediction panels, AI assistant, and patch zoom use project-specific language.
Local-first deployment: Runs on-premise; no PHI leaves the hospital network.

Pipeline Scripts

Recent model and pipeline updates.

scripts/train_transmil_finetune.py now supports patient-level stratified k-fold CV, class-balanced epoch sampling, minority-class feature augmentation (noise injection, feature dropout, mixup), configurable patch caps (max_train_patches, max_eval_patches), single-split mode, and per-fold PR-AUC plus calibration curves.
scripts/multi_model_inference.py now supports per-model decision thresholds (from config/projects.yaml or training outputs), wrapped checkpoint loading, CUDA OOM fallback with patch subsampling, and threshold-relative confidence calibration.
src/enso_atlas/mil/clam.py now includes a TransMILClassifier implementation alongside CLAM, with the same public inference interface.
src/enso_atlas/reporting/medgemma.py now includes structured report parsing, safety-constrained prompting, multi-section fallback behavior, and stronger generation error handling.
New data preparation scripts are included for barcode-balanced pool rebuilding, lung stage pool prep, ovarian endpoint pool prep, and bucket H5-to-NPY conversion:
- scripts/rebuild_multimodel_pools_barcode_balanced.py
- scripts/prepare_lung_stage_api_pool.py
- scripts/prepare_ov_endpoint_api_pool.py
- scripts/convert_bucket_h5_to_npy.py
config/projects.yaml now defines decision_threshold: 0.9935 for tumor_grade.

Screenshots

All screenshots below were refreshed from the live deployment at https://atlas.ensohealth.ai on Feb 24, 2026.

Dashboard (Oncologist View)

Oncologist summary view with prediction, similar-case counts, and top attention regions.

Dashboard (Pathologist View)

Pathologist workspace with WSI controls, annotation tools, and heatmap overlays.

Slide Manager

Slide inventory in grid view with filtering controls, embedding status, and patch counts.

Project Management

Project cards for ovarian and lung demos, including thresholds, embeddings, and navigation actions.

Quick Start

Docker Deployment (Recommended)

# Clone the repository
git clone https://github.com/Hilo-Hilo/enso-atlas.git
cd enso-atlas

# Build and start backend + database
docker compose -f docker/docker-compose.yaml build
docker compose -f docker/docker-compose.yaml up -d

# Backend API available at http://localhost:8003 (~3.5 min startup for MedGemma loading)

# Build and start frontend
cd frontend
npm install
npm run build
npx next start -p 3002

# Frontend available at http://localhost:3002

Portability note: docker/docker-compose.yaml currently includes developer-specific absolute bind mounts. Treat it as a template and use portable local defaults (relative paths or named volumes) and/or a local override file for your machine.

CPU/GPU note: Docker Compose can run on CPU-only hosts for basic development checks, but GPU acceleration is strongly recommended for practical embedding/inference/report latency.

Local Development

# Create virtual environment
python -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install -e .

# Start the API server (port 8000 locally, 8003 via Docker)
python -m uvicorn enso_atlas.api.main:app --reload --host 0.0.0.0 --port 8000

# In a separate terminal
cd frontend
npm install
npm run dev
# Frontend runs at http://localhost:3000 (dev) or http://localhost:3002 (production)

Architecture

                        Enso Atlas Multi-Project Architecture

  config/projects.yaml
          |
          v
+----------------------+      +-------------------------+      +------------------+
|  Project Registry    |----->|   FastAPI Backend       |<-----|   Next.js 14     |
|  (project metadata,  |      |   project-scoped APIs   |      |   Frontend :3002 |
|  dataset + model map)|      |        :8003            |      +------------------+
+----------------------+      +-------------------------+
                                        |
                 +----------------------+-----------------------------+
                 |                      |                             |
                 v                      v                             v
          +-------------+        +--------------+              +-------------+
          | Path        |        | CLAM +       |              | MedGemma    |
          | Foundation  |        | TransMIL     |              | Reporting   |
          | (level-0)   |        | (project set)|              | (async)     |
          +-------------+        +--------------+              +-------------+
                 |                      |                             |
                 +----------------------+-----------------------------+
                                        |
                                 +-------------+
                                 | PostgreSQL  |
                                 | project_*   |
                                 | junctions   |
                                 +-------------+

Core Components

Component	Description
Project Registry	Loads `config/projects.yaml`, including project IDs, dataset paths, and per-project model assignments
Project-Scoped Routing	Endpoints enforce `project_id` scope for slides, models, analysis, retrieval, and reports
WSI Processing	OpenSlide-based processing with tissue detection
Path Foundation	384-dim patch embeddings; level-0 dense embeddings are the default analysis path
CLAM + TransMIL	Attention-based MIL options for slide-level classification (shared classifier interface)
MedSigLIP	Text-to-patch semantic search (project-scoped availability)
FAISS Retrieval	Similar case search constrained to slides in the selected project
MedGemma 1.5 4B	Structured clinical report generation with project-aware context
PostgreSQL	Slide metadata, result caching, and project-model / project-slide assignments

Classification Metrics (Model Registry Metadata)

The AUC values below are demo metadata attached to configured model entries. Treat them as reference values unless you also have matching evaluation artifacts and rerun the corresponding evaluation pipeline in your environment.

Model	Project Scope	Task	AUC
platinum_sensitivity	ovarian-platinum	Platinum treatment response	0.907
tumor_grade	ovarian-platinum	Tumor grade classification	0.752
survival_5y	ovarian-platinum	5-year survival prediction	0.697
survival_3y	ovarian-platinum	3-year survival prediction	0.645
survival_1y	ovarian-platinum	1-year survival prediction	0.639
lung_stage	lung-stage	Lung adenocarcinoma stage classification (early vs advanced)	0.648

Total: 6 project-scoped classification models (5 ovarian + 1 lung).

Tech Stack

Layer	Technology
WSI I/O	OpenSlide
Embeddings	Path Foundation (ViT-S, 384-dim)
Semantic Search	MedSigLIP (text-to-patch retrieval)
Classification	CLAM + TransMIL (attention-based MIL)
Retrieval	FAISS
Reporting	MedGemma 1.5 4B
Backend	FastAPI + Python 3.10+ + asyncpg
Frontend	Next.js 14.2 + TypeScript + Tailwind CSS
Viewer	OpenSeadragon
Database	PostgreSQL
Deployment	Docker Compose on NVIDIA DGX Spark (ARM64)

API Reference

All endpoints are served at http://localhost:8003 (Docker) or http://localhost:8000 (local).

Core Endpoints

# Health check
curl http://localhost:8003/api/health

# List slides in ovarian project
curl "http://localhost:8003/api/slides?project_id=ovarian-platinum"

# List project-scoped models
curl "http://localhost:8003/api/models?project_id=lung-stage"

# Multi-model analysis (renamed endpoint)
curl -X POST http://localhost:8003/api/analyze-multi \
  -H "Content-Type: application/json" \
  -d '{"slide_id": "TCGA-XX-XXXX", "project_id": "lung-stage"}'

# Batch analysis (project-scoped)
curl -X POST http://localhost:8003/api/analyze-batch \
  -H "Content-Type: application/json" \
  -d '{"slide_ids": ["slide_1", "slide_2"], "project_id": "ovarian-platinum"}'

# Generate clinical report (project-scoped)
curl -X POST http://localhost:8003/api/report \
  -H "Content-Type: application/json" \
  -d '{"slide_id": "TCGA-XX-XXXX", "project_id": "ovarian-platinum"}'

# Similar-case retrieval (project-scoped)
curl "http://localhost:8003/api/similar?slide_id=TCGA-XX-XXXX&project_id=lung-stage"

Key Endpoint List (representative; see Swagger/ReDoc for full schema)

Method	Endpoint	Description
GET	/api/health	Health check
GET	/api/slides?project_id={project_id}	List slides scoped to a project
GET	/api/models?project_id={project_id}	List models assigned to a project
POST	/api/analyze	Single-slide analysis (`project_id` in request body)
POST	/api/analyze-multi	Multi-model analysis (`project_id` in request body)
POST	/api/analyze-batch	Synchronous batch analysis (`project_id` in request body)
POST	/api/analyze-batch/async	Async batch analysis task (`project_id` in request body)
GET	/api/analyze-batch/status/{task_id}	Check async batch task status
POST	/api/report	Generate report (`project_id` in request body)
POST	/api/report/async	Async report generation (`project_id` in request body)
GET	/api/report/status/{task_id}	Check async report task status
GET	/api/similar?slide_id={id}&project_id={project_id}	Similar-case retrieval within project scope
POST	/api/semantic-search	MedSigLIP semantic search (`project_id` in request body)
GET	/api/heatmap/{slide_id}?project_id={project_id}&smooth={bool}	Slide heatmap with optional interpolation
GET	/api/heatmap/{slide_id}/{model_id}?project_id={project_id}&smooth={bool}	Model-specific attention heatmap
GET/POST	/api/projects	List/create projects
GET/PUT/DELETE	/api/projects/{project_id}	Read/update/delete one project
GET/POST/DELETE	/api/projects/{project_id}/slides	Assign/unassign slides per project
GET/POST/DELETE	/api/projects/{project_id}/models	Assign/unassign models per project

Error Behavior for Missing Prerequisites

Heatmap and multi-model analysis paths return explicit errors instead of silent fallback behavior:

LEVEL0_EMBEDDINGS_REQUIRED when level-0 embeddings are unavailable
COORDS_REQUIRED_FOR_HEATMAP when *_coords.npy is missing

Interactive Documentation

Swagger UI: http://localhost:8003/api/docs
ReDoc: http://localhost:8003/api/redoc

Expected Project Structure

The tree below shows the expected layout for a configured deployment. In a clean clone, large datasets/models are often mounted or generated externally.

med-gemma-hackathon/
|-- src/enso_atlas/
|   |-- api/           # FastAPI endpoints
|   |-- embedding/     # Path Foundation embedder
|   |-- evidence/      # Heatmaps and FAISS retrieval
|   |-- mil/           # CLAM + TransMIL attention classifiers
|   |-- reporting/     # MedGemma report generation
|   |-- wsi/           # WSI processing
|-- frontend/          # Next.js 14.2 application
|-- docker/            # Docker Compose configuration
|-- config/            # projects.yaml and configuration
|-- data/
|   |-- projects/
|   |   |-- ovarian-platinum/   # example project directory
|   |   |   |-- slides/
|   |   |   |-- embeddings/
|   |   |   \-- labels.csv
|   |   \-- lung-stage/         # example project directory
|   |       |-- slides/
|   |       |-- embeddings/
|   |       \-- labels.json
|-- models/            # Trained TransMIL weights
|-- tests/             # Unit tests
|-- docs/              # Documentation and screenshots

Data Layout

Per-project datasets are expected to follow a modular structure (either in-repo or external paths configured in config/projects.yaml):

data/projects/{project-id}/slides/
data/projects/{project-id}/embeddings/
data/projects/{project-id}/labels.csv or labels.json

This replaces earlier flat dataset assumptions and enables independent project lifecycle management.

Level-0 reliability guardrail: keep data/projects/{project-id}/embeddings/level0/ synchronized with top-level embeddings/*.npy files (including *_coords.npy). After embedding updates or migrations, run:

python scripts/validate_project_modularity.py --check-embedding-layout

If this check fails, level-0 heatmaps and analysis can report missing level-0 embeddings even when flat embeddings exist.

Configuration

Environment Variables

Variable	Description	Default
`CUDA_VISIBLE_DEVICES`	GPU selection	All GPUs
`NEXT_PUBLIC_API_URL`	Frontend API URL	(empty = same-origin `/api`)

Public deployment note: for Cloudflare/Tailscale public hosting, keep NEXT_PUBLIC_API_URL empty so browsers call the same origin (/api/...). Hardcoding a private/Tailnet IP can cause "backend disconnected" for public users.

Project Configuration

Projects are managed via config/projects.yaml and /api/projects CRUD endpoints.

Current reference projects:

ovarian-platinum: Ovarian Cancer - Platinum Sensitivity
lung-stage: Lung Adenocarcinoma - Stage Classification

Project isolation is enforced in API routing and task execution, including batch analysis and async report generation.

Model-level decision thresholds can also be set in config/projects.yaml (for example, tumor_grade currently uses decision_threshold: 0.9935).

Dataset

config/projects.yaml includes two reference project configurations:

Ovarian cancer cohort configuration for platinum sensitivity, tumor grade, and survival classification
Lung adenocarcinoma cohort configuration for stage classification

In many deployments, raw WSIs/embeddings are stored outside this repository and mounted into the configured dataset paths. When available, Path Foundation level-0 dense embeddings are the default analysis path.

Docker Deployment

Services are defined in docker/docker-compose.yaml:

Service	Description	Port
enso-atlas	FastAPI backend + ML models	8003 (host) -> 8000 (container)
atlas-db	PostgreSQL database	5433

The backend takes approximately 3.5 minutes to fully start due to MedGemma model loading. The frontend runs separately outside Docker.

Before launching in a new environment, update host bind mounts in Compose (or layer an override file) so paths are valid on your machine.

See docs.md (Hospital Deployment Guide section) for detailed deployment instructions.

Development

Running Tests

pytest tests/
pytest --cov=src tests/

Code Quality

ruff check src/
black src/ --check
mypy src/

cd frontend && npm run lint

Acknowledgments

Google Health AI for Path Foundation, MedGemma, and MedSigLIP
NVIDIA for DGX Spark compute resources
TCGA for ovarian and lung whole-slide image datasets
TransMIL for the Transformer-based MIL architecture

License

This repository uses a Kaggle-compatible licensing layout:

Repository content (default): Creative Commons Attribution 4.0 (CC BY 4.0) — see LICENSE
Software source code (additional permission): MIT License — see LICENSE-CODE-MIT

For scope and third-party terms, see NOTICE.

References

Shao et al., "TransMIL: Transformer based Correlated Multiple Instance Learning for Whole Slide Image Classification," NeurIPS, 2021.
Google Health AI, Path Foundation
Google, MedGemma
Google Health AI, MedSigLIP

Name		Name	Last commit message	Last commit date
Latest commit History 609 Commits
.github		.github
config		config
data		data
docker		docker
docs		docs
frontend		frontend
models		models
outputs		outputs
scripts		scripts
src		src
tests		tests
.gitignore		.gitignore
LICENSE		LICENSE
LICENSE-CODE-MIT		LICENSE-CODE-MIT
Makefile		Makefile
NOTICE		NOTICE
README.md		README.md
docs.md		docs.md
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt

Folders and files

Latest commit

History

Repository files navigation

Enso Atlas

Enso Atlas: Fully Local, FM-based Clinical Decision Support System for Pathology

Highlights

Pipeline Scripts

Screenshots

Dashboard (Oncologist View)

Dashboard (Pathologist View)

Slide Manager

Project Management

Quick Start

Docker Deployment (Recommended)

Local Development

Architecture

Core Components

Classification Metrics (Model Registry Metadata)

Tech Stack

API Reference

Core Endpoints

Key Endpoint List (representative; see Swagger/ReDoc for full schema)

Error Behavior for Missing Prerequisites

Interactive Documentation

Expected Project Structure

Data Layout

Configuration

Environment Variables

Project Configuration

Dataset

Docker Deployment

Development

Running Tests

Code Quality

Acknowledgments

License

References

About

Resources

License

Licenses found

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages