A dashboard for tracking synchronization status of OpenNeuro datasets across GraphQL API, GitHub mirrors, and S3 exports.
The monitoring system uses a multi-stage pipeline that generates static JSON files consumed by a client-side dashboard:
```
fetch-graphql → check-github → check-s3-version → check-s3-files → summarize
```
Each stage reads from previous stages and writes new check files, allowing incremental updates and independent execution.
The pipeline is implemented as an installable Python package under `code/`, exposing an `openneuro-dashboard` CLI.
All data files (aspirationally) follow a versioned schema defined in `schema/openneuro-dashboard.yaml` (LinkML format); validation is not yet enforced.
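The staged read/write contract can be sketched as below. The helper name and the registry's key layout (`datasets`, `latestSnapshot`) are illustrative assumptions, not the real schema:

```python
import json
from pathlib import Path

def run_stage(data_dir: Path, dataset_id: str) -> None:
    """Sketch of one pipeline stage: read a file written by an earlier
    stage, write a new per-dataset check file without touching others."""
    # Read the output of the previous stage (fetch-graphql).
    registry = json.loads((data_dir / "datasets-registry.json").read_text())
    latest_tag = registry["datasets"][dataset_id]["latestSnapshot"]  # assumed keys

    # Write this stage's own check file; because each stage owns its
    # outputs, stages can be re-run incrementally and independently.
    check = {"schemaVersion": 1, "dataset": dataset_id, "latest": latest_tag}
    out = data_dir / "datasets" / dataset_id / "example-check.json"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(check, indent=2))
```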
- `data/datasets-registry.json`: Master registry from GraphQL
  - Maps dataset IDs to latest snapshot tags
  - Source of truth for what datasets exist
- `data/all-datasets.json`: Pre-computed summary for dashboard
  - Aggregates all check results
  - Includes per-dataset status and timestamps
```
data/datasets/{id}/
├── snapshots.json       # List of all snapshot tags
├── github.json          # GitHub mirror status (branches, tags, HEAD)
├── s3-version.json      # Version from S3 dataset_description.json
├── s3-diff.json         # File differences (only if S3 accessible)
└── snapshots/{tag}/
    ├── metadata.json    # Snapshot metadata (SHA, creation date)
    └── files.json       # Complete file list from git tree
```
- Uses `git ls-remote --symref` to fetch all refs
- Validates:
  - All snapshot tags exist on GitHub
  - HEAD points to latest snapshot
  - Commit SHAs match GraphQL data
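As a sketch of the ref validation, `git ls-remote --symref` output could be parsed like this (the parser is illustrative; real output may also include peeled tags and other ref types):

```python
def parse_ls_remote(output: str) -> dict:
    """Parse `git ls-remote --symref` output into the HEAD symref plus a
    ref → SHA map, so tags and SHAs can be checked against GraphQL data."""
    head_branch = None
    refs = {}
    for line in output.splitlines():
        if line.startswith("ref: "):
            # Symref line, e.g. "ref: refs/heads/main\tHEAD"
            head_branch = line.split()[1]
        elif line.strip():
            sha, ref = line.split("\t")
            refs[ref] = sha
    return {"head": head_branch, "refs": refs}
```

A snapshot tag `1.0.0` would then be validated by checking that `refs/tags/1.0.0` is present and that its SHA matches the GraphQL record.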
- Fetches `dataset_description.json` from S3
- Extracts version from the `DatasetDOI` field
- Edge cases:
  - Normal: DOI with correct dataset ID and version
  - Assumed latest: Missing/custom DOI → use latest snapshot for comparison
  - Blocked (403): Access denied → no file comparison possible
  - Not found (404): Missing file → use latest snapshot for comparison
- Only 403 errors block further validation
- All other cases allow file comparison with an assumed version
- Compares S3 file listing against git tree
- Uses version from `s3-version.json` (either from the DOI or assumed latest)
- Skipped if S3 is blocked (403)
- Special case: `exportMissing: true` if S3 has zero files
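The comparison itself amounts to set differences between the two listings. `exportMissing` follows the flag named above; the other field names are hypothetical:

```python
def diff_files(git_files: set[str], s3_files: set[str]) -> dict:
    """Sketch of the file comparison: git tree vs. S3 listing."""
    return {
        "exportMissing": len(s3_files) == 0,        # special case: empty export
        "missingFromS3": sorted(git_files - s3_files),
        "extraInS3": sorted(s3_files - git_files),
        "inSync": git_files == s3_files,
    }
```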
Per-check statuses:
- `ok`: Check passed
- `warning`: Minor issues (e.g., assumed version, HEAD mismatch)
- `error`: Check failed or blocked
- `version-mismatch`: S3 DOI version ≠ latest snapshot
- `pending`: Check not yet run
Special flags:
- `s3Blocked: true` in the summary indicates a 403 error (shows a lock icon)
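One plausible way to roll per-check statuses up into a single dataset status is worst-status-wins; the severity ordering here is an assumption, not the documented behavior:

```python
# Assumed severity order, mildest first.
SEVERITY = ["ok", "pending", "warning", "version-mismatch", "error"]

def overall_status(checks: dict[str, str]) -> str:
    """Roll per-check statuses up to one dataset status (worst wins)."""
    return max(checks.values(), key=SEVERITY.index, default="pending")
```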
Requires Python 3.14+.
```
uv sync
```
```
uv run openneuro-dashboard run-all
```
Or run stages individually:
```
# Stage 1: Fetch GraphQL data
uv run openneuro-dashboard fetch-graphql

# Stage 2: Check GitHub mirrors
uv run openneuro-dashboard check-github

# Stage 3: Check S3 versions
uv run openneuro-dashboard check-s3-version

# Stage 4: Check S3 files
uv run openneuro-dashboard check-s3-files --cache-dir ~/.cache/openneuro-dashboard/repos

# Stage 5: Summarize
uv run openneuro-dashboard summarize
```
Common options:
- `--verbose`/`-v`: Enable verbose output
- `--max-datasets N`: Limit number of datasets (for `fetch-graphql` and `run-all`)
Generate synthetic test data:
```
uv run openneuro-dashboard gen-data --num-datasets 50 --seed 42
```
Run the test suite:
```
uv run --group test pytest -v
```
Static HTML/CSS/JS dashboard served from the repository root.
- `index.html`: Main dashboard (dataset list)
- `dataset.html`: Detail view for individual datasets
- `js/main.js`: Dashboard logic
- `js/dataset.js`: Detail view logic
- `js/utils.js`: Shared utilities
```
python -m http.server 8000
```
Navigate to http://localhost:8000.
Main view:
- Sortable/filterable dataset table
- Summary statistics by status
- Search by dataset ID
- Color-coded status badges
- Lock icons for blocked S3 datasets
Detail view:
- Snapshot history
- Detailed check results with expandable sections
- File diff viewer (when mismatches exist)
- Lazy-loaded file listings
Immutable (never changes once created):
- `snapshots/{tag}/metadata.json`
- `snapshots/{tag}/files.json`

Mutable (updated on each check run):
- `datasets-registry.json`
- `github.json`
- `s3-version.json`
- `s3-diff.json`
- `all-datasets.json`
This allows caching of snapshot data while keeping check results fresh.
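A minimal sketch of how this split enables caching (the helper name is hypothetical): immutable snapshot files are fetched at most once, while mutable check files are always rewritten.

```python
from pathlib import Path

def needs_fetch(data_dir: Path, dataset_id: str, tag: str) -> bool:
    """Snapshot files are immutable, so an existing metadata.json never
    needs re-fetching; return True only when the file is absent."""
    path = (data_dir / "datasets" / dataset_id / "snapshots" / tag
            / "metadata.json")
    return not path.exists()
```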
The LinkML schema (`schema/openneuro-dashboard.yaml`) includes a `schemaVersion` field in all data files. When making breaking changes:
- Increment schema version
- Update all pipeline scripts to write new version
- Add migration logic if needed
- Update dashboard to handle both versions during transition
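The migration step might look like the sketch below; the version numbers and the renamed field are purely illustrative, not part of the real schema:

```python
def load_with_migration(doc: dict) -> dict:
    """Sketch of migration logic: upgrade older documents in place so
    downstream code only ever sees the current schema version."""
    version = doc.get("schemaVersion", 1)
    if version < 2:
        # Hypothetical v1 → v2 breaking change: rename 'latest'
        # to 'latestSnapshot'.
        if "latest" in doc:
            doc["latestSnapshot"] = doc.pop("latest")
        doc["schemaVersion"] = 2
    return doc
```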