Skip to content

ICD-10 microservice, Docker stack, and CI improvements#25

Open
MelbourneDeveloper wants to merge 76 commits intomainfrom
claude/icd10-microservice-setup-ZSSSH
Open

ICD-10 microservice, Docker stack, and CI improvements#25
MelbourneDeveloper wants to merge 76 commits intomainfrom
claude/icd10-microservice-setup-ZSSSH

Conversation

@MelbourneDeveloper
Copy link
Owner

TLDR;

  • New ICD-10 API + CLI microservice with semantic search via pgvector/embeddings
  • Docker Compose stack for running all healthcare samples together
  • CI: added ICD-10 and Docker build jobs, Postgres service containers for gatekeeper/samples jobs
  • Clinical and Scheduling queries migrated from raw SQL to LQL
  • Claude Code skills added for common dev workflows
  • Dashboard: new ICD-10 clinical coding UI page

Brief Details

ICD-10 microservice (Samples/ICD10/): Full ICD-10-AM/ACHI code lookup API backed by PostgreSQL with pgvector. Includes a Python embedding service (embedding-service/) using MedEmbed for semantic RAG search, an import pipeline (scripts/CreateDb/) to seed codes and generate embeddings, and an interactive CLI (ICD10.Cli/).

Docker stack (Samples/docker/): docker-compose.yml + Dockerfile.app/Dockerfile.dashboard + nginx.conf for running Clinical, Scheduling, ICD-10, Gatekeeper, Embedding Service, and Dashboard behind a single compose stack. Scripts reorganised under Samples/scripts/.

CI (.github/workflows/ci.yml): Added icd10-tests job (pgvector Postgres + embedded Docker embedding service); docker-build job (validates app and dashboard container builds); Postgres service containers added to gatekeeper-tests and sample-api-tests; Dashboard.Web.Tests removed from sample matrix (replaced by Dashboard.Integration.Tests).

LQL migrations: Clinical and Scheduling .sql queries replaced with equivalent .lql files; filter_like operator added to LQL grammar and tested across SQLite, Postgres, and SQL Server.

Dashboard (Dashboard.Web/): ClinicalCodingPage.cs (1500+ lines) adds ICD-10 code browsing/search UI; ApiClient.cs centralises HTTP calls; new Icons.cs and CSS components.

Tooling: 7 Claude Code skills added under .claude/skills/; tasks.json reorganised with labelled groups; CLAUDE.md updated with clearer rules.

How Do The Tests Prove This Works?

Samples/ICD10/ICD10.Api.Tests/ (new, ~2300 LOC):

  • HealthEndpointTests – asserts the /health endpoint returns 200 and a healthy status, confirming the API boots and DB migrations ran.
  • ChapterEndpointTests / ChapterCategoryTests – seed specific chapters/blocks/categories, then assert the hierarchy endpoints return the correct counts and codes, proving the schema and queries are correct.
  • CodeLookupTests – look up codes by exact code string, assert properties like description and billability; verifies GetCodeByCode.lql and the generated extension methods.
  • SearchEndpointTests – full-text and semantic search against seeded codes; asserts result ordering and relevance, exercising the pgvector cosine-similarity path in SearchIcd10Codes.sql.
  • AchiEndpointTests – same pattern for ACHI surgical codes including block/chapter hierarchy.

Samples/ICD10/ICD10.Cli.Tests/CliE2ETests.cs (new, ~1170 LOC):

  • Launches the real CLI against a live test database and sends commands; asserts parsed output matches expected codes. Proves the CLI correctly calls the API and formats responses.

Samples/Dashboard/Dashboard.Integration.Tests/Icd10E2ETests.cs (new, ~588 LOC):

  • Playwright E2E test that opens the Dashboard, navigates to Clinical Coding, searches for ICD-10 codes, and asserts expected codes appear in the UI. Proves the full front-to-back flow: Dashboard → ICD10 API → Postgres.

Lql/Lql.Tests/LqlFileBasedTests.BasicOperations.cs:

  • filter_like test case added with expected SQL output for all three DB targets (SQLite/filter_like.sql, PostgreSql/filter_like.sql, SqlServer/filter_like.sql), proving the new grammar rule transpiles correctly and platform-independently.

claude and others added 30 commits January 30, 2026 08:03
- Create Samples/ICD10AM folder structure for new microservice
- Add comprehensive SPEC.md with:
  - RAG search feature using MedEmbed-Large-v1 medical embedding model
  - Basic lookup with JSON and FHIR response formats
  - Mermaid ER diagram for database schema
  - API endpoint documentation
  - PostgreSQL with pgvector for vector similarity search
  - RLS (Row Level Security) via user impersonation
- Add icd10am-schema.yaml with DataProvider YAML migrations for:
  - ICD-10-AM chapters, blocks, categories, and codes
  - ACHI procedure blocks and codes
  - Embedding tables for vector storage
  - Coding standards and user search history
- Add Python import script (import_icd10am.py) to:
  - Parse IHACPA data files
  - Generate medical embeddings with MedEmbed
  - Bulk import into PostgreSQL
- Remove "Too Many Cooks" multi-agent section from CLAUDE.md

https://claude.ai/code/session_01Pxhuxx7gqh8RrTCEGJ2QU4
- Replace all SQL query files with LQL equivalents
- Add ICD10AM.Api.csproj with LQL transpilation support
- Add DataProvider.json configuration
- Add DatabaseSetup.cs and GlobalUsings.cs
- All queries now use pipeline syntax: filter, join, select, order_by

https://claude.ai/code/session_01Pxhuxx7gqh8RrTCEGJ2QU4
- Hierarchical browse: chapters, blocks, categories, codes
- Code lookup with JSON and FHIR format support
- ACHI procedure endpoints
- RAG search with embedding service integration
- Cosine similarity ranking for semantic search
- LQL transpilation enabled in csproj
- Updated DataProvider.json to use .generated.sql files

https://claude.ai/code/session_01Pxhuxx7gqh8RrTCEGJ2QU4
- Dockerfile using MedEmbed-Small-v0.1 (384 dims, ~500MB)
- FastAPI service with /embed and /embed/batch endpoints
- docker-compose.yml for easy deployment
- Health checks and resource limits configured
- Model downloaded at build time for fast startup

https://claude.ai/code/session_01Pxhuxx7gqh8RrTCEGJ2QU4
- ICD10AMApiFactory with seeded test data
- ChapterEndpointTests: hierarchical browse tests
- CodeLookupTests: code search and FHIR format tests
- AchiEndpointTests: procedure code tests
- HealthEndpointTests: health check tests
- Real database, zero mocking

https://claude.ai/code/session_01Pxhuxx7gqh8RrTCEGJ2QU4
- generate_sample_data.py creates test SQLite database
- Includes common ICD-10-AM codes (infectious, diabetes, cardiac, respiratory, etc.)
- Includes sample ACHI procedures (angiography, appendicectomy, hip replacement, etc.)
- Note: Full ICD-10-AM data requires licensing from IHACPA

https://claude.ai/code/session_01Pxhuxx7gqh8RrTCEGJ2QU4
- Update SPEC.md with IHACPA licensing requirements
- Add .gitignore for generated files, databases, Python cache
- Note: Full ICD-10-AM data requires purchase from IHACPA

https://claude.ai/code/session_01Pxhuxx7gqh8RrTCEGJ2QU4
BREAKING: No more licensed IHACPA data!

- Add import_icd10cm.py that downloads FREE data from CMS.gov
- Successfully imports 71,704 diagnosis codes
- 19 chapters, 1,910 blocks, 1,910 categories
- Update SPEC.md to document free data sources
- Remove licensing requirements (CMS data is public domain)

Data sources:
- Primary: https://www.cms.gov/medicare/coding-billing/icd-10-codes
- Mirror: GitHub JSON gist for faster downloads

https://claude.ai/code/session_01Pxhuxx7gqh8RrTCEGJ2QU4
- Fixed syntax error in try/except blocks around IHACPA download
- Added CDC ICD-10-CM as fallback when IHACPA returns 503 errors
- Uses free US Government CDC data (74,260 codes) which shares
  WHO ICD-10 base with Australian ICD-10-AM
- Script now successfully imports codes when IHACPA is unavailable

https://claude.ai/code/session_01Pxhuxx7gqh8RrTCEGJ2QU4
- Renamed folder from ICD10AM to ICD10CM (honest about data source)
- Simplified import script - CDC data only, no fallback bullshit
- 74,260 ICD-10-CM codes from CDC (public domain)
- Clean database schema with icd10cm_ table prefix

https://claude.ai/code/session_01Pxhuxx7gqh8RrTCEGJ2QU4
- Fix DI registration for Func<HttpClient> embedding service
- Fix DatabaseSetup to skip initialization if tables exist (for tests)
- Remove unsupported 'unique' property from schema indexes
- Remove SearchCodes/SearchAchiCodes LQL files (LIKE not supported)
- Implement manual SQL search endpoints in Program.cs
- Disable AOT in LqlCli.SQLite to avoid missing ILCompiler packages
- Update GlobalUsings to remove unused Search result types
- Disable NuGet audit in Directory.Build.props for proxy issues

https://claude.ai/code/session_01Pxhuxx7gqh8RrTCEGJ2QU4
- generate_embeddings.py: Populates icd10cm_code_embedding table
  using MedEmbed-small-v0.1 model (384 dimensions)
- embedding_service.py: Runtime service for encoding user queries
- SPEC.md: Document the 3-step setup process:
  1. Import codes (import_icd10cm.py)
  2. Generate embeddings (generate_embeddings.py)
  3. Start embedding service (embedding_service.py)

https://claude.ai/code/session_01Pxhuxx7gqh8RrTCEGJ2QU4
Updated LQL query to reference icd10cm_code_embedding and icd10cm_code
tables where the 74,260 embeddings are stored. Added icd10cm_code and
icd10cm_code_embedding table definitions to schema and DataProvider.json.

30 E2E tests passing, RAG semantic search working with MedEmbed model.

https://claude.ai/code/session_01Pxhuxx7gqh8RrTCEGJ2QU4
Replaced Python embedding service with native C# ONNX Runtime:
- Added Microsoft.ML.OnnxRuntime and BERTTokenizers NuGet packages
- EncodeWithOnnx helper performs tokenization and mean pooling
- Updated SPEC.md with download instructions for ONNX model
- Model (127MB) excluded from git - download with optimum-cli

30 E2E tests passing, RAG search works without any Python dependency.

https://claude.ai/code/session_01Pxhuxx7gqh8RrTCEGJ2QU4
The BERTTokenizers library requires vocabulary files in a Vocabularies
directory. These files are needed for tokenizing query text before
encoding with the ONNX model.

https://claude.ai/code/session_01Pxhuxx7gqh8RrTCEGJ2QU4
Documents how to:
- Setup database and generate embeddings (one-time Python)
- Export ONNX model for C# runtime
- Run the API
- Run E2E tests
- Troubleshoot common issues

https://claude.ai/code/session_01Pxhuxx7gqh8RrTCEGJ2QU4
MelbourneDeveloper and others added 30 commits February 5, 2026 21:58
- Remove XML doc comments from top-level static functions (CS1587)
- Rewrite await using var as await using (...) {} to support ConfigureAwait(false) (CA2007)
- Update TargetFramework net9.0 -> net10.0 in Directory.Build.props and all project files
- Update package versions to 10.x equivalents (Microsoft.Data.Sqlite, Microsoft.AspNetCore.*, Microsoft.Extensions.Logging.Abstractions, System.Text.Json)
- Remove System.Text.Json and System.Data.Common explicit references (now inbox in .NET 10)
- Update CI DOTNET_VERSION to 10.0.x

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…atibility

The h5-compiler 24.x tool targets net9.0 which is not installed locally.
Updated to 26.3.64893 (net10.0) in both root and Dashboard.Web local manifests.
Updated h5.Target SDK and h5/h5.Core package references to 26.x accordingly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants