fix: harden cache init and nlp metrics edge cases #875

Open
AhmedAli58 wants to merge 4 commits into sunlabuiuc:master from AhmedAli58:bench/polars-streaming-loader-benchmark

AhmedAli58 commented Feb 25, 2026

Summary

This PR fixes three correctness and reliability issues:

  1. Import-time cache path failure in restricted environments
  • pyhealth could fail to import when ~/.cache/pyhealth was not writable.
  • Added resilient cache initialization that tries a preferred path and then a fallback path, reports errors with more context, and honors a PYHEALTH_BASE_CACHE override.
  • Made logger handler setup idempotent to avoid duplicate handlers.
  2. Broken missing-dependency handling in NLP scorer
  • On install-failure paths, Scorer._get_missing_modules() shadowed a variable and accumulated missing modules incorrectly.
  • Reworked missing-module collection so unavailable methods are removed cleanly and missing requirements are returned consistently.
  3. Empty score serialization crash
  • ScoreSet.as_numpy() raised a ValueError on empty results because np.stack([]) requires at least one array.
  • Added safe empty-array handling so empty score sets serialize to valid NumPy/DataFrame outputs.
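For item 1, the cache fallback and idempotent logger setup could look roughly like the following. This is a minimal sketch, not the code in pyhealth/__init__.py: resolve_cache_dir, get_logger, the _pyhealth_handler marker attribute, and the use of the system temp directory as the fallback are all hypothetical. Only the PYHEALTH_BASE_CACHE variable name and the ~/.cache/pyhealth default come from this PR.

```python
import logging
import os
import tempfile
from pathlib import Path
from typing import List, Mapping, Optional

ENV_VAR = "PYHEALTH_BASE_CACHE"  # override variable named in this PR


def _is_writable_dir(path: Path) -> bool:
    """Create `path` if needed, then prove it is writable by creating a temp file."""
    try:
        path.mkdir(parents=True, exist_ok=True)
        with tempfile.NamedTemporaryFile(dir=path):
            pass  # the probe file is deleted automatically on close
        return True
    except OSError:
        return False


def resolve_cache_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the first writable dir: override, preferred default, then fallback."""
    env = os.environ if env is None else env
    candidates: List[Path] = []
    override = env.get(ENV_VAR)
    if override:
        candidates.append(Path(override).expanduser())
    try:
        candidates.append(Path.home() / ".cache" / "pyhealth")  # preferred default
    except RuntimeError:
        pass  # home directory cannot be determined; skip this candidate
    # Hypothetical fallback for environments with a read-only home directory.
    candidates.append(Path(tempfile.gettempdir()) / "pyhealth")

    for candidate in candidates:
        if _is_writable_dir(candidate):
            return candidate

    tried = ", ".join(str(c) for c in candidates)
    raise RuntimeError(
        f"pyhealth could not find a writable cache directory (tried: {tried}). "
        f"Set {ENV_VAR} to a writable path."
    )


def get_logger(name: str = "pyhealth") -> logging.Logger:
    """Return the package logger, attaching its stream handler at most once."""
    logger = logging.getLogger(name)
    already_configured = any(
        getattr(h, "_pyhealth_handler", False) for h in logger.handlers
    )
    if not already_configured:
        handler = logging.StreamHandler()
        handler._pyhealth_handler = True  # hypothetical marker so re-runs detect it
        handler.setFormatter(logging.Formatter("%(name)s - %(levelname)s - %(message)s"))
        logger.addHandler(handler)
    return logger
```

Probing by actually creating a file tests the condition that matters (can pyhealth write here?) instead of inferring it from permission bits, and listing every tried path in the error gives users enough context to pick a value for the override.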
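For item 2, the core of the fix is to keep each method's missing modules and the overall missing set in separately named variables. The sketch below assumes the scorer keeps a mapping from metric method name to the modules it needs; get_missing_modules, _is_importable, and that mapping shape are hypothetical. Unlike the real Scorer, it only checks importability and never attempts an install.

```python
import importlib.util
from typing import Dict, List, Set, Tuple


def _is_importable(module_name: str) -> bool:
    """True if `module_name` can be located by the import system."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Dotted names with a missing parent raise ModuleNotFoundError (an ImportError).
        return False


def get_missing_modules(
    requirements: Dict[str, List[str]],
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Split methods into usable ones and a sorted, de-duplicated missing list."""
    usable: Dict[str, List[str]] = {}
    all_missing: Set[str] = set()
    for method, modules in requirements.items():
        # The per-method result has its own name, so it never shadows `all_missing`.
        method_missing = [m for m in modules if not _is_importable(m)]
        if method_missing:
            all_missing.update(method_missing)  # drop the method, record what it lacks
        else:
            usable[method] = list(modules)
    return usable, sorted(all_missing)
```

Returning a sorted list means the same environment always produces the same missing-requirements message, which keeps error output and tests stable.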
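For item 3, np.stack needs at least one array, so an empty score set has to be special-cased. This minimal sketch assumes scores are stored as one row per example alongside a known list of metric names; scores_to_array and its parameters are hypothetical and do not reflect the actual ScoreSet.as_numpy() signature.

```python
from typing import Sequence

import numpy as np


def scores_to_array(
    rows: Sequence[Sequence[float]], metric_names: Sequence[str]
) -> np.ndarray:
    """Stack per-example score rows into shape (n_examples, n_metrics)."""
    if len(rows) == 0:
        # np.stack([]) raises ValueError, so return a correctly shaped empty array.
        return np.empty((0, len(metric_names)), dtype=float)
    return np.stack([np.asarray(row, dtype=float) for row in rows])
```

Keeping the column count on the empty array means a downstream pd.DataFrame(arr, columns=metric_names) still gets the expected columns, just with zero rows.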

Files Changed

  • pyhealth/__init__.py
  • pyhealth/nlp/metrics.py
  • tests/core/test_cache_path.py
  • tests/nlp/test_metrics_edge_cases.py

Validation

  • PYHEALTH_BASE_CACHE=/tmp/pyhealth-cache pytest -q tests/core/test_cache_path.py tests/nlp/test_metrics_edge_cases.py
  • python -m py_compile pyhealth/__init__.py pyhealth/nlp/metrics.py tests/core/test_cache_path.py tests/nlp/test_metrics_edge_cases.py

Impact

  • Lets pyhealth import in constrained environments where the default cache directory is not writable.
  • Makes dependency-failure paths in the NLP scoring utilities report missing modules consistently.
  • Prevents crashes for valid empty-input workflows and adds regression coverage for all three fixes.

AhmedAli58 changed the title from "fix: improve metrics reliability and edge-case handling" to "fix: harden cache init and nlp metrics edge cases" on Mar 1, 2026.