forked from sunlabuiuc/PyHealth
Status: Open
Labels: enhancement (New feature or request)
Description
Current State
PyHealth currently depends on both pandas and polars:
# requirements.txt
pandas>=1.3.2
polars
Questions to Investigate
1. Where is each library used?
   - Which modules/functions use pandas?
   - Which modules/functions use polars?
   - Is there overlap or duplication?
2. Can we consolidate?
   - Is pandas only used for legacy compatibility?
   - Can existing pandas usage be migrated to polars?
   - What would the breaking changes be?
3. Performance implications
   - Polars is often faster on large datasets (multi-threaded execution, lazy query optimization)
   - What performance gains could we expect?
   - Are there cases where pandas is still preferable?
4. Maintenance burden
   - Supporting both libraries increases maintenance
   - The recent pandas version constraint caused installation failures (#2: PyHealth installation fails on halo-pr-528 branch: build wheel error)
   - Would a polars-only codebase be simpler?
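The first question can be answered mechanically. Below is a minimal sketch that uses Python's standard `ast` module to list which files import pandas or polars, and which import both. It assumes the package sources live in a directory such as `pyhealth/`. The helper names `find_dataframe_imports` and `files_using_both` are hypothetical. The scan only sees static `import` / `from ... import` statements (including ones inside functions), so dynamic imports via `importlib` and indirect use through other modules are not counted.

```python
import ast
from collections import defaultdict
from pathlib import Path

# Libraries whose usage we want to locate.
TARGETS = ("pandas", "polars")


def find_dataframe_imports(root, targets=TARGETS):
    """Hypothetical helper: map each target library to the sorted .py files under root that import it."""
    root = Path(root)
    usage = defaultdict(set)
    for path in root.rglob("*.py"):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        except (SyntaxError, UnicodeDecodeError):
            continue  # skip files that cannot be read or parsed
        # ast.walk visits every node, so imports inside functions are found too.
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                # e.g. "import pandas as pd" or "import polars.selectors"
                names = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
                # e.g. "from pandas import DataFrame"; relative imports are ignored
                names = [node.module]
            else:
                continue
            for name in names:
                top_level = name.split(".")[0]
                if top_level in targets:
                    usage[top_level].add(str(path.relative_to(root)))
    return {lib: sorted(usage[lib]) for lib in targets}


def files_using_both(usage):
    """Hypothetical helper: files that import both pandas and polars."""
    return sorted(set(usage.get("pandas", [])) & set(usage.get("polars", [])))
```

Run from the repository root, `find_dataframe_imports("pyhealth")` would return one sorted file list per library. Passing that result to `files_using_both` lists the modules where the two libraries are mixed, which are likely the first candidates for consolidation.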
Potential Actions
- Audit codebase for pandas usage
- Audit codebase for polars usage
- Benchmark performance differences on typical PyHealth workloads
- Create a migration plan if consolidation makes sense
- Document the decision and its rationale
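For the benchmarking action, a small harness helps keep the comparison fair. Each variant gets one untimed warm-up run, and the median of several timed runs is reported. This is a sketch: `benchmark` is a hypothetical helper, and the candidate callables would wrap whichever PyHealth workload is chosen (for example, loading and aggregating an event table). For polars lazy queries the callable must call `.collect()`. Otherwise only the query plan is built and no work is measured.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(candidates: Dict[str, Callable[[], object]], repeat: int = 5) -> Dict[str, float]:
    """Hypothetical helper: median wall-clock seconds for each zero-argument callable."""
    if repeat < 1:
        raise ValueError("repeat must be >= 1")
    results = {}
    for name, fn in candidates.items():
        fn()  # warm-up run, not timed (imports, caches, lazy initialization)
        timings = []
        for _ in range(repeat):
            start = time.perf_counter()
            fn()
            timings.append(time.perf_counter() - start)
        # The median is less sensitive than the mean to one-off stalls.
        results[name] = statistics.median(timings)
    return results
```

A call such as `benchmark({"pandas": load_with_pandas, "polars": load_with_polars})` would compare the two, where both loaders are hypothetical wrappers around the same workload and return equivalent results.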
Context
This issue arose while fixing #2, where the pandas version constraint (<2) caused installation failures on Python 3.12. Depending on both libraries suggests an incomplete migration or an unclear strategy.