pydhis2 is a next-generation Python library for interacting with DHIS2, the world's largest health information management system. It provides a clean, modern, and efficient API for data extraction, analysis, and management, with a strong emphasis on reproducible scientific workflows, a critical need in public health research and data analysis, especially in low- and middle-income country (LMIC) contexts.
Target Audience:
- Public health researchers and data scientists
- DHIS2 implementers and administrators
- Data analysts working with health information systems
- Academic researchers requiring reproducible data pipelines
Scientific Use Cases:
- Epidemiological surveillance and analysis
- Health system performance monitoring
- Data quality assessments and validation
- Routine health data analytics
- Integration with statistical computing environments (R, Python, Julia)
- Modern & Asynchronous: Built with asyncio for high-performance, non-blocking I/O, making it ideal for large-scale data operations. A synchronous client is also provided for simplicity in smaller scripts.
- Reproducible by Design: From project templates to a powerful CLI, pydhis2 is built to support standardized, shareable, and verifiable data analysis pipelines, which are essential for scientific research.
- Seamless DataFrame Integration: Natively convert DHIS2 analytics data into Pandas DataFrames with a single method call (.to_pandas()), connecting you instantly to the PyData ecosystem.
- Powerful Command Line Interface: Automate common tasks like data pulling and configuration directly from your terminal.
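To make the asynchronous and tabular ideas above concrete, here is a minimal, library-independent sketch. It is not pydhis2's implementation: `fetch_period`, `to_records`, and `fetch_all` are hypothetical names, the network call is replaced by a stub, and the payload shape (a `headers` list plus a `rows` list) is an assumption modeled on the JSON returned by the DHIS2 analytics endpoint. The values in the stub are placeholders, not real data.

```python
import asyncio


async def fetch_period(period: str) -> dict:
    """Hypothetical stand-in for one analytics request (no network access)."""
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    return {
        "headers": [{"name": "dx"}, {"name": "pe"}, {"name": "value"}],
        "rows": [["b6mCG9sphIT", period, "0"]],  # placeholder row, not real data
    }


def to_records(payload: dict) -> list[dict]:
    """Pair every row with the header names to get one dict per row."""
    names = [header["name"] for header in payload["headers"]]
    return [dict(zip(names, row)) for row in payload["rows"]]


async def fetch_all(periods: list[str]) -> list[dict]:
    """Run one request per period concurrently and flatten the results."""
    # gather() schedules the coroutines together and keeps the input order
    payloads = await asyncio.gather(*(fetch_period(p) for p in periods))
    return [record for payload in payloads for record in to_records(payload)]


if __name__ == "__main__":
    for record in asyncio.run(fetch_all(["2021", "2022", "2023"])):
        print(record)
```

A list of dicts like this can be passed straight to `pandas.DataFrame(...)`; in practice the library's own `.to_pandas()` call replaces all of the above.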
Stable Release (Recommended)
Install pydhis2 directly from PyPI:
pip install pydhis2
Development Installation
For contributing or accessing the latest features:
git clone https://github.com/HzaCode/pyDHIS2.git
cd pyDHIS2
pip install -e ".[dev]"
See our Contributing Guide for more details on development setup.
Use the built-in CLI to run a quick demo. This will connect to a live DHIS2 server, fetch data, and confirm that your installation is working correctly.
# Check the installed version
pydhis2 version
# Run the quick demo
pydhis2 demo quick
A successful run will produce output similar to the following:
============================================================
pydhis2 Quick Demo
============================================================
=== Testing: https://demos.dhis2.org/dq ===
Found working API endpoint!
System: Data Quality
Version: 2.38.4.3
Found working server: https://demos.dhis2.org/dq
2. Querying Analytics data...
Retrieved 1 data records
...
Demo completed successfully!
Here is a simple example of how to use pydhis2 in a Python script to fetch analytics data and load it into a Pandas DataFrame.
Create a file named my_analysis.py:
import asyncio
import sys

from pydhis2 import get_client, DHIS2Config
from pydhis2.core.types import AnalyticsQuery

# pydhis2 provides both an async and a sync client
AsyncDHIS2Client, _ = get_client()


async def main():
    # 1. Configure the connection to a DHIS2 server
    config = DHIS2Config(
        base_url="https://demos.dhis2.org/dq",
        auth=("demo", "District1#")
    )

    async with AsyncDHIS2Client(config) as client:
        # 2. Define the query parameters
        query = AnalyticsQuery(
            dx=["b6mCG9sphIT"],  # Data element: ANC 1 Outlier Threshold
            ou="qzGX4XdWufs",    # Org unit: A-1 District Hospital
            pe="2023"            # Period: Year 2023
        )

        # 3. Fetch data and convert it directly to a Pandas DataFrame
        df = await client.analytics.to_pandas(query)

        # 4. Analyze and display the results
        print("✅ Data fetched successfully!")
        print(f"Retrieved {len(df)} records.")
        print("\n--- Data Preview ---")
        print(df.head())


if __name__ == "__main__":
    # Standard fix for asyncio on Windows
    if sys.platform == 'win32':
        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())
    asyncio.run(main())
Run your script from the terminal:
python my_analysis.py
While you can pass credentials directly in your script, we recommend using environment variables for better security and flexibility.
1. Environment Variables (Recommended)
export DHIS2_URL="https://your-dhis2-server.com"
export DHIS2_USERNAME="your_username"
export DHIS2_PASSWORD="your_password"
pydhis2 will automatically detect and use these variables.
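If you prefer to see exactly which values are picked up, the three variables can also be read explicitly. The following is a minimal sketch using only the standard library; `load_dhis2_settings` is a hypothetical helper, not part of pydhis2, and it returns a plain dict whose keys mirror the `base_url` and `auth` arguments used elsewhere in this README.

```python
import os


def load_dhis2_settings(env=os.environ) -> dict:
    """Hypothetical helper: collect DHIS2 connection settings from the environment."""
    required = ("DHIS2_URL", "DHIS2_USERNAME", "DHIS2_PASSWORD")
    missing = [name for name in required if not env.get(name)]
    if missing:
        # Fail early with a clear message instead of a confusing auth error later
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {
        "base_url": env["DHIS2_URL"].rstrip("/"),  # drop any trailing slash
        "auth": (env["DHIS2_USERNAME"], env["DHIS2_PASSWORD"]),
    }
```

Assuming the constructor accepts these keyword arguments as shown in the in-script configuration, the result could be passed on as `DHIS2Config(**load_dhis2_settings())`.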
2. In-Script Configuration
from pydhis2 import DHIS2Config

config = DHIS2Config(
    base_url="https://your-dhis2-server.com",
    auth=("your_username", "your_password")
)
3. Using the CLI
The CLI provides a convenient way to set and cache your credentials.
pydhis2 config --url "https://your-dhis2-server.com" --username "your_username"
Beyond being a library, pydhis2 promotes a standardized workflow that is essential for scientific research. To jump-start your analysis, we provide a project template powered by Cookiecutter.
Why use the template?
- Standardization: Ensures every project starts with a clean, logical structure.
- Rapid Start: Generate a fully functional project skeleton in a single command.
- Best Practices: Includes pre-configured settings for DHIS2 connections, data quality pipelines, and environment management.
- Focus on Analysis: Spend less time on boilerplate setup and more time on your research.
1. Install Cookiecutter:
   pip install cookiecutter
2. Generate your project: Point Cookiecutter to the pydhis2 template. It will prompt you for project details.
   cookiecutter gh:HzaCode/pyDHIS2 --directory pydhis2/templates
   You'll be prompted for details like your project name and author:
   project_name [My DHIS-2 Analysis Project]: Malaria Analysis Malawi
   project_slug [malaria_analysis_malawi]:
   author_name [Your Name]: Dr. Evans
3. Get a complete, ready-to-use project structure:
   malaria-analysis-malawi/
   ├── configs/          # DHIS-2 & DQR configurations
   ├── data/             # Raw and processed data
   ├── pipelines/        # Analysis pipeline definitions
   ├── scripts/          # Runner scripts
   ├── .env.example      # Environment variable template
   └── README.md         # A dedicated README for your new project
You can now cd into your new project directory and begin your analysis immediately!
pydhis2 provides a powerful CLI for common data operations. (Note: Implementation is in progress)
# Pull analytics data and save as Parquet
pydhis2 analytics pull --dx "b6mCG9sphIT" --ou "qzGX4XdWufs" --pe "2023" --out analytics.parquet
# Pull tracker events
pydhis2 tracker events --program "program_id" --out events.parquet
# Run a data quality review
pydhis2 dqr analyze --input analytics.parquet --html dqr_report.html
For a full list of commands, run pydhis2 --help.
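For scheduled or scripted pulls, the same command can be assembled from Python. This is a small sketch, assuming the `analytics pull` flags shown above (`--dx`, `--ou`, `--pe`, `--out`) are available in your installed version; since the CLI is still in progress, check `pydhis2 --help` first. `build_pull_command` and `run_pull` are hypothetical helper names.

```python
import shlex
import subprocess


def build_pull_command(dx: str, ou: str, pe: str, out: str) -> list[str]:
    """Assemble the argument list for `pydhis2 analytics pull` (flags as shown above)."""
    return [
        "pydhis2", "analytics", "pull",
        "--dx", dx,
        "--ou", ou,
        "--pe", pe,
        "--out", out,
    ]


def run_pull(dx: str, ou: str, pe: str, out: str) -> None:
    """Run the pull and raise CalledProcessError if the CLI exits with an error."""
    cmd = build_pull_command(dx, ou, pe, out)
    print("Running:", shlex.join(cmd))  # log the exact command for reproducibility
    subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when identifiers or file paths contain spaces.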
| Endpoint | Read | Write | DataFrame | Pagination | Streaming |
|---|---|---|---|---|---|
| Analytics | ✅ | - | ✅ | ✅ | ✅ |
| DataValueSets | ✅ | ✅ | ✅ | ✅ | ✅ |
| Tracker Events | ✅ | ✅ | ✅ | ✅ | ✅ |
| Metadata | ✅ | ✅ | ✅ | - | - |
- Python: ≥ 3.9
- DHIS2: ≥ 2.36
- Platforms: Windows, Linux, macOS
Contributions are welcome and highly encouraged! pydhis2 is a community-driven project.
Please see our Contributing Guide for details on how to get started. Also, be sure to review our Code of Conduct.
- Documentation: For in-depth guides and API references.
- GitHub Issues: To report bugs or request new features.
- GitHub Discussions: For questions, ideas, and community conversation.
- Changelog: Version history and release notes.
This project is licensed under the Apache License 2.0. See the LICENSE file for details.
