ETL Pipeline - Medical Data Processing

Published Power BI Link: https://app.powerbi.com/groups/me/reports/49e81a4e-d445-4f48-b539-e16644e5f70c/8114ca2b8762d315d2b3?experience=power-bi

Production-grade ETL pipeline that extracts medical vital-signs data from S3, cleans it, and loads it into PostgreSQL on AWS RDS.

Pip install commands (run these first)

pip install python-dotenv
pip install boto3
pip install pandas
pip install sqlalchemy
pip install psycopg2-binary

Or install from requirements.txt:

pip install -r requirements.txt

Setup

Install Dependencies (see pip commands above)
Configure Environment Variables
- Copy .env.template to .env:
```
cp .env.template .env
```
- Edit .env and fill in your AWS and RDS configuration:
  - RDS_SECRET_NAME: Your AWS Secrets Manager secret name
  - S3_BUCKET_NAME: Your S3 bucket name
  - RDS_WRITER_ENDPOINT: Your RDS cluster writer endpoint
  - AWS_REGION: Your AWS region
AWS Credentials
- Configure AWS credentials via AWS CLI: aws configure
- Or set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY in .env
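
A missing or blank value in .env tends to surface late, as a confusing AWS or database error. The sketch below shows one way to check the four variables listed above before the pipeline starts. The function name `read_config` is hypothetical and is not taken from etl_process.py.

```python
import os
from typing import Mapping

# Variable names taken from the Setup list above.
REQUIRED_VARS = (
    "RDS_SECRET_NAME",
    "S3_BUCKET_NAME",
    "RDS_WRITER_ENDPOINT",
    "AWS_REGION",
)


def read_config(env: Mapping[str, str] = os.environ) -> dict:
    """Return the required settings, or raise if any are missing or blank.

    Hypothetical helper: the real script may read its settings differently.
    """
    missing = [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
    if missing:
        raise RuntimeError("Missing settings in .env: " + ", ".join(missing))
    # Strip stray whitespace that often sneaks into hand-edited .env files.
    return {name: env[name].strip() for name in REQUIRED_VARS}
```

In a real run you would likely call `load_dotenv()` from python-dotenv first, so that the values in .env are copied into `os.environ` before `read_config()` reads them.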

Usage

Run the ETL pipeline:

python etl_process.py

Pipeline Stages

Extract: Downloads messy_health_data.csv from S3
Transform:
- Removes duplicate rows
- Fills missing Heart Rate and Oxygen Sat (SpO2) values with the column median
- Removes sensor-error outliers (Heart Rate above 200 or below 30)
- Converts the Timestamp column to datetime
Load: Uploads cleaned data to RDS table clinical_vitals
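
The Transform steps above can be sketched in pandas as follows. This is an illustration under assumptions, not the code in etl_process.py: the column names `Heart Rate`, `SpO2`, and `Timestamp` are guesses at the CSV headers, and coercing values to numeric before filling is an extra step the list does not mention.

```python
import pandas as pd

# Assumed column names; adjust to match the headers in messy_health_data.csv.
HEART_RATE = "Heart Rate"
SPO2 = "SpO2"
TIMESTAMP = "Timestamp"


def transform(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the four cleaning steps in the order listed above."""
    # 1. Remove exact duplicate rows.
    out = df.drop_duplicates().copy()

    # 2. Fill missing vitals with each column's median.
    #    Assumption: unparseable readings are coerced to NaN and filled too.
    for col in (HEART_RATE, SPO2):
        out[col] = pd.to_numeric(out[col], errors="coerce")
        out[col] = out[col].fillna(out[col].median())

    # 3. Drop sensor-error outliers: keep heart rates from 30 to 200 inclusive.
    out = out[out[HEART_RATE].between(30, 200)].copy()

    # 4. Parse timestamps; values that cannot be parsed become NaT.
    out[TIMESTAMP] = pd.to_datetime(out[TIMESTAMP], errors="coerce")
    return out.reset_index(drop=True)
```

Because the median is computed before outliers are dropped, extreme readings still take part in it. The median is fairly robust to a few such values, but swapping steps 2 and 3 is a reasonable alternative. For the Load stage, the cleaned frame could then be written with `DataFrame.to_sql("clinical_vitals", engine, ...)` using an SQLAlchemy engine.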

Requirements

Python 3.8+
AWS account with S3, Secrets Manager, and RDS access
PostgreSQL database on AWS RDS
