This project implements a small SQL-based data warehouse pipeline for CRM subscription data.
The solution separates raw operational data from the analytical reporting layer and includes explicit data quality checks.
The pipeline performs the following steps:
- Load raw CRM data into source tables.
- Detect and log data quality issues such as duplicate transactions, missing references, and invalid subscription dates.
- Transform valid records into a structured DWH schema using dimension and fact tables.
- Build analytical views for BI reporting.
The reporting layer provides:
- Subscription sales performance
- Monthly Recurring Revenue (MRR)
- Cumulative customer Lifetime Value (LTV)
Annual subscriptions are distributed across months for MRR calculations, while LTV is computed from successful customer payments.
docker compose up -dAll SQL scripts from the init directory are executed automatically when PostgreSQL initializes.