Skip to content

Add percentile measures (P25, P75, P95, P99) #113

@stumelius

Description

@stumelius

Overview

Currently only the median (P50) is computed. Extending this to P25, P75, P95, and P99 would give full box-plot coverage and make outlier detection practical without leaving dbt-profiler.

Motivation

  • P25/P75 (IQR) are the standard for identifying outliers and understanding distribution shape
  • P95/P99 are essential for SLA/latency-style columns and catching extreme values
  • Users doing data quality monitoring almost always want these alongside mean and stddev

Implementation notes

The per-adapter measure_median infrastructure is already in place. Each adapter override would need additional percentile calls:

Adapter Approach
Default (Postgres, Redshift) percentile_cont(0.25) within group (order by col)
BigQuery approx_quantiles(col, 100)[offset(25)]
SQL Server Extend the existing ROW_NUMBER() approach
Snowflake percentile_cont(0.25) within group (order by col)
Databricks/Athena percentile_approx(col, 0.25)

API design

New optional measures (excluded via exclude_measures like existing ones):

{{ dbt_profiler.get_profile(relation=ref('my_model'), exclude_measures=['p25', 'p75', 'p95', 'p99']) }}

Or grouped under a single toggle (e.g. exclude_measures=['percentiles']) to avoid cluttering the interface — worth discussing.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions