Add cdf and quantile methods for UnivariateKDE #139

@oskarhs

I propose adding methods for evaluating the cdf and the quantiles of a UnivariateKDE object.

Background

Kernel estimators are the go-to class of estimators for nonparametric density estimation in one dimension, owing to their speed and good practical performance. A cdf method built on an interpolated kde object would be of independent interest for visualization, and being able to evaluate the cdf is also a necessity for some statistical models, such as those based on copulas.

In the world of copulas, the dependence structure and the marginal distributions are modeled separately, and kernel estimators are commonly employed when estimating the required univariate marginals. Estimating the copula itself, however, requires evaluating the resulting cdf estimates, since the observations are first mapped to the unit interval through the estimated marginal cdfs.

Proposed changes

I suggest adding convenience functions for computing the cdf and the quantile function. In particular, my proposed solution is to construct both the cdf and the quantile function via linear interpolation, as is done e.g. in the spatstat R package.

A basic implementation would be something akin to the following:

using KernelDensity, Interpolations
x = randn(10^4)
k = kde(x)

# First compute Fhat on the midpoints grid, then linearly interpolate the result
dt = step(k.x)
F_grid = Vector{Float64}(undef, length(k.x))
F_grid[1] = 0.0
for j = 2:length(k.x) # compute Fhat at grid via the trapezoid rule
    F_grid[j] = F_grid[j-1] + 0.5 * (k.density[j] + k.density[j-1]) * dt
end
F_grid = F_grid / F_grid[end] # normalize so that F is a proper cdf
F_interp = linear_interpolation(k.x, F_grid, extrapolation_bc=Flat()) # the estimated cdf

The approach to computing the quantile function is similar: the roles of the grid points and the cdf values are swapped. This would not introduce any new dependencies, as KernelDensity already depends on Distributions and Interpolations.
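To make the quantile step concrete, here is a minimal sketch in plain Julia. The helper names `cdf_on_grid` and `quantile_from_grid` are hypothetical and not part of KernelDensity.jl, and the code assumes a uniform grid and a nonnegative density so that the tabulated cdf is nondecreasing. It inverts the piecewise-linear cdf by hand rather than through Interpolations, which sidesteps the question of repeated knots when the cdf is flat in the tails.

```julia
# Hypothetical helpers for illustration; not part of KernelDensity.jl.

# Cumulative trapezoid rule on a uniform grid, normalized so that F[end] == 1.
function cdf_on_grid(x::AbstractRange, density::AbstractVector)
    dt = step(x)
    F = zeros(Float64, length(x))
    for j in 2:length(x)
        F[j] = F[j-1] + 0.5 * (density[j] + density[j-1]) * dt
    end
    return F ./ F[end]
end

# Generalized inverse of the piecewise-linear cdf: the smallest t with F(t) >= p.
# Assumes F is nondecreasing with F[1] == 0 and F[end] == 1.
function quantile_from_grid(x::AbstractVector, F::AbstractVector, p::Real)
    0 <= p <= 1 || throw(DomainError(p, "p must lie in [0, 1]"))
    j = searchsortedfirst(F, p)          # first index with F[j] >= p
    j == 1 && return float(x[1])
    # Here F[j-1] < p <= F[j], so this segment is strictly increasing.
    w = (p - F[j-1]) / (F[j] - F[j-1])
    return x[j-1] + w * (x[j] - x[j-1])
end

# Usage on a synthetic standard normal density (stand-in for k.x and k.density).
x = range(-6, 6; length = 2049)
density = @. exp(-x^2 / 2) / sqrt(2π)
F = cdf_on_grid(x, density)
median_est = quantile_from_grid(x, F, 0.5)   # close to 0 by symmetry
```

With an actual `UnivariateKDE` object `k`, the same two calls would presumably take `k.x` and `k.density` as inputs.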

Other approaches

There are other ways of constructing estimates of the cdf based on tabulated data. However, I find the linear interpolation method the most appealing due to the inherent symmetry between linear interpolation of the cdf and of the quantile function: interpolation and the generalized inverse commute for linear interpolation, which will not necessarily be the case for higher-order interpolants.
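The commutation claim can be spelled out as follows. The notation $(x_j, F_j)$ for the grid points and tabulated cdf values is introduced here for illustration, and the derivation assumes $F_{j-1} < F_j$ on the segment in question.

```latex
% Linear interpolant of the cdf on the segment [x_{j-1}, x_j]:
\hat F(x) = F_{j-1} + (F_j - F_{j-1})\,\frac{x - x_{j-1}}{x_j - x_{j-1}},
\qquad x \in [x_{j-1}, x_j].

% Solving \hat F(x) = p for p in [F_{j-1}, F_j] gives
\hat Q(p) = x_{j-1} + (x_j - x_{j-1})\,\frac{p - F_{j-1}}{F_j - F_{j-1}},
\qquad p \in [F_{j-1}, F_j],

% which is exactly the linear interpolant through the swapped knots (F_j, x_j).
```

So inverting the interpolated cdf and interpolating the inverted table give the same function. On a flat segment, where $F_{j-1} = F_j$, the generalized inverse takes the left endpoint $x_{j-1}$ instead.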

Implementation

If there is interest in including these methods in KernelDensity.jl, I'd be happy to create a PR.
