Description
I propose adding methods for evaluating the cdf and the quantiles of a UnivariateKDE object.
Background
Kernel estimators are the go-to class of estimators for nonparametric density estimation in one dimension, owing to their speed and good practical performance. Not only would a cdf method based on the result of an interpolated kde object be of independent interest for visualization purposes, but being able to evaluate the cdf is a necessity for some statistical models such as those based on copulas.
In the world of copulas, dependence structure and marginal distributions are modeled separately, and kernel estimators are commonly employed when estimating the required univariate marginals. However, estimation of the copula itself requires evaluation of the resulting cdf estimate.
Proposed changes
I propose adding convenience functions for computing the cdf and the quantile function of a UnivariateKDE. In particular, my proposed solution is to construct both the cdf and the quantile function via linear interpolation of values tabulated on the KDE grid, e.g. as in the spatstat R package.
A basic implementation would be something akin to:
using KernelDensity, Interpolations
x = randn(10^4)
k = kde(x)
# First compute Fhat on the midpoints grid, then linearly interpolate the result
dt = step(k.x)
F_grid = Vector{Float64}(undef, length(k.x))
F_grid[1] = 0.0
for j in 2:length(k.x) # compute Fhat at grid via the trapezoid rule
    F_grid[j] = F_grid[j-1] + 0.5 * (k.density[j] + k.density[j-1]) * dt
end
F_grid = F_grid / F_grid[end] # normalize so that F is a proper cdf
F_interp = linear_interpolation(k.x, F_grid, extrapolation_bc=Flat()) # the estimated cdf
The approach to computing the quantile function is similar. This would not add any additional dependencies, as KernelDensity already depends on Distributions and Interpolations.
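To make the quantile step concrete, here is a minimal, dependency-free sketch of how it might look. The helper names `cdf_on_grid` and `quantile_from_grid` are hypothetical and not part of the KernelDensity.jl API, and a tabulated standard normal density stands in for `k.x` and `k.density` so that the snippet runs without any packages. The inverse is written by hand rather than with `linear_interpolation`, since the cdf values may contain ties where the density estimate is zero.

```julia
# Hypothetical helpers for illustration; not part of KernelDensity.jl.

# Cumulative trapezoid rule on an equally spaced grid, normalized to end at 1.
function cdf_on_grid(x::AbstractRange, density::AbstractVector)
    F = zeros(Float64, length(x))
    dt = step(x)
    for j in 2:length(x)
        F[j] = F[j-1] + 0.5 * (density[j] + density[j-1]) * dt
    end
    return F ./ F[end]
end

# Generalized inverse of the piecewise-linear cdf through the points (x[j], F[j]).
function quantile_from_grid(x::AbstractVector, F::AbstractVector, p::Real)
    0 <= p <= 1 || throw(DomainError(p, "p must lie in [0, 1]"))
    j = searchsortedfirst(F, p)   # smallest j with F[j] >= p
    j == 1 && return float(x[1])
    # Here F[j-1] < p <= F[j], so the denominator is strictly positive.
    w = (p - F[j-1]) / (F[j] - F[j-1])
    return x[j-1] + w * (x[j] - x[j-1])
end

# Stand-in for k.x and k.density: a standard normal density tabulated on a grid.
x = range(-6.0, 6.0; length=2049)
density = @. exp(-x^2 / 2) / sqrt(2pi)

F_grid = cdf_on_grid(x, density)
median_hat = quantile_from_grid(x, F_grid, 0.5)
q975_hat = quantile_from_grid(x, F_grid, 0.975)
```

With an actual `UnivariateKDE`, one would presumably pass `k.x` and `k.density` in place of the synthetic grid. Using `searchsortedfirst` returns the smallest grid-consistent solution on flat stretches of the cdf, which matches the usual definition of the generalized inverse.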
Other approaches
There are other ways of constructing estimates of the cdf based on tabulated data. However, I find the linear interpolation method the most appealing due to the inherent symmetry between linear interpolation of the cdf and of the quantile function: inverting the linearly interpolated cdf gives the same result as linearly interpolating the tabulated quantiles, i.e. interpolation and the generalized inverse commute. This will not necessarily be the case for higher-order interpolants.
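The symmetry can be spelled out on a single grid cell. The notation here is introduced for illustration only: $x_j$ denotes the grid points and $F_j$ the tabulated cdf values, and the cell is assumed to satisfy $F_{j+1} > F_j$.

$$
\hat F(x) = F_j + \frac{F_{j+1} - F_j}{x_{j+1} - x_j}\,(x - x_j), \qquad x \in [x_j, x_{j+1}],
$$

$$
\hat F^{-1}(p) = x_j + \frac{x_{j+1} - x_j}{F_{j+1} - F_j}\,(p - F_j), \qquad p \in [F_j, F_{j+1}].
$$

The second expression is exactly the linear interpolant through the swapped points $(F_j, x_j)$ and $(F_{j+1}, x_{j+1})$, so inverting the interpolated cdf and interpolating the tabulated quantiles give the same function. For a quadratic or cubic interpolant the inverse on a cell is generally not a polynomial of the same degree, which is why the property does not carry over.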
Implementation
If there is interest in including these methods in KernelDensity.jl, I'd be happy to create a PR.