A variation of quantile()
that can be applied to weighted samples.
Usage
weighted_quantile(
  x,
  probs = seq(0, 1, 0.25),
  weights = NULL,
  n = NULL,
  na.rm = FALSE,
  names = TRUE,
  type = 7,
  digits = 7
)
weighted_quantile_fun(x, weights = NULL, n = NULL, na.rm = FALSE, type = 7)
Arguments
- x
<numeric> Sample values.
- probs
<numeric> Vector of probabilities in \([0, 1]\) defining the quantiles to return.
- weights
<numeric | NULL> Weights for the sample. One of:
  - numeric vector of the same length as x: weights for the corresponding values in x, which will be normalized to sum to 1.
  - NULL: indicates no weights are provided, so unweighted sample quantiles (equivalent to quantile()) are returned.
- n
<scalar numeric> Presumed effective sample size. If this is greater than 1 and continuous quantiles (type >= 4) are requested, flat regions may be added to the approximation to the inverse CDF in areas where the normalized weight exceeds 1/n (i.e., regions of high density). This can be used to ensure that if a sample of size n with duplicate x values is summarized into a weighted sample without duplicates, the result of weighted_quantile(..., n = n) on the weighted sample is equal to the result of quantile() on the original sample. One of:
  - NULL: do not make a sample size adjustment.
  - numeric: presumed effective sample size.
  - function or name of function (as a string): a function applied to weights (prior to normalization) to determine the sample size. Some useful values may be:
    - "length": use the number of elements in weights (equivalently in x) as the effective sample size.
    - "sum": use the sum of the unnormalized weights as the sample size. Useful if the provided weights are unnormalized, so that their sum represents the true sample size.
- na.rm
<scalar logical> If TRUE, corresponding entries in x and weights are removed if either is NA.
- names
<scalar logical> If TRUE, add names to the output giving the input probs formatted as a percentage.
- type
<scalar integer> Value between 1 and 9: determines the type of quantile estimator to be used. Types 1 to 3 are for discontinuous quantiles, types 4 to 9 are for continuous quantiles. See Details.
- digits
<scalar numeric> The number of digits to use to format percentages when names is TRUE.
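To make the duplicate-summarizing scenario described for n more concrete, the following base R sketch collapses a sample with repeated values into unique values plus counts. The variable names (x_raw, x_unique, w_counts, n_eff) are illustrative assumptions and not part of the package; the commented-out call at the end only indicates how n = "sum" would presumably be supplied in that situation.

```r
# Hypothetical raw sample with duplicated values
x_raw <- c(1, 1, 1, 2, 3, 3)

# Summarize into a weighted sample without duplicates:
# the unique values and the number of times each occurs
counts   <- table(x_raw)
x_unique <- as.numeric(names(counts))
w_counts <- as.numeric(counts)

# The unnormalized weights sum to the original sample size, which is the
# situation the "sum" option for n is described as handling
n_eff <- sum(w_counts)

# Per the argument description, the following is expected to agree with
# quantile(x_raw); it needs the package that provides weighted_quantile():
# weighted_quantile(x_unique, weights = w_counts, n = "sum")
```

Expanding the weighted sample with rep(x_unique, w_counts) recovers the sorted raw sample, which is a quick way to check that the counts were built correctly.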
Value
weighted_quantile() returns a numeric vector of length(probs) containing the estimated quantile for each probability in probs.
weighted_quantile_fun() returns a function that takes a single argument, a vector of probabilities, and returns the corresponding quantile estimates. It may be useful when weighted_quantile() needs to be called repeatedly for the same sample, because it re-uses some pre-computation.
Details
Calculates weighted quantiles using a generalization of the quantile types defined in quantile().
Type 1–3 (discontinuous) quantiles are directly a function of the inverse CDF as a step function, and so can be directly translated to the weighted case using the natural definition of the weighted ECDF as the cumulative sum of the normalized weights.
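As a rough illustration of the step-function idea, the base R sketch below inverts a weighted ECDF for the type 1 case only. The helper name weighted_type1() and the tolerance value are assumptions made for this example; this is not the package's implementation, and it does not handle missing values or the n argument.

```r
# Sketch of a weighted type 1 quantile: invert the weighted ECDF step function.
# weighted_type1() is a hypothetical helper, not the package's implementation.
weighted_type1 <- function(x, w, probs) {
  ord <- order(x)
  x_sorted <- x[ord]
  # Weighted ECDF: cumulative sum of the normalized weights
  F_k <- cumsum(w[ord] / sum(w))
  # Small tolerance guards against floating-point error in the cumulative sum
  tol <- 1e-12
  vapply(probs, function(p) {
    idx <- which(F_k >= p - tol)
    # Smallest value whose ECDF reaches p (fall back to the largest value)
    x_sorted[if (length(idx) > 0) idx[1] else length(x_sorted)]
  }, numeric(1))
}

x <- c(1, 3, 2, 5, 4)
probs <- c(0.1, 0.5, 0.9)

# With equal weights this should agree with the unweighted type 1 estimator
weighted_type1(x, rep(1, 5), probs)
quantile(x, probs, type = 1, names = FALSE)

# Putting most of the weight on the largest value pulls the median upward
weighted_type1(x, c(1, 1, 1, 6, 1), 0.5)
```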
Type 4–9 (continuous) quantiles require some translation from the definitions in quantile(). quantile() defines continuous estimators in terms of \(x_k\), which is the \(k\)th order statistic, and \(p_k\), which is a function of \(k\) and \(n\) (the sample size). In the weighted case, we instead take \(x_k\) as the \(k\)th smallest value of \(x\) in the weighted sample (not necessarily an order statistic, because of the weights). Then we can re-write the formulas for \(p_k\) in terms of \(F(x_k)\) (the empirical CDF at \(x_k\), i.e. the cumulative sum of normalized weights) and \(f(x_k)\) (the normalized weight at \(x_k\)), by using the fact that, in the unweighted case, \(k = F(x_k) \cdot n\) and \(1/n = f(x_k)\):
- Type 4
\(p_k = \frac{k}{n} = F(x_k)\)
- Type 5
\(p_k = \frac{k - 0.5}{n} = F(x_k) - \frac{f(x_k)}{2}\)
- Type 6
\(p_k = \frac{k}{n + 1} = \frac{F(x_k)}{1 + f(x_k)}\)
- Type 7
\(p_k = \frac{k - 1}{n - 1} = \frac{F(x_k) - f(x_k)}{1 - f(x_k)}\)
- Type 8
\(p_k = \frac{k - 1/3}{n + 1/3} = \frac{F(x_k) - f(x_k)/3}{1 + f(x_k)/3}\)
- Type 9
\(p_k = \frac{k - 3/8}{n + 1/4} = \frac{F(x_k) - f(x_k) \cdot 3/8}{1 + f(x_k)/4}\)
Then the quantile function (inverse CDF) is the piece-wise linear function defined by the points \((p_k, x_k)\).
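The formulas above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: weighted_continuous() is a hypothetical helper, it ignores the n adjustment and missing values, and it does not guard against degenerate inputs (for example, a single value, where \(f(x_k) = 1\) makes the type 7 denominator zero). It uses approx() for the piece-wise linear interpolation and clamps probabilities that fall outside the range of \(p_k\).

```r
# Sketch of weighted continuous quantiles (types 4 to 9) from the p_k formulas.
# weighted_continuous() is a hypothetical helper, not the package's code.
weighted_continuous <- function(x, w, probs, type = 7) {
  ord <- order(x)
  x_k <- x[ord]
  f_k <- w[ord] / sum(w)   # normalized weight at x_k
  F_k <- cumsum(f_k)       # weighted ECDF at x_k
  p_k <- switch(as.character(type),
    "4" = F_k,
    "5" = F_k - f_k / 2,
    "6" = F_k / (1 + f_k),
    "7" = (F_k - f_k) / (1 - f_k),
    "8" = (F_k - f_k / 3) / (1 + f_k / 3),
    "9" = (F_k - f_k * 3 / 8) / (1 + f_k / 4),
    stop("type must be between 4 and 9")
  )
  # Piece-wise linear interpolation through the points (p_k, x_k); rule = 2
  # clamps probabilities outside the range of p_k to the end values
  approx(p_k, x_k, xout = probs, rule = 2)$y
}

x <- c(2.1, 0.4, 3.7, 1.5, 5.0)
probs <- c(0.05, 0.25, 0.5, 0.9)

# With equal weights, each type should reduce to the matching quantile() type
weighted_continuous(x, rep(1, 5), probs, type = 7)
quantile(x, probs, type = 7, names = FALSE)
```

Comparing against quantile() with equal weights for each of types 4 to 9 is a simple sanity check that the rewritten \(p_k\) formulas reduce to the unweighted definitions.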