A variation of `quantile()` that can be applied to weighted samples.

## Usage

```
weighted_quantile(
  x,
  probs = seq(0, 1, 0.25),
  weights = NULL,
  n = NULL,
  na.rm = FALSE,
  names = TRUE,
  type = 7,
  digits = 7
)

weighted_quantile_fun(x, weights = NULL, n = NULL, na.rm = FALSE, type = 7)
```

## Arguments

- x
numeric vector: sample values

- probs
numeric vector: probabilities in \([0, 1]\)

- weights
Weights for the sample. One of:

  - numeric vector of same length as `x`: weights for corresponding values in `x`, which will be normalized to sum to 1.
  - `NULL`: indicates no weights are provided, so unweighted sample quantiles (equivalent to `quantile()`) are returned.

- n
Presumed effective sample size. If this is greater than 1 and continuous quantiles (`type >= 4`) are requested, flat regions may be added to the approximation to the inverse CDF in areas where the normalized weight exceeds `1/n` (i.e., regions of high density). This can be used to ensure that if a sample of size `n` with duplicate `x` values is summarized into a weighted sample without duplicates, the result of `weighted_quantile(..., n = n)` on the weighted sample is equal to the result of `quantile()` on the original sample. One of:

  - `NULL`: do not make a sample size adjustment.
  - numeric: presumed effective sample size.
  - function or name of function (as a string): a function applied to `weights` (prior to normalization) to determine the sample size. Some useful values may be:
    - `"length"`: i.e. use the number of elements in `weights` (equivalently in `x`) as the effective sample size.
    - `"sum"`: i.e. use the sum of the unnormalized `weights` as the sample size. Useful if the provided `weights` is unnormalized so that its sum represents the true sample size.

- na.rm
logical: if `TRUE`, corresponding entries in `x` and `weights` are removed if either is `NA`.

- names
logical: if `TRUE`, add names to the output giving the input `probs` formatted as a percentage.

- type
integer between 1 and 9: determines the type of quantile estimator to be used. Types 1 to 3 are for discontinuous quantiles, types 4 to 9 are for continuous quantiles. See **Details**.

- digits
numeric: the number of digits to use to format percentages when `names` is `TRUE`.

## Value

`weighted_quantile()` returns a numeric vector of `length(probs)` with the estimate of the corresponding quantile from `probs`.

`weighted_quantile_fun()` returns a function that takes a single argument, a vector of probabilities, which itself returns the corresponding quantile estimates. It may be useful when `weighted_quantile()` needs to be called repeatedly for the same sample, re-using some pre-computation.
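A brief usage sketch of the two functions follows. It assumes they are provided by the ggdist package; the package name is an assumption, since this page does not state it, and the sample values are made up for illustration. The calls are guarded so the snippet does nothing if that package is not installed.

```r
# Assumption: weighted_quantile() and weighted_quantile_fun() come from ggdist.
if (requireNamespace("ggdist", quietly = TRUE)) {
  x <- c(1, 2, 3)   # distinct sample values (illustrative)
  w <- c(1, 2, 3)   # unnormalized weights, e.g. how often each value occurred
  p <- c(0.25, 0.5, 0.75)

  # With weights = NULL (the default), unweighted sample quantiles are returned
  q_unweighted <- ggdist::weighted_quantile(x, probs = p)

  # Weighted quartiles; n = "sum" uses sum(w) as the presumed sample size
  q_weighted <- ggdist::weighted_quantile(x, probs = p, weights = w, n = "sum")

  # Do the pre-computation once, then evaluate at any probabilities
  qf <- ggdist::weighted_quantile_fun(x, weights = w, n = "sum")
  q_from_fun <- qf(p)
}
```

The function returned by `weighted_quantile_fun()` is the more convenient form when the same sample is queried many times, for example inside a loop over different `probs`.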

## Details

Calculates weighted quantiles using a variation of the quantile types based on a generalization of `quantile()`.

Types 1 to 3 (discontinuous) quantiles are directly a function of the inverse CDF as a step function, and so can be directly translated to the weighted case using the natural definition of the weighted ECDF as the cumulative sum of the normalized weights.
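The step-function idea can be sketched in base R for the simplest discontinuous case (type 1). The helper `weighted_quantile_type1()` is hypothetical and written only for illustration; it is not the package's implementation and does not handle `NA`s or the `n` argument.

```r
# Hypothetical helper: type 1 weighted quantiles via the weighted ECDF.
weighted_quantile_type1 <- function(x, probs, weights = rep(1, length(x))) {
  ord <- order(x)
  x <- x[ord]
  weights <- weights[ord]
  # Weighted ECDF at each sorted x: cumulative sum of the normalized weights
  F_x <- cumsum(weights) / sum(weights)
  # Step-function inverse CDF: smallest x whose ECDF value reaches p
  vapply(probs, function(p) x[which(F_x >= p)[1]], numeric(1))
}

# A sample with duplicates, and the same sample summarized into weights
x_full <- c(1, 2, 2, 3, 3, 3)
x_unique <- c(1, 2, 3)
counts <- c(1, 2, 3)
probs <- c(0.1, 0.4, 0.5, 0.6, 0.9)

q_summarized <- weighted_quantile_type1(x_unique, probs, weights = counts)
q_original <- unname(quantile(x_full, probs, type = 1))
```

Here `q_summarized` and `q_original` agree, because the weighted ECDF of the summarized sample is the same step function as the ECDF of the original sample.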

Types 4 to 9 (continuous) quantiles require some translation from the definitions in `quantile()`. `quantile()` defines continuous estimators in terms of \(x_k\), which is the \(k\)th order statistic, and \(p_k\), which is a function of \(k\) and \(n\) (the sample size). In the weighted case, we instead take \(x_k\) as the \(k\)th smallest value of \(x\) in the weighted sample (not necessarily an order statistic, because of the weights). Then we can re-write the formulas for \(p_k\) in terms of \(F(x_k)\) (the empirical CDF at \(x_k\), i.e. the cumulative sum of normalized weights) and \(f(x_k)\) (the normalized weight at \(x_k\)), by using the fact that, in the unweighted case, \(k = F(x_k) \cdot n\) and \(1/n = f(x_k)\):

- Type 4
\(p_k = \frac{k}{n} = F(x_k)\)

- Type 5
\(p_k = \frac{k - 0.5}{n} = F(x_k) - \frac{f(x_k)}{2}\)

- Type 6
\(p_k = \frac{k}{n + 1} = \frac{F(x_k)}{1 + f(x_k)}\)

- Type 7
\(p_k = \frac{k - 1}{n - 1} = \frac{F(x_k) - f(x_k)}{1 - f(x_k)}\)

- Type 8
\(p_k = \frac{k - 1/3}{n + 1/3} = \frac{F(x_k) - f(x_k)/3}{1 + f(x_k)/3}\)

- Type 9
\(p_k = \frac{k - 3/8}{n + 1/4} = \frac{F(x_k) - f(x_k) \cdot 3/8}{1 + f(x_k)/4}\)
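As a worked instance of the substitution described above, the Type 7 identity can be checked by replacing \(k\) with \(F(x_k)/f(x_k)\) and \(n\) with \(1/f(x_k)\), then multiplying numerator and denominator by \(f(x_k)\). This only restates the unweighted-case facts already given and assumes \(f(x_k) < 1\), i.e. \(n > 1\):

\[
\frac{k - 1}{n - 1}
= \frac{F(x_k)/f(x_k) - 1}{1/f(x_k) - 1}
= \frac{F(x_k) - f(x_k)}{1 - f(x_k)}
\]

The identities for the other types follow from the same two substitutions.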

Then the quantile function (inverse CDF) is the piece-wise linear function defined by the points \((p_k, x_k)\).
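The construction can be sketched in base R for type 7, using `approx()` for the piece-wise linear interpolation. The helper `weighted_quantile_type7()` is hypothetical and for illustration only: it is not the package's implementation, it omits the `n` adjustment and `NA` handling, and it assumes at least two values with positive weight so that \(1 - f(x_k)\) is never zero.

```r
# Hypothetical helper: type 7 weighted quantiles from the points (p_k, x_k).
weighted_quantile_type7 <- function(x, probs, weights = rep(1, length(x))) {
  ord <- order(x)
  x <- x[ord]
  f_x <- weights[ord] / sum(weights)   # normalized weight f(x_k)
  F_x <- cumsum(f_x)                   # weighted ECDF F(x_k)
  p_k <- (F_x - f_x) / (1 - f_x)       # Type 7 formula for p_k
  # Piece-wise linear inverse CDF through (p_k, x_k); rule = 2 clamps
  # probabilities that fall marginally outside [min(p_k), max(p_k)]
  approx(p_k, x, xout = probs, rule = 2, ties = "ordered")$y
}

# Equal weights: should reproduce quantile(type = 7)
x <- c(2.5, 1, 4, 3, 10)
probs <- c(0.1, 0.25, 0.5, 0.9)
q_equal <- weighted_quantile_type7(x, probs)
q_base <- unname(quantile(x, probs, type = 7))

# Unequal weights: p_k works out to 0, 0.5, 1 for these values
q_median <- weighted_quantile_type7(c(1, 2, 3), 0.5, weights = c(1, 2, 1))
```

With equal weights the points \((p_k, x_k)\) reduce to those used by `quantile(type = 7)`, so `q_equal` matches `q_base` up to floating-point error. In the weighted example the middle value sits at \(p_k = 0.5\), so `q_median` is 2.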