Compare the value of draws of some variable from a Bayesian model for different levels of a factor

Given posterior draws from a Bayesian model in long format (e.g. as returned by spread_draws()), compare the value of a variable in those draws across different paired combinations of levels of a factor.

compare_levels(
  data,
  variable,
  by,
  fun = `-`,
  comparison = "default",
  draw_indices = c(".chain", ".iteration", ".draw"),
  ignore_groups = ".row"
)

Arguments

data: Long-format data.frame of draws such as returned by spread_draws() or gather_draws(). If data is a grouped data frame, comparisons will be made within groups (if one of the groups in the data frame is the by column, that specific group will be ignored, as it is not possible to make comparisons both within some variable and across it simultaneously).
variable: Bare (unquoted) name of a column in data representing the variable to compare across levels. Can be a numeric variable (as in long-data-frame-of-draws format) or a posterior::rvar.
by: Bare (unquoted) name of a column in data that is a factor or ordered. The value of variable will be compared across pairs of levels of this factor.
fun: Binary function to use for comparison. For each pair of levels of by we are comparing (as determined by comparison), compute the result of this function.
comparison: One of (a) the comparison types ordered, control, pairwise, or default (may also be given as strings, e.g. "ordered"), see Details; (b) a user-specified function that takes a factor and returns a list of pairs of names of levels to compare (as strings) and/or unevaluated expressions representing the comparisons to make; or (c) a list of pairs of names of levels to compare (as strings) and/or unevaluated expressions representing the comparisons to make, e.g.: list(c("a", "b"), c("b", "c")) or exprs(a - b, b - c), both of which would compare level "a" against "b" and level "b" against "c". Note that the unevaluated expression syntax ignores the fun argument, can include any other functions desired (e.g. variable transformations), and can even include more than two levels or other columns in data. Types (b) and (c) may use named lists, in which case the provided names are used in the output variable column instead of converting the unevaluated expression to a string. You can also use emmeans_comparison() to generate a comparison function based on contrast methods from the emmeans package.
draw_indices: Character vector of column names that should be treated as indices of draws. Operations are done within combinations of these values. The default is c(".chain", ".iteration", ".draw"), which are the same names used for chain, iteration, and draw indices returned by tidy_draws(). Names in draw_indices that are not found in the data are ignored.
ignore_groups: Character vector of names of groups to ignore by default in the input grouping. This is primarily provided to make it easier to pipe the output of add_epred_draws() into this function: that function returns a grouped ".row" column, but grouping by it is virtually never desired when using compare_levels().
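
As an illustration of form (c) of the comparison argument, the sketch below passes a list of level pairs and then the equivalent unevaluated expressions. The data frame draws and its columns group and mu are made up for this example and are not part of the package. The sketch assumes that a plain data frame with .chain, .iteration, and .draw columns is accepted as input, and that rlang::exprs() is the exprs() referred to above.

```r
library(dplyr)
library(tidybayes)

# Hypothetical toy data: 4 draws of `mu` for each of three levels of `group`.
draws <- tibble(
  .chain = 1L,
  .iteration = rep(1:4, times = 3),
  .draw = rep(1:4, times = 3),
  group = factor(rep(c("a", "b", "c"), each = 4)),
  mu = c(1, 2, 3, 4,  2, 4, 6, 8,  10, 10, 10, 10)
)

# (c) as a list of pairs of level names; `fun` (default `-`) is applied to each pair
res_pairs <- compare_levels(
  draws, mu, by = group,
  comparison = list(c("b", "a"), c("c", "b"))
)

# (c) as unevaluated expressions; `fun` is ignored in this form
res_exprs <- compare_levels(
  draws, mu, by = group,
  comparison = rlang::exprs(b - a, c - b)
)

res_pairs
res_exprs
```

Both calls should describe the same two comparisons, level "b" against "a" and level "c" against "b", each computed draw by draw.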

Value

A data.frame with the same columns as data, except that the by column contains a symbolic representation of the comparison of pairs of levels of by in data, and variable contains the result of that comparison.

Details

This function simplifies conducting comparisons across levels of some variable in a tidy data frame of draws. It applies fun to all values of variable for each pair of levels of by as selected by comparison. By default, all pairwise comparisons are generated if by is an unordered factor and ordered comparisons are made if by is ordered.
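
To make the per-draw nature of the comparison concrete, here is a minimal base-R sketch of a single comparison between two levels. It is not the package's implementation: compare_pair() and the toy data frame draws are hypothetical, and the sketch handles only one draw index (.draw) and one pair of levels.

```r
# Hypothetical toy data: 3 draws of `mu` for each of two levels of `group`.
draws <- data.frame(
  .draw = rep(1:3, times = 2),
  group = rep(c("a", "b"), each = 3),
  mu    = c(1, 2, 3, 5, 7, 9)
)

# compare_pair() is a hypothetical helper, not a tidybayes function.
# It lines up the two levels by draw and applies `fun` within each draw.
compare_pair <- function(data, lhs, rhs, fun = `-`) {
  x <- data[data$group == lhs, c(".draw", "mu")]
  y <- data[data$group == rhs, c(".draw", "mu")]
  m <- merge(x, y, by = ".draw", suffixes = c(".lhs", ".rhs"))
  data.frame(
    .draw = m$.draw,
    group = paste(lhs, "-", rhs),  # label assumes the default fun, `-`
    mu    = fun(m$mu.lhs, m$mu.rhs)
  )
}

compare_pair(draws, "b", "a")
```

The key point is that values are matched on the draw index before fun is applied, so the result is itself a set of draws of the comparison.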

The included comparison types are:

ordered: compare each level i with level i - 1; e.g. fun(i, i - 1)
pairwise: compare each level of by with every other level.
control: compare each level of by with the first level of by. If you wish to compare with a different level, you can first apply relevel() to by to set the control (reference) level.
default: use ordered if is.ordered(by) and pairwise otherwise.
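
The pairs of levels these types select can be sketched in base R as follows. The pairs_*() helpers are hypothetical and are not the package's own code. Each returns pairs as c(lhs, rhs), meaning fun(lhs, rhs) is computed. Putting the later level first for pairwise and control is an assumption, inferred from the "2 - 1", "3 - 1", "3 - 2" labels in the example output on this page.

```r
# Hypothetical helpers (not tidybayes functions): each takes a factor and
# returns a list of c(lhs, rhs) pairs of level names.

pairs_ordered <- function(x) {
  l <- levels(x)
  # each level i against level i - 1
  lapply(seq_along(l)[-1], function(i) c(l[[i]], l[[i - 1]]))
}

pairs_control <- function(x) {
  l <- levels(x)
  # every non-first level against the first (control) level
  lapply(l[-1], function(level) c(level, l[[1]]))
}

pairs_pairwise <- function(x) {
  # combn() yields c(earlier, later); reverse so the later level comes first
  lapply(utils::combn(levels(x), 2, simplify = FALSE), rev)
}

pairs_default <- function(x) {
  if (is.ordered(x)) pairs_ordered(x) else pairs_pairwise(x)
}

f <- factor(c("a", "b", "c"))
str(pairs_pairwise(f))
str(pairs_control(f))
str(pairs_default(factor(c("a", "b", "c"), ordered = TRUE)))
```

A function of this shape, taking a factor and returning a list of pairs of level names, is also what form (b) of the comparison argument expects.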

Author

Matthew Kay

Examples


library(tidybayes)  # provides spread_draws(), compare_levels(), median_qi()
library(dplyr)
library(ggplot2)

data(RankCorr, package = "ggdist")

# Let's do all pairwise comparisons of b[i,1]:
RankCorr %>%
  spread_draws(b[i,j]) %>%
  filter(j == 1) %>%
  compare_levels(b, by = i) %>%
  median_qi()
#> # A tibble: 3 × 8
#>   i         j     b   .lower .upper .width .point .interval
#>   <chr> <int> <dbl>    <dbl>  <dbl>  <dbl> <chr>  <chr>    
#> 1 2 - 1     1 0.343 -0.00937  0.684   0.95 median qi       
#> 2 3 - 1     1 0.734  0.416    1.08    0.95 median qi       
#> 3 3 - 2     1 0.402 -0.0133   0.791   0.95 median qi       

# Or let's plot all comparisons against the first level (control):
RankCorr %>%
  spread_draws(b[i,j]) %>%
  filter(j == 1) %>%
  compare_levels(b, by = i, comparison = control) %>%
  ggplot(aes(x = b, y = i)) +
  stat_halfeye()


# Or let's plot comparisons of all levels of j within
# all levels of i
RankCorr %>%
  spread_draws(b[i,j]) %>%
  group_by(i) %>%
  compare_levels(b, by = j) %>%
  ggplot(aes(x = b, y = j)) +
  stat_halfeye() +
  facet_grid(cols = vars(i))