`R/gather_draws.R`, `R/spread_draws.R`

`spread_draws.Rd`

Extract draws from a Bayesian model for one or more variables (possibly with named dimensions) into one of two types of long-format data frames.

```
gather_draws(
  model,
  ...,
  regex = FALSE,
  sep = "[, ]",
  ndraws = NULL,
  seed = NULL,
  n
)

spread_draws(
  model,
  ...,
  regex = FALSE,
  sep = "[, ]",
  ndraws = NULL,
  seed = NULL,
  n
)
```

- model
A supported Bayesian model fit. Tidybayes supports a variety of model objects; for a full list of supported models, see tidybayes-models.

- ...
Expressions in the form of `variable_name[dimension_1, dimension_2, ...] | wide_dimension`. See *Details*.

- regex
If `TRUE`, variable names are treated as regular expressions and all columns matching the regular expression and number of dimensions are included in the output. Default `FALSE`.

- sep
Separator used to separate dimensions in variable names, as a regular expression.

- ndraws
The number of draws to return, or `NULL` to return all draws.

- seed
A seed to use when subsampling draws (i.e. when `ndraws` is not `NULL`).

- n
(Deprecated). Use `ndraws`.
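As a minimal sketch of how `ndraws` and `seed` might be used together, the following subsamples draws from the `RankCorr` example fit shipped with the `ggdist` package. The object names `sub_1` and `sub_2` and the seed value are illustrative choices, not part of the package.

```r
library(dplyr)
library(tidybayes)

data(RankCorr, package = "ggdist")

# Keep only 10 posterior draws; fixing `seed` should make the
# subsample reproducible across calls.
sub_1 <- RankCorr %>% spread_draws(tau[i], ndraws = 10, seed = 1234)
sub_2 <- RankCorr %>% spread_draws(tau[i], ndraws = 10, seed = 1234)

# Number of distinct draws retained
n_distinct(sub_1$.draw)

# Check whether both calls selected the same draws
identical(sub_1$.draw, sub_2$.draw)
```

Leaving `ndraws = NULL` (the default) returns every draw, in which case `seed` has no effect.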

A data frame.

Imagine a JAGS or Stan fit named `model`. The model may contain a variable named `b[i,v]` (in the JAGS or Stan language) with dimension `i` in `1:100` and dimension `v` in `1:3`. However, the default format for draws returned from JAGS or Stan in R will not reflect this indexing structure; instead, they will have multiple columns with names like `"b[1,1]"`, `"b[2,1]"`, etc.

`spread_draws` and `gather_draws` provide a straightforward syntax to translate these columns back into properly-indexed variables in two different tidy data frame formats, optionally recovering dimension types (e.g. factor levels) as they do so.
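To make the translation concrete, here is a hand-rolled sketch of the same idea using `tidyr::pivot_longer()` on a toy data frame. This is not how `spread_draws` is implemented; the data frame `flat` and its values are invented purely for illustration.

```r
library(tidyr)

# Toy draws in the flat format: one column per array element,
# named the way JAGS or Stan output typically appears in R.
flat <- data.frame(
  .draw = 1:2,
  "b[1,1]" = c(0.1, 0.2),
  "b[2,1]" = c(0.3, 0.4),
  "b[1,2]" = c(0.5, 0.6),
  "b[2,2]" = c(0.7, 0.8),
  check.names = FALSE
)

# Parse the two indices out of each column name and stack the values,
# giving one row per (draw, i, v) combination.
long <- pivot_longer(
  flat,
  cols = -.draw,
  names_to = c("i", "v"),
  names_pattern = "b\\[(\\d+),(\\d+)\\]",
  names_transform = list(i = as.integer, v = as.integer),
  values_to = "b"
)

long
```

`spread_draws(model, b[i,v])` produces a data frame of this general shape without the manual pattern, and additionally carries `.chain` and `.iteration` columns and groups the result by the dimensions.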

`spread_draws` and `gather_draws` return data frames already grouped by all dimensions used on the variables you specify.

The difference between `spread_draws` and `gather_draws` is that `spread_draws` spreads the names of variables in the model across the data frame as column names, whereas `gather_draws` gathers variables into a single column named `".variable"` and places their values into a column named `".value"`. To use naming schemes from other packages (such as `broom`), consider passing results through functions like `to_broom_names()` or `to_ggmcmc_names()`.
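As a short sketch of the renaming step, the following summarizes gathered draws from the `RankCorr` example fit (from the `ggdist` package) and then converts the column names to `broom` conventions. The object names `summary_tb` and `summary_broom` are illustrative.

```r
library(dplyr)
library(tidybayes)

data(RankCorr, package = "ggdist")

# Summarize in the tidybayes naming scheme (.variable, .value, .lower, .upper)
summary_tb <- RankCorr %>%
  gather_draws(tau[i], typical_r) %>%
  median_qi()

# Rename columns to broom-style names (e.g. term, estimate, conf.low, conf.high)
summary_broom <- summary_tb %>%
  to_broom_names()

names(summary_tb)
names(summary_broom)
```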

For example, `spread_draws(model, a[i], b[i,v])` might return a grouped data frame (grouped by `i` and `v`), with:

- column `".chain"`: the chain number. `NA` if not applicable to the model type; this is typically only applicable to MCMC algorithms.
- column `".iteration"`: the iteration number. Guaranteed to be unique within-chain only. `NA` if not applicable to the model type; this is typically only applicable to MCMC algorithms.
- column `".draw"`: a unique number for each draw from the posterior. Order is not guaranteed to be meaningful.
- column `"i"`: value in `1:100`
- column `"v"`: value in `1:3`
- column `"a"`: value of `"a[i]"` for draw `".draw"`
- column `"b"`: value of `"b[i,v]"` for draw `".draw"`

`gather_draws(model, a[i], b[i,v])` on the same model would return a grouped data frame (grouped by `i` and `v`), with:

- column `".chain"`: the chain number
- column `".iteration"`: the iteration number
- column `".draw"`: the draw number
- column `"i"`: value in `1:100`
- column `"v"`: value in `1:3`, or `NA` if `".variable"` is `"a"`.
- column `".variable"`: value in `c("a", "b")`.
- column `".value"`: value of `"a[i]"` (when `".variable"` is `"a"`) or `"b[i,v]"` (when `".variable"` is `"b"`) for draw `".draw"`

`spread_draws` and `gather_draws` can use type information applied to the `model` object by `recover_types()` to convert columns back into their original types. This is particularly helpful if some of the dimensions in your model were originally factors. For example, if the `v` dimension in the original data frame `data` was a factor with levels `c("a","b","c")`, then we could use `recover_types` before `spread_draws`:

```
model %>%
  recover_types(data) %>%
  spread_draws(b[i,v])
```

Which would return the same data frame as above, except the `"v"` column would be a value in `c("a","b","c")` instead of `1:3`.
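A runnable sketch of this pattern, under an invented assumption: the `RankCorr` example fit (from the `ggdist` package) has a variable `b[i, j]`, and here we pretend the `j` dimension originally came from a factor. The prototype data frame `proto` and its level labels are hypothetical and exist only to show the mechanism.

```r
library(dplyr)
library(tidybayes)

data(RankCorr, package = "ggdist")

# Hypothetical prototype data: pretend the `j` dimension of `b[i, j]`
# was originally a factor with these four (made-up) levels.
proto <- data.frame(j = factor(c("w", "x", "y", "z")))

typed <- RankCorr %>%
  recover_types(proto) %>%
  spread_draws(b[i, j])

# The `j` column should now carry the factor levels rather than raw indices
levels(typed$j)
```

The match is by name: the dimension name used in the variable specification (`j`) must equal a column name in the data passed to `recover_types()`.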

For variables that do not share the same subscripts (or share some but not all subscripts), we can supply their specifications separately. For example, if we have a variable `d[i]` with the same `i` subscript as `b[i,v]`, and a variable `x` with no subscripts, we could do this:

`spread_draws(model, x, d[i], b[i,v])`

Which is roughly equivalent to this:

```
spread_draws(model, x) %>%
  inner_join(spread_draws(model, d[i])) %>%
  inner_join(spread_draws(model, b[i,v])) %>%
  group_by(i,v)
```

Similarly, this:

`gather_draws(model, x, d[i], b[i,v])`

Is roughly equivalent to this:

```
bind_rows(
  gather_draws(model, x),
  gather_draws(model, d[i]),
  gather_draws(model, b[i,v])
)
```
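The rough equivalence for `gather_draws` can be checked on real data. This sketch uses the `RankCorr` example fit from the `ggdist` package, with `tau[i]` and `typical_r` standing in for the indexed and unindexed variables; `combined` and `stacked` are illustrative names.

```r
library(dplyr)
library(tidybayes)

data(RankCorr, package = "ggdist")

# One call with several variable specifications
combined <- RankCorr %>%
  gather_draws(tau[i], typical_r)

# Separate calls stacked by hand
stacked <- bind_rows(
  RankCorr %>% gather_draws(tau[i]),
  RankCorr %>% gather_draws(typical_r)
)

# Both should contain the same number of rows and the same columns
nrow(combined) == nrow(stacked)
setequal(names(combined), names(stacked))
```

The equivalence is only rough: grouping and row order may differ between the two results.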

The `c` and `cbind` functions can be used to combine multiple variable names that have the same dimensions. For example, if we have several variables with the same subscripts `i` and `v`, we could do either of these:

`spread_draws(model, c(w, x, y, z)[i,v])`

`spread_draws(model, cbind(w, x, y, z)[i,v]) # equivalent`

Each of which is roughly equivalent to this:

`spread_draws(model, w[i,v], x[i,v], y[i,v], z[i,v])`

Besides being more compact, the `c()`-style syntax is currently also faster (though that may change).
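As a sketch, the compact and explicit forms can be compared on the `RankCorr` example fit from the `ggdist` package, where `tau` and `u_tau` share the subscript `i`. The names `compact` and `explicit` are illustrative.

```r
library(dplyr)
library(tidybayes)

data(RankCorr, package = "ggdist")

# Compact form: combine variable names that share the same dimensions
compact <- RankCorr %>%
  spread_draws(c(tau, u_tau)[i])

# Explicit form: one specification per variable
explicit <- RankCorr %>%
  spread_draws(tau[i], u_tau[i])

# Both should have the same columns and the same number of rows
setequal(names(compact), names(explicit))
nrow(compact) == nrow(explicit)
```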

Dimensions can be omitted from the resulting data frame by leaving their names blank; e.g. `spread_draws(model, b[,v])` will omit the first dimension of `b` from the output. This is useful if a dimension is known to contain all the same value in a given model.

The shorthand `..` can be used to specify one dimension that should be put into a wide format. The resulting column names will be the base variable name, plus a dot ("."), plus the value of the dimension at `..`. For example:

`spread_draws(model, b[i,..])`

would return a grouped data frame (grouped by `i`), with:

- column `".chain"`: the chain number
- column `".iteration"`: the iteration number
- column `".draw"`: the draw number
- column `"i"`: value in `1:100`
- column `"b.1"`: value of `"b[i,1]"` for draw `".draw"`
- column `"b.2"`: value of `"b[i,2]"` for draw `".draw"`
- column `"b.3"`: value of `"b[i,3]"` for draw `".draw"`

An optional clause in the form `| wide_dimension` can also be used to put the data frame into a wide format based on `wide_dimension`. For example, this:

`spread_draws(model, b[i,v] | v)`

is roughly equivalent to this:

`spread_draws(model, b[i,v]) %>% spread(v,b)`

The main difference between the `|` syntax and the `..` syntax is that the `|` syntax respects prototypes applied to dimensions with `recover_types()`, and thus can be used to get columns with nicer names. For example:

`model %>% recover_types(data) %>% spread_draws(b[i,v] | v)`

would return a grouped data frame (grouped by `i`), with:

- column `".chain"`: the chain number
- column `".iteration"`: the iteration number
- column `".draw"`: the draw number
- column `"i"`: value in `1:100`
- column `"a"`: value of `"b[i,1]"` for draw `".draw"`
- column `"b"`: value of `"b[i,2]"` for draw `".draw"`
- column `"c"`: value of `"b[i,3]"` for draw `".draw"`
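To see the two wide-format shorthands side by side, this sketch applies both to `b[i, j]` in the `RankCorr` example fit from the `ggdist` package. No `recover_types()` call is made here, so the `|` form is expected to name its columns after the raw index values of `j`. The names `by_bar` and `by_dots` are illustrative.

```r
library(dplyr)
library(tidybayes)

data(RankCorr, package = "ggdist")

# `|` syntax: widen on the named dimension `j`
by_bar <- RankCorr %>%
  spread_draws(b[i, j] | j)

# `..` syntax: widen on the dimension at the `..` position
by_dots <- RankCorr %>%
  spread_draws(b[i, ..])

# Compare the resulting column names
names(by_bar)
names(by_dots)
```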

The shorthand `.` can be used to specify columns that should be nested into vectors, matrices, or n-dimensional arrays (depending on how many dimensions are specified with `.`).

For example, `spread_draws(model, a[.], b[.,.])` might return a data frame, with:

- column `".chain"`: the chain number.
- column `".iteration"`: the iteration number.
- column `".draw"`: a unique number for each draw from the posterior.
- column `"a"`: a list column of vectors.
- column `"b"`: a list column of matrices.

Ragged arrays are turned into non-ragged arrays with missing entries given the value `NA`.
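A sketch of the nested form on the `RankCorr` example fit from the `ggdist` package, where `tau` has one dimension and `b` has two. The name `nested` is illustrative.

```r
library(dplyr)
library(tidybayes)

data(RankCorr, package = "ggdist")

# Nest every dimension: one row per draw, with list columns
nested <- RankCorr %>%
  spread_draws(tau[.], b[., .])

# Each element of `tau` is a vector; each element of `b` is a matrix
length(nested$tau[[1]])
dim(nested$b[[1]])
```

This shape is convenient when each draw needs to be passed whole to a function that expects a vector or matrix, for example via `purrr::map()` or `mapply()`.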

Finally, variable names can be regular expressions by setting `regex = TRUE`; e.g.:

`` spread_draws(model, `b_.*`[i], regex = TRUE) ``

would return a tidy data frame with variables starting with `b_` and having one dimension.
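A runnable sketch of regular-expression matching, using the `RankCorr` example fit from the `ggdist` package. The pattern `.*tau` is an illustrative choice intended to match the one-dimensional variables `tau` and `u_tau`; `matched` is likewise an illustrative name.

```r
library(dplyr)
library(tidybayes)

data(RankCorr, package = "ggdist")

# Backticks let an arbitrary regular expression stand in for a variable name.
# Only variables with exactly one dimension are candidates for `[i]`.
matched <- RankCorr %>%
  spread_draws(`.*tau`[i], regex = TRUE)

names(matched)
```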

```
library(dplyr)
library(ggplot2)
library(tidybayes)

data(RankCorr, package = "ggdist")

# One variable with two dimensions
RankCorr %>%
  spread_draws(b[i, j])
#> # A tibble: 12,000 × 6
#> # Groups:   i, j [12]
#>        i     j      b .chain .iteration .draw
#>    <int> <int>  <dbl>  <int>      <int> <int>
#>  1     1     1 -0.927      1          1     1
#>  2     1     1 -0.979      1          2     2
#>  3     1     1 -1.15       1          3     3
#>  4     1     1 -1.09       1          4     4
#>  5     1     1 -1.20       1          5     5
#>  6     1     1 -1.07       1          6     6
#>  7     1     1 -1.11       1          7     7
#>  8     1     1 -1.06       1          8     8
#>  9     1     1 -0.831      1          9     9
#> 10     1     1 -0.986      1         10    10
#> # … with 11,990 more rows

# Several variables sharing the `i` dimension, spread into columns
RankCorr %>%
  spread_draws(b[i, j], tau[i], u_tau[i])
#> # A tibble: 12,000 × 8
#> # Groups:   i, j [12]
#>        i     j      b .chain .iteration .draw   tau u_tau
#>    <int> <int>  <dbl>  <int>      <int> <int> <dbl> <dbl>
#>  1     1     1 -0.927      1          1     1  5.79  5.87
#>  2     1     1 -0.979      1          2     2  6.26  4.91
#>  3     1     1 -1.15       1          3     3  7.38  3.34
#>  4     1     1 -1.09       1          4     4  5.97  6.96
#>  5     1     1 -1.20       1          5     5  6.01  5.30
#>  6     1     1 -1.07       1          6     6  7.03  5.34
#>  7     1     1 -1.11       1          7     7  7.39  5.53
#>  8     1     1 -1.06       1          8     8  5.98  6.79
#>  9     1     1 -0.831      1          9     9  6.75  5.40
#> 10     1     1 -0.986      1         10    10  6.76  6.63
#> # … with 11,990 more rows

# The same variables, gathered into `.variable` / `.value` columns
RankCorr %>%
  gather_draws(b[i, j], tau[i], u_tau[i])
#> # A tibble: 18,000 × 7
#> # Groups:   i, j, .variable [18]
#>        i     j .chain .iteration .draw .variable .value
#>    <int> <int>  <int>      <int> <int> <chr>      <dbl>
#>  1     1     1      1          1     1 b         -0.927
#>  2     1     1      1          2     2 b         -0.979
#>  3     1     1      1          3     3 b         -1.15
#>  4     1     1      1          4     4 b         -1.09
#>  5     1     1      1          5     5 b         -1.20
#>  6     1     1      1          6     6 b         -1.07
#>  7     1     1      1          7     7 b         -1.11
#>  8     1     1      1          8     8 b         -1.06
#>  9     1     1      1          9     9 b         -0.831
#> 10     1     1      1         10    10 b         -0.986
#> # … with 17,990 more rows

# Gather, then summarize each variable with a median and 95% quantile interval
RankCorr %>%
  gather_draws(tau[i], typical_r) %>%
  median_qi()
#> # A tibble: 4 × 8
#>       i .variable .value .lower .upper .width .point .interval
#>   <int> <chr>      <dbl>  <dbl>  <dbl>  <dbl> <chr>  <chr>
#> 1     1 tau        6.03   5.03   7.11    0.95 median qi
#> 2     2 tau        3.30   2.41   4.46    0.95 median qi
#> 3     3 tau        3.65   2.73   4.72    0.95 median qi
#> 4    NA typical_r  0.548  0.309  0.778   0.95 median qi
```