R/gather_rvars.R, R/spread_rvars.R
spread_rvars.Rd
Extract draws from a Bayesian model for one or more variables (possibly with named dimensions) into one of two types of long-format data frames of posterior::rvar objects.
gather_rvars(model, ..., ndraws = NULL, seed = NULL)
spread_rvars(model, ..., ndraws = NULL, seed = NULL)
model: A supported Bayesian model fit. Tidybayes supports a variety of model objects; for a full list of supported models, see tidybayes-models.
...: Expressions in the form of variable_name[dimension_1, dimension_2, ...]. See Details.
ndraws: The number of draws to return, or NULL to return all draws.
seed: A seed to use when subsampling draws (i.e. when ndraws is not NULL).
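A minimal sketch of how ndraws and seed interact, assuming the dplyr, tidybayes, posterior, and ggdist packages are installed. It uses the RankCorr example fit from ggdist (the same fit used in the Examples); the subsample size of 50 and the seed value are arbitrary choices for illustration.

```r
library(dplyr)
library(tidybayes)

data(RankCorr, package = "ggdist")

# Keep a random subsample of 50 draws; reusing the seed should
# reproduce the same subsample
sub_a <- RankCorr %>% spread_rvars(tau[i], ndraws = 50, seed = 1234)
sub_b <- RankCorr %>% spread_rvars(tau[i], ndraws = 50, seed = 1234)

# Number of draws carried by the rvar column after subsampling
posterior::ndraws(sub_a$tau)

# Compare the underlying draw arrays of the two subsamples
identical(posterior::draws_of(sub_a$tau), posterior::draws_of(sub_b$tau))
```

Because ndraws and seed come after ... in the function signature, they have to be passed by name.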
A data frame.
Imagine a JAGS or Stan fit named model. The model may contain a variable named b[i,v] (in the JAGS or Stan language) with dimension i in 1:100 and dimension v in 1:3. However, the default format for draws returned from JAGS or Stan in R will not reflect this indexing structure; instead, the draws will have multiple columns with names like "b[1,1]", "b[2,1]", etc.
spread_rvars and gather_rvars provide a straightforward syntax to translate these columns back into properly-indexed rvars in two different tidy data frame formats, optionally recovering dimension types (e.g. factor levels) as they do so.
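To make the indexing problem concrete, here is a small base R sketch that recovers the i and v indices from flat column names of the kind described above. It only illustrates the idea and is not how tidybayes does it internally; the flat_names vector is hypothetical.

```r
# Hypothetical flat column names, as a sampler might return them for b[i,v]
flat_names <- c("b[1,1]", "b[2,1]", "b[1,2]", "b[2,2]")

# Drop the variable name and brackets, leaving "1,1", "2,1", ...
index_strings <- sub("^b\\[(.*)\\]$", "\\1", flat_names)

# Split each string on commas and stack the integer indices into a matrix
index_matrix <- do.call(
  rbind,
  lapply(strsplit(index_strings, ",", fixed = TRUE), as.integer)
)
colnames(index_matrix) <- c("i", "v")

# One row per flat column, one column per dimension
indices <- as.data.frame(index_matrix)
indices
```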
spread_rvars will spread names of variables in the model across the data frame as column names, whereas gather_rvars will gather variable names into a single column named ".variable" and place values of variables into a column named ".value". To use naming schemes from other packages (such as broom), consider passing results through functions like to_broom_names() or to_ggmcmc_names().
For example, spread_rvars(model, a[i], b[i,v]) might return a data frame with:
column "i": value in 1:100
column "v": value in 1:3
column "a": rvar containing draws from "a[i]"
column "b": rvar containing draws from "b[i,v]"
gather_rvars(model, a[i], b[i,v]) on the same model would return a data frame with:
column "i": value in 1:100
column "v": value in 1:3, or NA on rows where ".variable" is "a"
column ".variable": value in c("a", "b")
column ".value": rvar containing draws from "a[i]" (when ".variable" is "a") or "b[i,v]" (when ".variable" is "b")
spread_rvars and gather_rvars can use type information applied to the model object by recover_types() to convert columns back into their original types. This is particularly helpful if some of the dimensions in your model were originally factors. For example, if the v dimension in the original data frame data was a factor with levels c("a","b","c"), then we could use recover_types before spread_rvars:
model %>%
  recover_types(data) %>%
  spread_rvars(b[i,v])
This would return the same data frame as above, except that the "v" column would be a value in c("a","b","c") instead of 1:3.
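Conceptually, recovering a factor type amounts to mapping each integer index back to the corresponding level label. The base R sketch below illustrates that idea only; it is not how recover_types() is implemented, and the data frame is a hypothetical stand-in for the original data.

```r
# Hypothetical original data: v was a factor with levels "a", "b", "c"
data <- data.frame(v = factor(c("a", "b", "c", "a")))

# Integer indices as they would appear in the model output
v_index <- c(1L, 2L, 3L)

# Map each index back to its level label, keeping the original level order
v_recovered <- factor(levels(data$v)[v_index], levels = levels(data$v))
v_recovered
```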
For variables that do not share the same subscripts (or share some but not all subscripts), we can supply their specifications separately. For example, if we have a variable d[i] with the same i subscript as b[i,v], and a variable x with no subscripts, we could do this:
spread_rvars(model, x, d[i], b[i,v])
Which is roughly equivalent to this:
spread_rvars(model, x) %>%
  inner_join(spread_rvars(model, d[i])) %>%
  inner_join(spread_rvars(model, b[i,v]))
Similarly, this:
gather_rvars(model, x, d[i], b[i,v])
Is roughly equivalent to this:
bind_rows(
  gather_rvars(model, x),
  gather_rvars(model, d[i]),
  gather_rvars(model, b[i,v])
)
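The rough equivalences above can be checked on a real fit. The sketch below assumes the dplyr, tidybayes, and ggdist packages are installed and uses the RankCorr example fit, whose variables b[i,j] and tau[i] share the subscript i. It compares only the shapes of the results, not the draws themselves.

```r
library(dplyr)
library(tidybayes)

data(RankCorr, package = "ggdist")

# spread_rvars: one call with two specifications vs. an explicit join on i
spread_once <- RankCorr %>% spread_rvars(b[i, j], tau[i])
spread_joined <- spread_rvars(RankCorr, b[i, j]) %>%
  inner_join(spread_rvars(RankCorr, tau[i]), by = "i")

# gather_rvars: one call vs. stacking the separate results
gather_once <- RankCorr %>% gather_rvars(b[i, j], tau[i])
gather_bound <- bind_rows(
  gather_rvars(RankCorr, b[i, j]),
  gather_rvars(RankCorr, tau[i])
)

# Compare rows and columns of each pair
dim(spread_once)
dim(spread_joined)
dim(gather_once)
dim(gather_bound)
```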
The c and cbind functions can be used to combine multiple variable names that have the same dimensions. For example, if we have several variables with the same subscripts i and v, we could do either of these:
spread_rvars(model, c(w, x, y, z)[i,v])
spread_rvars(model, cbind(w, x, y, z)[i,v]) # equivalent
Each of which is roughly equivalent to this:
spread_rvars(model, w[i,v], x[i,v], y[i,v], z[i,v])
Besides being more compact, the c()-style syntax is currently also slightly faster (though that may change).
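As a concrete sketch of the c()-style syntax, the RankCorr example fit from ggdist has two variables, tau and u_tau, that share the subscript i. This assumes the dplyr, tidybayes, and ggdist packages are installed.

```r
library(dplyr)
library(tidybayes)

data(RankCorr, package = "ggdist")

# tau and u_tau share the subscript i, so they can be requested together
combined <- RankCorr %>% spread_rvars(c(tau, u_tau)[i])

# The longer form, with one specification per variable
separate <- RankCorr %>% spread_rvars(tau[i], u_tau[i])

# Both forms should yield the same columns
names(combined)
names(separate)
```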
Dimensions can be left nested in the resulting rvar objects by leaving their names blank; e.g. spread_rvars(model, b[i,]) will place the first index (i) into rows of the data frame but leave the second index nested in the b column (see Examples below).
spread_draws(), recover_types(), compose_data(). See also posterior::rvar() and posterior::as_draws_rvars(), the functions that power spread_rvars and gather_rvars.
library(dplyr)
library(tidybayes)
data(RankCorr, package = "ggdist")
RankCorr %>%
  spread_rvars(b[i, j])
#> # A tibble: 12 × 3
#>        i     j              b
#>    <int> <int>     <rvar[1d]>
#>  1     1     1 -1.076 ± 0.095
#>  2     2     1 -0.735 ± 0.146
#>  3     3     1 -0.341 ± 0.144
#>  4     1     2 -1.821 ± 0.140
#>  5     2     2  0.101 ± 0.254
#>  6     3     2 -0.733 ± 0.229
#>  7     1     3  0.176 ± 0.071
#>  8     2     3 -0.178 ± 0.120
#>  9     3     3  0.069 ± 0.121
#> 10     1     4 -0.101 ± 0.121
#> 11     2     4  0.268 ± 0.221
#> 12     3     4  0.084 ± 0.207
# leaving an index out nests the index in the column containing the rvar
RankCorr %>%
  spread_rvars(b[i, ])
#> # A tibble: 3 × 2
#>       i b[,1]          [,2]          [,3]            [,4]
#>   <int> <rvar[,1]>     <rvar[,1]>    <rvar[,1]>      <rvar[,1]>
#> 1     1 -1.08 ± 0.095  -1.82 ± 0.14   0.176 ± 0.071  -0.101 ± 0.12
#> 2     2 -0.74 ± 0.146   0.10 ± 0.25  -0.178 ± 0.120   0.268 ± 0.22
#> 3     3 -0.34 ± 0.144  -0.73 ± 0.23   0.069 ± 0.121   0.084 ± 0.21
RankCorr %>%
  spread_rvars(b[i, j], tau[i], u_tau[i])
#> # A tibble: 12 × 5
#>        i     j              b        tau      u_tau
#>    <int> <int>     <rvar[1d]> <rvar[1d]> <rvar[1d]>
#>  1     1     1 -1.076 ± 0.095 6.0 ± 0.53  5.7 ± 1.0
#>  2     2     1 -0.735 ± 0.146 3.3 ± 0.53  5.7 ± 1.5
#>  3     3     1 -0.341 ± 0.144 3.7 ± 0.52  5.1 ± 1.3
#>  4     1     2 -1.821 ± 0.140 6.0 ± 0.53  5.7 ± 1.0
#>  5     2     2  0.101 ± 0.254 3.3 ± 0.53  5.7 ± 1.5
#>  6     3     2 -0.733 ± 0.229 3.7 ± 0.52  5.1 ± 1.3
#>  7     1     3  0.176 ± 0.071 6.0 ± 0.53  5.7 ± 1.0
#>  8     2     3 -0.178 ± 0.120 3.3 ± 0.53  5.7 ± 1.5
#>  9     3     3  0.069 ± 0.121 3.7 ± 0.52  5.1 ± 1.3
#> 10     1     4 -0.101 ± 0.121 6.0 ± 0.53  5.7 ± 1.0
#> 11     2     4  0.268 ± 0.221 3.3 ± 0.53  5.7 ± 1.5
#> 12     3     4  0.084 ± 0.207 3.7 ± 0.52  5.1 ± 1.3
# gather_rvars places variables and values in a longer format data frame
RankCorr %>%
  gather_rvars(b[i, j], tau[i], typical_r)
#> # A tibble: 16 × 4
#>        i     j .variable         .value
#>    <int> <int> <chr>         <rvar[1d]>
#>  1     1     1 b         -1.076 ± 0.095
#>  2     2     1 b         -0.735 ± 0.146
#>  3     3     1 b         -0.341 ± 0.144
#>  4     1     2 b         -1.821 ± 0.140
#>  5     2     2 b          0.101 ± 0.254
#>  6     3     2 b         -0.733 ± 0.229
#>  7     1     3 b          0.176 ± 0.071
#>  8     2     3 b         -0.178 ± 0.120
#>  9     3     3 b          0.069 ± 0.121
#> 10     1     4 b         -0.101 ± 0.121
#> 11     2     4 b          0.268 ± 0.221
#> 12     3     4 b          0.084 ± 0.207
#> 13     1    NA tau        6.041 ± 0.530
#> 14     2    NA tau        3.348 ± 0.534
#> 15     3    NA tau        3.663 ± 0.522
#> 16    NA    NA typical_r  0.544 ± 0.144