Dots + interval stats and geoms
Matthew Kay
2024-02-21
Source: vignettes/dotsinterval.Rmd
Introduction
This vignette describes the dots+interval geoms and stats in ggdist.
This is a flexible subfamily of stats and geoms
designed to make plotting dotplots straightforward. In particular, it
supports a selection of useful layouts (including the classic Wilkinson
layout, a weave layout, and a beeswarm layout) and can automatically
select the dot size so that the dotplot stays within the bounds of the
plot.
Anatomy of geom_dotsinterval()
The dotsinterval family of geoms and stats is a subfamily of
slabinterval (see vignette("slabinterval")),
where the “slab” is a collection of dots forming a dotplot and the
interval is a summary point (e.g., mean, median, mode) with an arbitrary
number of intervals.
The base geom_dotsinterval() uses a variety of custom aesthetics to
create the composite geometry.
Depending on whether you want a horizontal or vertical orientation,
you can provide ymin and ymax instead of xmin and xmax. By default,
some aesthetics (e.g., fill, color, size, alpha) set properties of
multiple subgeometries at once. For example, the color aesthetic by
default sets both the color of the point and the interval, but can also
be overridden by point_color or interval_color to set the color of each
subgeometry separately.
Due to its relationship to the geom_slabinterval() family, aesthetics
specific to the “dots” subgeometry are referred to with the prefix
slab_. When using the standalone geom_dots() geometry, it is not
necessary to use these custom aesthetics.
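As a rough sketch of this distinction (the simulated data frame and the specific colors below are illustrative assumptions, not taken from the vignette), the composite geometry can be styled through the prefixed aesthetics, while the standalone dots geometry takes the plain ones:

```r
library(ggplot2)
library(ggdist)

set.seed(1234)
# hypothetical sample data, for illustration only
aes_df = data.frame(value = rnorm(100))

# composite geometry: target each subgeometry with its prefixed aesthetic
p_composite = ggplot(aes_df, aes(x = value)) +
  stat_dotsinterval(
    slab_fill = "skyblue",     # fill of the dots
    slab_color = NA,           # no outline on the dots
    point_color = "red",       # color of the summary point only
    interval_color = "gray30"  # color of the intervals only
  )

# standalone dots: the plain fill and color aesthetics apply to the dots
p_dots = ggplot(aes_df, aes(x = value)) +
  stat_dots(fill = "skyblue", color = NA)
```

Printing p_composite or p_dots draws the corresponding plot.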
geom_dotsinterval() is often most useful when paired with
stat_dotsinterval(), which will automatically calculate points and
intervals and map these onto endpoints of the interval subgeometry.
stat_dotsinterval() and stat_dots() can be used on two types of data,
depending on what aesthetic mappings you provide:
Sample data; e.g. draws from a data distribution, bootstrap distribution, Bayesian posterior distribution (or any other distribution, really). To use the stats on sample data, map sample values onto the x or y aesthetic.
Distribution objects and analytical distributions. To use the stats on this type of data, you must use the xdist or ydist aesthetics, which take distributional objects, posterior::rvar() objects, or distribution names (e.g. "norm", which refers to the Normal distribution provided by the dnorm/pnorm/qnorm functions). When used on analytical distributions (e.g. distributional::dist_normal()), the quantiles argument determines the number of quantiles used (and therefore the number of dots shown); the default is 100.
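The two kinds of input described above can be sketched as follows. This is a minimal illustration under assumptions: the data frames sample_df and dist_tbl and their column names are made up for the example.

```r
library(ggplot2)
library(ggdist)
library(distributional)

set.seed(1234)
# hypothetical sample data: 50 draws from a standard Normal
sample_df = data.frame(value = rnorm(50))
# hypothetical analytical distribution stored in a tibble column
dist_tbl = tibble::tibble(d = dist_normal(0, 1))

# sample data: map the draws onto x (or y)
p_sample = ggplot(sample_df, aes(x = value)) +
  stat_dotsinterval()

# analytical distribution: map the distribution onto xdist (or ydist);
# quantiles sets how many dots are drawn (default 100)
p_dist = ggplot(dist_tbl, aes(xdist = d)) +
  stat_dotsinterval(quantiles = 50)
```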
All dotsinterval geoms can be plotted horizontally or vertically.
Depending on how aesthetics are mapped, they will attempt to
automatically determine the orientation; if this does not produce the
correct result, the orientation can be overridden by setting
orientation = "horizontal" or orientation = "vertical".
Controlling dot layout
Size and layout of dots in the dotplot are controlled by four
parameters: scale, binwidth, dotsize, and stackratio.
scale: If binwidth is not set (is NA), then the binwidth is determined automatically so that the height of the highest stack of dots is less than scale. The default value of scale, 0.9, ensures there is a small gap between dotplots when multiple dotplots are drawn.
binwidth: The width of the bins used to lay out the dots:
  NA (default): Use scale to determine bin width.
  A single numeric or unit(): the exact bin width to use. If it is numeric, the bin width is expressed in data units; use unit() to specify the width in terms of screen coordinates (e.g. unit(0.1, "npc") would make the bin width 0.1 normalized parent coordinates, which would be 10% of the plot width).
  A 2-vector of numerics or unit()s giving an acceptable minimum and maximum width. The automatic bin width algorithm will attempt to find the largest bin width between these two values that also keeps the tallest stack of dots shorter than scale.
dotsize: The size of the dots as a proportion of binwidth. The default value is 1.07 rather than 1. This value was chosen largely by trial and error, to find a value that gives nice-looking layouts with circular dots on continuous distributions, accounting for the fact that a slight overlap of dots tends to give a nicer apparent visual distance between adjacent stacks than the precise value of 1.
stackratio: The distance between the centers of dots in a stack as a proportion of the height of each dot. stackratio = 1, the default, means dots will just touch; stackratio < 1 means dots will overlap each other, and stackratio > 1 means dots will have gaps between them.
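To make these parameters concrete, here is a small sketch. The data frame layout_df and the particular values chosen are assumptions for illustration, not recommended settings.

```r
library(ggplot2)
library(ggdist)

set.seed(1234)
# hypothetical sample data
layout_df = data.frame(value = rnorm(100))

# exact bin width in screen units (2% of the panel along the data axis),
# dots exactly as wide as the bins, stacks slightly compressed
p_fixed = ggplot(layout_df, aes(x = value)) +
  geom_dots(binwidth = grid::unit(0.02, "npc"), dotsize = 1, stackratio = 0.9)

# automatic bin width constrained to lie between 1 mm and 3 mm,
# with the tallest stack kept below 80% of the available height
p_bounded = ggplot(layout_df, aes(x = value)) +
  geom_dots(binwidth = grid::unit(c(1, 3), "mm"), scale = 0.8)
```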
Side
The side aesthetic allows you to adjust the positioning and direction
of the dots:
"top", "right", or "topright": draw the dots on the top or on the right, depending on orientation
"bottom", "left", or "bottomleft": draw the dots on the bottom or on the left, depending on orientation
"topleft": draw the dots on top or on the left, depending on orientation
"bottomright": draw the dots on the bottom or on the right, depending on orientation
"both": draw the dots mirrored, as in a “beeswarm” plot.
When orientation = "horizontal", this yields:
set.seed(1234)
x = rnorm(100)
side_plot = function(...) {
expand.grid(
x = x,
side = c("topright", "both", "bottomleft"),
stringsAsFactors = FALSE
) %>%
ggplot(aes(side = side, ...)) +
geom_dots() +
facet_grid(~ side, labeller = "label_both") +
labs(x = NULL, y = NULL) +
theme(panel.border = element_rect(color = "gray75", fill = NA))
}
side_plot(x = x) +
labs(title = "Horizontal geom_dots() with different values of side") +
scale_y_continuous(breaks = NULL)
When orientation = "vertical", this yields:
side_plot(y = x) +
labs(title = "Vertical geom_dots() with different values of side") +
scale_x_continuous(breaks = NULL)
Layout
The layout parameter allows you to adjust the algorithm used to place
dots:
"bin" (default): places dots on the off-axis at the midpoint of their bins as in the classic Wilkinson dotplot. This maintains the alignment of rows and columns in the dotplot. This layout is slightly different from the classic Wilkinson algorithm in that: (1) it nudges bins slightly to avoid overlapping bins and (2) if the input data are symmetrical it will return a symmetrical layout.
"weave": uses the same basic binning approach of "bin", but places dots in the off-axis at their actual positions (modulo overlaps, which are nudged out of the way). This maintains the alignment of rows but does not align dots within columns.
"hex": uses the same basic binning approach of "bin", but alternates placing dots + binwidth/4 or - binwidth/4 in the off-axis from the bin center. This allows hexagonal packing by setting a stackratio less than 1 (something like 0.9 tends to work).
"swarm": uses the "compactswarm" layout from beeswarm::beeswarm(). Does not maintain alignment of rows or columns, but can be more compact and neat looking, especially for sample data (as opposed to quantile dotplots of theoretical distributions, which may look better with "bin", "weave", or "hex").
When side is "top", these layouts look like this:
layout_plot = function(layout, side, ...) {
data.frame(
x = x
) %>%
ggplot(aes(x = x)) +
geom_dots(layout = layout, side = side, stackratio = if (layout == "hex") 0.9 else 1) +
labs(
subtitle = paste0("layout = ", deparse(layout), if (layout == "hex") " with stackratio = 0.9"),
x = NULL,
y = NULL
) +
scale_y_continuous(breaks = NULL) +
theme(panel.border = element_rect(color = "gray75", fill = NA))
}
(layout_plot("bin", side = "top") + layout_plot("hex", side = "top")) /
(layout_plot("weave", side = "top") + layout_plot("swarm", side = "top")) +
plot_annotation(title = 'geom_dots() layouts with side = "top"')
When side is "both", these layouts look like this:
(layout_plot("bin", side = "both") + layout_plot("hex", side = "both")) /
(layout_plot("weave", side = "both") + layout_plot("swarm", side = "both")) +
plot_annotation(title = 'geom_dots() layouts with side = "both"')
Beeswarm plots
Thus, it is possible to create beeswarm plots by using geom_dots()
with side = "both":
set.seed(1234)
abc_df = tibble(
value = rnorm(300, mean = c(1,2,3), sd = c(1,2,2)),
abc = rep(c("a", "b", "c"), 100)
)
abc_df %>%
ggplot(aes(x = abc, y = value)) +
geom_dots(side = "both") +
ggtitle('geom_dots(side = "both")')
side = "both" also tends to work well with the "hex" and "swarm"
layouts for more classic-looking “beeswarm” plots:
abc_df %>%
ggplot(aes(x = abc, y = value)) +
geom_dots(side = "both", layout = "hex", stackratio = 0.92) +
ggtitle('geom_dots(side = "both", layout = "hex")')
The combination of binwidth = unit(1.5, "mm") and
overflow = "compress" (see the section on large samples, below) can be
used to set the dot size to a specific size while guaranteeing the
layout stays within the bounds of the geom. This combination is used by
two shortcut geoms, geom_swarm() and geom_weave(), which use the
"swarm" and "weave" layouts respectively. These also use
side = "both", and are intended to make it easy to create good-looking
beeswarm plots without manually tweaking settings:
set.seed(1234)
swarm_data = tibble(
y = rnorm(300, c(1,4)),
g = rep(c("a","b"), 150)
)
swarm_plot = swarm_data %>%
ggplot(aes(x = g, y = y)) +
geom_swarm(linewidth = 0, alpha = 0.75) +
labs(title = "geom_swarm()")
weave_plot = swarm_data %>%
ggplot(aes(x = g, y = y)) +
geom_weave(linewidth = 0, alpha = 0.75) +
labs(title = "geom_weave()")
swarm_plot + weave_plot
Varying color, fill, shape, and linewidth
Aesthetics like color, fill, shape, and linewidth can be varied over
the dots. For example, we can vary the fill aesthetic to create two
subgroups, and use position = "dodge" to dodge entire “swarms” at once
so the subgroups do not overlap. We’ll also set linewidth = 0 so that
the default gray outline is not drawn:
set.seed(12345)
abcc_df = tibble(
value = rnorm(300, mean = c(1,2,3,4), sd = c(1,2,2,1)),
abc = rep(c("a", "b", "c", "c"), 75),
hi = rep(c("h", "h", "h", "i"), 75)
)
abcc_df %>%
ggplot(aes(y = value, x = abc, fill = hi)) +
geom_weave(position = "dodge", linewidth = 0, alpha = 0.75) +
scale_fill_brewer(palette = "Dark2") +
ggtitle(
'geom_weave(position = "dodge")',
'aes(fill = hi)'
)
Varying discrete aesthetics within dot groups
By default, if you assign a discrete variable to fill, color, shape,
etc., it will also be used in the group aesthetic to determine dot
groups, which are laid out separately (and can be dodged separately, as
above).
If you override this behavior by setting group to NA (or to some other
variable you want to group dot layouts by), geom_dotsinterval() will
leave dots in data order within the layout but allow aesthetics to vary
across them.
For example:
abcc_df %>%
ggplot(aes(y = value, x = abc, fill = hi, group = NA)) +
geom_dots(linewidth = 0) +
scale_fill_brewer(palette = "Dark2") +
ggtitle(
'geom_dots()',
'aes(fill = hi, group = NA)'
)
By default, dot positions within bins for the "bin" layout are
determined by their data values (e.g. by the y values in the above
chart). You can override this by passing a variable to the order
aesthetic, which will set the sort order within bins. This can be used
to create “stacked” dotplots by setting order to a discrete variable.
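A stacked dotplot along these lines might look like the following sketch. The data frame stack_df is a made-up stand-in for the data used earlier; only the combination of group = NA and order comes from the text above.

```r
library(ggplot2)
library(ggdist)

set.seed(12345)
# hypothetical data: two categories on x, a discrete variable hi to stack by
stack_df = data.frame(
  value = rnorm(200),
  abc = rep(c("a", "b"), 100),
  hi = rep(c("h", "h", "h", "i"), 50)
)

# group = NA lays out all dots within each x category together;
# order = hi sorts dots within each bin by hi, so the colors stack
p_stacked = ggplot(stack_df, aes(y = value, x = abc, fill = hi, group = NA, order = hi)) +
  geom_dots(linewidth = 0) +
  scale_fill_brewer(palette = "Dark2")
```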
On large samples
Setting a minimum dot size
On very large samples, the dots may become smaller than desired. To
avoid this, you can set a desired dot size / bin width using the
binwidth argument. To set a specific bin width, pass a 1-element
vector; to set a minimum bin width, pass a 2-element vector, where the
first element is the min and the second the max. The bin width can be
in data units (if numeric) or in plotting units (using grid::unit()).
For example, we could set the minimum dot size to unit(1.5, "mm"),
which is the default size of points in ggplot2::geom_point(). We’ll
also set overflow = "compress", which allows dots to overlap if
necessary to maintain the specified dot size (rather than having the
tallest stacks of dots leave the top of the screen).
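One way this could be written is sketched below. The sample big_df is an assumption made for the example, and the upper bound of Inf simply leaves the maximum bin width unconstrained.

```r
library(ggplot2)
library(ggdist)

set.seed(1234)
# hypothetical large sample of 2000 draws
big_df = data.frame(value = rnorm(2000))

# minimum bin width of 1.5 mm and no maximum; if the tallest stack would
# not fit at that size, overflow = "compress" lets dots overlap instead
p_min_size = ggplot(big_df, aes(x = value)) +
  geom_dots(
    binwidth = grid::unit(c(1.5, Inf), "mm"),
    overflow = "compress",
    alpha = 0.75
  ) +
  scale_y_continuous(breaks = NULL)
```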
“density” dotplots
The dotplot above on a sample of size 2000 is quite noisy. When applied to large samples where you do not care too much about individual dot positions, you may want to apply some smoothing to make the layout more appealing.
geom_dots() supports a handful of smoothers which can be applied using
the smooth = parameter. These all correspond to functions that start
with smooth_, like smooth_bounded(), smooth_unbounded(), and
smooth_discrete(), and can be applied either by passing the suffix as a
string (e.g. smooth = "bounded") or by passing the function itself, to
set specific options on it (e.g. smooth = smooth_bounded(adjust = 0.5)).
For continuous distributions with unbounded support,
smooth_unbounded() is a good choice; it applies a kernel density
estimator that assumes infinite bounds (see density_unbounded()):
ggplot() +
geom_dots(aes(x), smooth = "unbounded") +
labs(
title = 'geom_dots() with 2000 dots',
subtitle = 'smooth = "unbounded"',
x = NULL
) +
scale_y_continuous(breaks = NULL)
Note that dot positions in the resulting plot will no longer be as accurate as before. With a large sample this may be an acceptable compromise. With a small sample, I do not recommend using this technique.
On bounded distributions, you should use smooth_bounded(), providing
the bounds of the distribution. Otherwise, the dotplot will be smoothed
incorrectly. For example, on a Beta(0.5, 0.5) distribution, which is
bounded between 0 and 1, we should use
smooth = smooth_bounded(bounds = c(0, 1)):
set.seed(1234)
x = rbeta(2000, 0.5, 0.5)
ggplot(data.frame(x), aes(x)) +
geom_dots(aes(y = "bounded"), smooth = smooth_bounded(bounds = c(0, 1))) +
geom_dots(aes(y = "unbounded"), smooth = "unbounded") +
geom_vline(xintercept = c(0, 1), alpha = 0.25) +
scale_x_continuous(breaks = c(0, 0.5, 1)) +
labs(
title = "geom_dots(smooth = ...) on x ~ Beta(0.5, 0.5)",
y = "smooth ="
)
Notice how smooth = "unbounded" incorrectly smooths data points to be
outside the range of the data when the data are bounded.
On discrete distributions
The dots family includes a variety of features to make visualizing discrete and categorical distributions easier. Dotplot smoothing can be particularly useful for these distributions, especially when bin counts are very high. For example, these distributions are hard to visualize under the default settings, because the dots become very small:
set.seed(1234)
abcd_df = tibble(
x = sample(c("a", "b", "c", "d"), 1000, replace = TRUE, prob = c(0.27, 0.6, 0.03, 0.005)),
g = rep(c("a","b"), 500)
)
abcd_df %>%
ggplot(aes(x = x)) +
geom_dots() +
scale_y_continuous(breaks = NULL) +
labs(
title = "geom_dots()",
subtitle = "on a large discrete sample"
)
The automatic bin width algorithm selects a dot size that is very small in order to ensure the tallest bin fits in the plot, but this means the dots are hard to see.
Using the smooth_discrete() smoother, we can spread the dots in each
bin out into rectangular shapes:
abcd_df %>%
ggplot(aes(x = x)) +
geom_dots(smooth = "discrete") +
scale_y_continuous(breaks = NULL) +
labs(
title = 'geom_dots(smooth = "discrete")',
subtitle = "on a large discrete sample"
)
More regular bar-like shapes can be achieved by using layout = "bar",
so long as you override the default ggplot2 behavior of grouping data
by all discrete variables. This allows the layout to be calculated
taking all groups into account:
abcd_df %>%
ggplot(aes(x = x, fill = g, order = g)) +
geom_dots(layout = "bar", group = NA, color = NA) +
scale_y_continuous(breaks = NULL) +
labs(
title = 'geom_dots(aes(fill = g), layout = "bar", group = NA)',
subtitle = "on a large discrete sample"
)
smooth_discrete() applies a kernel density smoother whose default
bandwidth is less than the distances between bins. We can use the
kernel argument (passed to density_bounded(); the same kernels from
stats::density() are available) to change the shape of the bins.
For example, using the "epanechnikov" (parabolic) kernel along with
side = "both", we can create lozenge-like shapes. We’ll abbreviate the
kernel to "ep" to save typing out "epanechnikov" (partial matching is
allowed):
abcd_df %>%
ggplot(aes(x = x)) +
geom_dots(smooth = smooth_discrete(kernel = "ep"), side = "both") +
scale_y_continuous(breaks = NULL) +
labs(
title = 'geom_dots(smooth = smooth_discrete(kernel = "ep"), side = "both")',
subtitle = "on a large discrete sample"
)
On analytical distributions
Like the stat_slabinterval() family, stat_dotsinterval() and
stat_dots() support using both sample data (via x and y aesthetics) or
analytical distributions (via the xdist and ydist aesthetics). For
analytical distributions, these stats accept specifications for
distributions in one of two ways:

Using distribution names as character vectors: this format uses aesthetics as follows:
  xdist, ydist, or dist: the name of the distribution, following R’s naming scheme. This is a string which should have "p", "q", and "d" functions defined for it: e.g., "norm" is a valid distribution name because the pnorm(), qnorm(), and dnorm() functions define the CDF, quantile function, and density function of the Normal distribution.
  args or arg1, … arg9: arguments for the distribution. If you use args, it should be a list column where each element is a list containing arguments for the distribution functions; alternatively, you can pass the arguments directly using arg1, … arg9.
Using distribution vectors from the distributional package or posterior::rvar() objects: this format uses aesthetics as follows:
  xdist, ydist, or dist: a distribution vector or posterior::rvar() produced by functions such as distributional::dist_normal(), distributional::dist_beta(), posterior::rvar_rng(), etc.
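The first format (distribution names plus arguments) might be used as in the sketch below. The data frame name_df and its columns are hypothetical; the example assumes that arg1 and arg2 are passed positionally to the "norm" functions, i.e. as mean and sd.

```r
library(ggplot2)
library(ggdist)

# hypothetical specification of two Normal distributions by name
name_df = data.frame(
  group = c("a", "b"),
  dist = "norm",
  mean = c(0, 2),
  sd = c(1, 0.5)
)

# xdist takes the distribution name; arg1 and arg2 supply its arguments
p_named = ggplot(name_df, aes(y = group, xdist = dist, arg1 = mean, arg2 = sd)) +
  stat_dotsinterval(quantiles = 50)
```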

For example, here are a variety of distributions:
dist_df = tibble(
dist = c(dist_normal(1,0.25), dist_beta(3,3), dist_gamma(5,5)),
dist_name = format(dist)
)
dist_df %>%
ggplot(aes(y = dist_name, xdist = dist)) +
stat_dotsinterval(subguide = 'integer') +
ggtitle(
"stat_dotsinterval(subguide = 'integer')",
"aes(y = dist_name, xdist = dist)"
)
This example also shows the use of subguides to label dot counts. See
the documentation of subguide_axis() and its shortcuts (particularly
subguide_integer() and subguide_count()) for more examples. Note:
subguides are currently only available in the development version of
ggdist (> 3.3.1) on GitHub.
Analytical distributions are shown by default using 100 quantiles,
sometimes referred to as a quantile dotplot, which can help people make
better decisions under uncertainty (Kay 2016, Fernandes 2018). This can
be changed using the quantiles argument. For example, we can plot the
same distributions again using 1000 quantiles. We’ll also make use of
point_interval to plot the mode and highest-density continuous
intervals (instead of the default median and quantile intervals; see
point_interval()).
We’ll also highlight some intervals by coloring the dots. Like with
the stat_slabinterval() family, computed variables from the interval
subgeometry (level and .width) are available to the dots/slab
subgeometry, and correspond to the smallest interval containing that
dot. We can use these to color dots according to the interval
containing them (we’ll also use the "weave" layout since it maintains x
positions better than the "bin" layout):
dist_df %>%
ggplot(aes(y = dist_name, xdist = dist, slab_fill = after_stat(level))) +
stat_dotsinterval(quantiles = 1000, point_interval = mode_hdci, layout = "weave", slab_color = NA) +
scale_color_manual(values = scales::brewer_pal()(3)[-1], aesthetics = "slab_fill") +
ggtitle(
"stat_dotsinterval(quantiles = 1000, point_interval = mode_hdci)",
"aes(y = dist_name, xdist = dist, slab_fill = after_stat(level))"
)
When summarizing sample distributions with
stat_dots()/stat_dotsinterval() (e.g. samples from Bayesian
posteriors), one can also use the quantiles argument, though it is not
on by default.
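For instance, a large set of draws could be summarized as a 100-dot quantile dotplot, as in this sketch (the simulated draws_df stands in for real posterior draws and is an assumption of the example):

```r
library(ggplot2)
library(ggdist)

set.seed(1234)
# hypothetical stand-in for 5000 posterior draws
draws_df = data.frame(value = rnorm(5000))

# default: one dot per draw
p_all_draws = ggplot(draws_df, aes(x = value)) +
  stat_dots()

# quantile dotplot: summarize the draws with 100 quantiles, one dot each
p_quantiles = ggplot(draws_df, aes(x = value)) +
  stat_dots(quantiles = 100)
```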
Varying continuous aesthetics with analytical distributions
While varying discrete aesthetics works similarly with
stat_dotsinterval()/stat_dots() as it does with
geom_dotsinterval()/geom_dots(), varying continuous aesthetics within
dot groups typically requires mapping the continuous aesthetic after
the stats are computed. This is because the stat (at least for
analytical distributions) must first generate the quantiles before
properties of those quantiles can be mapped to aesthetics.
Thus, because it relies upon generated variables from the stat, you
can use the after_stat() or stage() functions from ggplot2 to map those
variables. For example:
dist_df %>%
ggplot(aes(y = dist_name, xdist = dist, slab_color = after_stat(x))) +
stat_dotsinterval(slab_shape = 19, quantiles = 500) +
scale_color_distiller(aesthetics = "slab_color", guide = "colorbar2") +
ggtitle(
"stat_dotsinterval(slab_shape = 19, quantiles = 500)",
'aes(slab_color = after_stat(x)) +\nscale_color_distiller(aesthetics = "slab_color", guide = "colorbar2")'
)
This example also demonstrates the use of subgeometry scales: the
slab_-prefixed aesthetics slab_color and slab_shape must be used to
target the color and shape of the slab (“slab” here refers to the stack
of dots) when using geom_dotsinterval() and stat_dotsinterval() to
disambiguate between the point/interval and the dot stack. When using
stat_dots()/geom_dots() this is not necessary.
Also note the use of scale_color_distiller(), a base ggplot2 color
scale, with the slab_color aesthetic by setting the aesthetics and
guide properties (the latter is necessary because the default
guide = "colorbar" will not work with non-standard color aesthetics).
Thresholds
Another potentially useful application of post-stat aesthetic
computation is to apply thresholds on a dotplot, coloring points on one
side of a line differently. However, the default dotplot layout, "bin",
can cause dots to be on the wrong side of a cutoff when coloring dots
within dotplots. Thus it can be useful when plotting thresholds to use
the "weave" or "swarm" layouts, which tend to position dots closer to
their true x positions, rather than at bin centers:
ab_df = tibble(
ab = c("a", "b"),
mean = c(5, 7),
sd = c(1, 1.5)
)
ab_df %>%
ggplot(aes(y = ab, xdist = dist_normal(mean, sd), fill = after_stat(x < 6))) +
stat_dots(position = "dodge", color = NA, layout = "weave") +
labs(
title = 'stat_dots(layout = "weave")',
subtitle = "aes(fill = after_stat(x < 6))"
) +
geom_vline(xintercept = 6, alpha = 0.25) +
scale_x_continuous(breaks = 2:10)
Rain cloud plots
Sometimes you may want to include multiple different types of slabs in the same plot in order to take advantage of the features each slab type provides. For example, people often combine densities with dotplots to show the underlying datapoints that go into a density estimate, creating so-called rain cloud plots.
To use multiple slab geometries together, you can use the side
parameter to change which side of the interval a slab is drawn on and
set the scale parameter to something around 0.5 (by default it is 0.9)
so that the two slabs do not overlap. We’ll also scale the halfeye slab
thickness by n (the number of observations in each group) so that the
area of each slab represents sample size (and looks similar to the
total area of its corresponding dotplot).
We’ll use a subsample of the data to show how it might look on a reasonably-sized dataset.
set.seed(12345) # for reproducibility
tibble(
abc = rep(c("a", "b", "b", "c"), 50),
value = rnorm(200, c(1, 8, 8, 3), c(1, 1.5, 1.5, 1))
) %>%
ggplot(aes(y = abc, x = value, fill = abc)) +
stat_slab(aes(thickness = after_stat(pdf*n)), scale = 0.7) +
stat_dotsinterval(side = "bottom", scale = 0.7, slab_linewidth = NA) +
scale_fill_brewer(palette = "Set2") +
ggtitle(
paste0(
'stat_slab(aes(thickness = after_stat(pdf*n)), scale = 0.7) +\n',
'stat_dotsinterval(side = "bottom", scale = 0.7, slab_linewidth = NA)'
),
'aes(fill = abc)'
)
Logit dotplots
To demonstrate another useful plot type, the logit dotplot (courtesy Ladislas Nalborczyk), we’ll fit a logistic regression to some data on the sex and body mass of Gentoo penguins.
First, we’ll demo varying the side aesthetic to create two dotplots
that are “facing” each other. scale_side_mirrored() will set the side
aesthetic to "top" or "bottom" if two categories are assigned to side.
We also adjust the scale so that the dots don’t overlap:
gentoo = penguins %>%
filter(species == "Gentoo", !is.na(sex))
gentoo %>%
ggplot(aes(x = body_mass_g, y = sex, side = sex)) +
geom_dots(scale = 0.5) +
scale_side_mirrored(guide = "none") +
ggtitle(
"geom_dots(scale = 0.5)",
'aes(side = sex) + scale_side_mirrored()'
)
This can also be accomplished by setting side directly and omitting
scale_side_mirrored(); e.g. via
aes(side = ifelse(sex == "male", "bottom", "top")).
Now we fit a logistic regression predicting sex based on body mass:
m = glm(sex == "male" ~ body_mass_g, data = gentoo, family = binomial)
m
##
## Call: glm(formula = sex == "male" ~ body_mass_g, family = binomial,
## data = gentoo)
##
## Coefficients:
## (Intercept) body_mass_g
## -55.03337 0.01089
##
## Degrees of Freedom: 118 Total (i.e. Null); 117 Residual
## Null Deviance: 164.9
## Residual Deviance: 45.1 AIC: 49.1
Then we can overlay a fit line as a stat_lineribbon() (see
vignette("lineribbon")) on top of the mirrored dotplots to create a
logit dotplot:
# construct a prediction grid for the fit line
prediction_grid = with(gentoo,
data.frame(body_mass_g = seq(min(body_mass_g), max(body_mass_g), length.out = 100))
)
prediction_grid %>%
bind_cols(predict(m, ., se.fit = TRUE)) %>%
mutate(
# distribution describing uncertainty in log odds
log_odds = dist_normal(fit, se.fit),
# inverse-logit transform the log odds to get
# distribution describing uncertainty in Pr(sex == "male")
p_male = dist_transformed(log_odds, plogis, qlogis)
) %>%
ggplot(aes(x = body_mass_g)) +
geom_dots(
aes(y = as.numeric(sex == "male"), side = sex),
scale = 0.4,
data = gentoo
) +
stat_lineribbon(
aes(ydist = p_male), alpha = 1/4, fill = "#08306b"
) +
scale_side_mirrored(guide = "none") +
coord_cartesian(ylim = c(0, 1)) +
labs(
title = "logit dotplot: stat_dots() with stat_lineribbon()",
subtitle = 'aes(side = sex) + scale_side_mirrored()',
x = "Body mass (g) of Gentoo penguins",
y = "Pr(sex = male)"
)