Geoms and stats for creating dotplots that automatically determine a bin width that ensures the plot fits within the available space. They also ensure that dots do not overlap, and allow generation of quantile dotplots using the quantiles argument to stat_dotsinterval()/stat_dots() and stat_dist_dotsinterval()/stat_dist_dots(). They generally follow the naming scheme and arguments of the geom_slabinterval() and stat_slabinterval() family of geoms and stats.

geom_dotsinterval(
  mapping = NULL,
  data = NULL,
  stat = "identity",
  position = "identity",
  ...,
  dotsize = 1,
  stackratio = 1,
  binwidth = NA,
  layout = c("bin", "weave", "swarm"),
  na.rm = FALSE,
  show.legend = NA,
  inherit.aes = TRUE
)

geom_dots(
  mapping = NULL,
  data = NULL,
  stat = "identity",
  position = "identity",
  ...,
  na.rm = FALSE,
  show.legend = NA,
  inherit.aes = TRUE
)

stat_dotsinterval(
  mapping = NULL,
  data = NULL,
  geom = "dotsinterval",
  position = "identity",
  ...,
  quantiles = NA,
  point_interval = median_qi,
  na.rm = FALSE,
  show.legend = c(size = FALSE),
  inherit.aes = TRUE
)

stat_dots(
  mapping = NULL,
  data = NULL,
  geom = "dots",
  position = "identity",
  ...,
  show.legend = NA,
  inherit.aes = TRUE
)

stat_dist_dotsinterval(
  mapping = NULL,
  data = NULL,
  geom = "dotsinterval",
  position = "identity",
  ...,
  quantiles = 100,
  na.rm = FALSE,
  show.legend = c(size = FALSE),
  inherit.aes = TRUE
)

stat_dist_dots(
  mapping = NULL,
  data = NULL,
  geom = "dots",
  position = "identity",
  ...,
  show.legend = NA,
  inherit.aes = TRUE
)

Arguments

mapping

Set of aesthetic mappings created by aes() or aes_(). If specified and inherit.aes = TRUE (the default), it is combined with the default mapping at the top level of the plot. You must supply mapping if there is no plot mapping.

data

The data to be displayed in this layer. There are three options:

If NULL, the default, the data is inherited from the plot data as specified in the call to ggplot().

A data.frame, or other object, will override the plot data. All objects will be fortified to produce a data frame. See fortify() for which variables will be created.

A function will be called with a single argument, the plot data. The return value must be a data.frame, and will be used as the layer data. A function can be created from a formula (e.g. ~ head(.x, 10)).

stat

The statistical transformation to use on the data for this layer, as a string.

position

Position adjustment, either as a string, or the result of a call to a position adjustment function.

...

Arguments passed on to geom_slabinterval

orientation

Whether this geom is drawn horizontally ("horizontal") or vertically ("vertical"). The default, NA, automatically detects the orientation based on how the aesthetics are assigned, and should generally do an okay job at this. When horizontal (resp. vertical), the geom uses the y (resp. x) aesthetic to identify different groups, then for each group uses the x (resp. y) aesthetic and the thickness aesthetic to draw a function as a slab, and draws points and intervals horizontally (resp. vertically) using the xmin, x, and xmax (resp. ymin, y, and ymax) aesthetics. For compatibility with the base ggplot naming scheme for orientation, "x" can be used as an alias for "vertical" and "y" as an alias for "horizontal" (tidybayes had an orientation parameter before ggplot did, and I think the tidybayes naming scheme is more intuitive: "x" and "y" are not orientations and their mapping to orientations is, in my opinion, backwards; but the base ggplot naming scheme is allowed for compatibility).

normalize

How to normalize heights of functions input to the thickness aesthetic. If "all" (the default), normalize so that the maximum height across all data is 1; if "panels", normalize within panels so that the maximum height in each panel is 1; if "xy", normalize within the x/y axis opposite the orientation of this geom so that the maximum height at each value of the opposite axis is 1; if "groups", normalize within values of the opposite axis and within groups so that the maximum height in each group is 1; if "none", values are taken as is with no normalization (this should probably only be used with functions whose values are in [0,1], such as CDFs).

fill_type

What type of fill to use when the fill color or alpha varies within a slab. The default, "segments", breaks up the slab geometry into segments for each unique combination of fill color and alpha value. This approach is supported by all graphics devices and works well for sharp cutoff values, but can give ugly results if a large number of unique fill colors are being used (as in gradients, like in stat_gradientinterval()). When fill_type == "gradient", a linearGradient() is used to create a smooth gradient fill. This works well for large numbers of unique fill colors, but requires R >= 4.1 and is not yet supported on all graphics devices.

interval_size_domain

The minimum and maximum of the values of the size aesthetic that will be translated into actual sizes for intervals drawn according to interval_size_range (see the documentation for that argument).

interval_size_range

(Deprecated). This geom scales the raw size aesthetic values when drawing interval and point sizes, as they tend to be too thick when using the default settings of scale_size_continuous(), which give sizes with a range of c(1, 6). The interval_size_domain value indicates the input domain of raw size values (typically this should be equal to the value of the range argument of the scale_size_continuous() function), and interval_size_range indicates the desired output range of the size values (the min and max of the actual sizes used to draw intervals). Most of the time it is not recommended to change the value of this argument, as it may result in strange scaling of legends; this argument is a holdover from earlier versions that did not have size aesthetics targeting the point and interval separately. If you want to adjust the size of the interval or points separately, you can instead use the interval_size or point_size aesthetics; see scales.

fatten_point

A multiplicative factor used to adjust the size of the point relative to the size of the thickest interval line. If you wish to specify point sizes directly, you can also use the point_size aesthetic and scale_point_size_continuous() or scale_point_size_discrete(); sizes specified with that aesthetic will not be adjusted using fatten_point.

show_slab

Should the slab portion of the geom be drawn? Default TRUE.

show_point

Should the point portion of the geom be drawn? Default TRUE.

show_interval

Should the interval portion of the geom be drawn? Default TRUE.

dotsize

The size of the dots relative to the bin width. The default, 1, makes dots be just about as wide as the bin width.

stackratio

The distance between the centers of the dots in the same stack relative to the bin height. The default, 1, makes dots in the same stack just touch each other.

binwidth

The bin width to use for drawing the dotplots. One of:

  • NA (the default): Dynamically select the bin width based on the size of the plot when drawn.

  • A length-1 (scalar) numeric or unit object giving the exact bin width.

  • A length-2 (vector) numeric or unit object giving the minimum and maximum desired bin width. The bin width will be dynamically selected within these bounds.

If the value is numeric, it is assumed to be in units of data. The bin width (or its bounds) can also be specified using unit(), which may be useful if it is desired that the dots be a certain point size or a certain percentage of the width/height of the viewport. For example, unit(0.1, "npc") would make dots that are exactly 10% of the viewport size along whichever dimension the dotplot is drawn; unit(c(0, 0.1), "npc") would make dots that are at most 10% of the viewport size.
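The three forms of binwidth described above can be sketched as follows. This is a minimal illustration, not a recommended setting: the simulated data frame df and its value column are made up for the example, and the specific widths (0.1 data units, 10% of the viewport) are arbitrary.

```r
library(ggplot2)
library(ggdist)

# Hypothetical sample data, used only for illustration
set.seed(1234)
df <- data.frame(value = rnorm(200))

# NA (the default): bin width is chosen when the plot is drawn
p_auto <- ggplot(df, aes(x = value)) +
  geom_dots()

# Scalar numeric: an exact bin width in data units
p_fixed <- ggplot(df, aes(x = value)) +
  geom_dots(binwidth = 0.1)

# Length-2 unit object: bounds on the dynamically selected bin width,
# here at most 10% of the viewport along the dotplot's dimension
p_bounded <- ggplot(df, aes(x = value)) +
  geom_dots(binwidth = grid::unit(c(0, 0.1), "npc"))
```

Printing any of these objects draws the plot; with a fixed bin width the dots may overflow the panel if there is not enough room, which is the situation the dynamic default is meant to avoid.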

layout

The layout method used for the dots:

  • "bin" (default): places dots on the off-axis at the midpoint of their bins as in the classic Wilkinson dotplot. This maintains the alignment of rows and columns in the dotplot. This layout is slightly different from the classic Wilkinson algorithm in that: (1) it nudges bins slightly to avoid overlapping bins and (2) if the input data are symmetrical it will return a symmetrical layout.

  • "weave": uses the same basic binning approach as "bin", but places dots in the off-axis at their actual positions (modulo overlaps, which are nudged out of the way). This maintains the alignment of rows but does not align dots within columns. Does not work well when side = "both".

  • "swarm": uses the "compactswarm" layout from beeswarm::beeswarm(). Does not maintain alignment of rows or columns, but can be more compact and neat looking, especially for sample data (as opposed to quantile dotplots of theoretical distributions, which may look better with "bin" or "weave").
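As a rough sketch of how the layout argument is used, the following builds the same dotplot with the "bin" and "weave" layouts. The data frame df is simulated for the example. The "swarm" layout is left out here only because it depends on the beeswarm package being installed.

```r
library(ggplot2)
library(ggdist)

# Hypothetical sample data, used only for illustration
set.seed(1234)
df <- data.frame(value = rnorm(150))

# Classic binned layout: rows and columns of dots stay aligned
p_bin <- ggplot(df, aes(x = value)) +
  geom_dots(layout = "bin")

# Weave layout: rows stay aligned, dots sit at their actual positions
p_weave <- ggplot(df, aes(x = value)) +
  geom_dots(layout = "weave")
```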

na.rm

If FALSE, the default, missing values are removed with a warning. If TRUE, missing values are silently removed.

show.legend

logical. Should this layer be included in the legends? NA, the default, includes if any aesthetics are mapped. FALSE never includes, and TRUE always includes. It can also be a named logical vector to finely select the aesthetics to display.

inherit.aes

If FALSE, overrides the default aesthetics, rather than combining with them. This is most useful for helper functions that define both data and aesthetics and shouldn't inherit behaviour from the default plot specification, e.g. borders().

geom

Use to override the default connection between stat_slabinterval() and geom_slabinterval().

quantiles

For the stat_ and stat_dist_ stats, setting this to a value other than NA will produce a quantile dotplot: that is, a dotplot of quantiles from the sample (for stat_) or a dotplot of quantiles from the distribution (for stat_dist_). The value of quantiles determines the number of quantiles to plot. See Kay et al. (2016) and Fernandes et al. (2018) for more information on quantile dotplots.
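To make the idea of a quantile dotplot concrete, here is a base R sketch of the values such a plot would show. It assumes evenly spaced probabilities of the form (i - 0.5) / n; this choice, and the names draws, n_quantiles, and probs, are assumptions for illustration, and the stats may compute quantiles differently internally.

```r
# Hypothetical sample and distribution parameters, for illustration only
set.seed(1)
draws <- rnorm(1000, mean = 10, sd = 2)
n_quantiles <- 20

# Evenly spaced probabilities: (i - 0.5) / n for i = 1..n (an assumption)
probs <- ppoints(n_quantiles, a = 0.5)

# Quantiles from a sample (the stat_ case)
q_sample <- quantile(draws, probs, names = FALSE)

# Quantiles from a theoretical distribution (the stat_dist_ case)
q_dist <- qnorm(probs, mean = 10, sd = 2)
```

Each of the 20 values would become one dot, so each dot stands for a 1-in-20 chance under this framing.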

point_interval

A function from the point_interval() family (e.g., median_qi, mean_qi, etc). This function should take in a vector of values, and should obey the .width and .simple_names parameters of point_interval() functions, such that when given a vector with .simple_names = TRUE it returns a data frame with variables .value, .lower, .upper, and .width. Output will be converted to the appropriate x- or y-based aesthetics depending on the value of orientation. See the point_interval() family of functions for more information.
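The contract described above can be checked by calling one of these functions directly on a vector. This is a small sketch with simulated data (draws and df are made up for the example); it also shows swapping in mean_qi as the point_interval for a stat.

```r
library(ggplot2)
library(ggdist)

# Hypothetical sample, used only for illustration
set.seed(42)
draws <- rnorm(1000)

# One row per requested interval width
summary_df <- median_qi(draws, .width = c(0.66, 0.95), .simple_names = TRUE)
names(summary_df)

# Using a different point_interval function in the stat
df <- data.frame(value = draws)
p <- ggplot(df, aes(x = value)) +
  stat_dotsinterval(quantiles = 50, point_interval = mean_qi)
```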

Value

A ggplot2::Geom or ggplot2::Stat representing a dotplot or combined dotplot+interval geometry which can be added to a ggplot() object.

Details

The dots geoms are similar to geom_dotplot() but with a number of differences:

  • Dots geoms act like slabs in geom_slabinterval() and can be given x positions (or y positions when in a horizontal orientation).

  • Given the available space to lay out dots, the dots geoms will automatically determine how many bins to use to fit the available space.

  • Dots geoms use a dynamic layout algorithm that lays out dots from the center out if the input data are symmetrical, guaranteeing that symmetrical data results in a symmetrical plot. The layout algorithm also prevents dots from overlapping each other.

  • The shape of the dots in these geoms can be changed using the slab_shape aesthetic (when using the dotsinterval family) or the shape or slab_shape aesthetic (when using the dots family).

The stat_... and stat_dist_... versions of the stats when used with the quantiles argument are particularly useful for constructing quantile dotplots, which can be an effective way to communicate uncertainty using a frequency framing that may be easier for laypeople to understand (Kay et al. 2016, Fernandes et al. 2018).

Aesthetics

The slab+interval stats and geoms have a wide variety of aesthetics that control the appearance of their three sub-geometries: the slab, the point, and the interval.

These stats support the following aesthetics:

  • x: x position of the geometry (when orientation = "vertical"); or sample data to be summarized (when orientation = "horizontal") except for stat_dist_ geometries (which use only one of x or y at a time along with the dist aesthetic).

  • y: y position of the geometry (when orientation = "horizontal"); or sample data to be summarized (when orientation = "vertical") except for stat_dist_ geometries (which use only one of x or y at a time along with the dist aesthetic).

In addition, in their default configuration (paired with geom_dotsinterval()) the following aesthetics are supported by the underlying geom:

Slab-specific aesthetics

  • thickness: The thickness of the slab at each x value (if orientation = "horizontal") or y value (if orientation = "vertical") of the slab.

  • side: Which side to place the slab on. "topright", "top", and "right" are synonyms which cause the slab to be drawn on the top or the right depending on if orientation is "horizontal" or "vertical". "bottomleft", "bottom", and "left" are synonyms which cause the slab to be drawn on the bottom or the left depending on if orientation is "horizontal" or "vertical". "topleft" causes the slab to be drawn on the top or the left, and "bottomright" causes the slab to be drawn on the bottom or the right. "both" draws the slab mirrored on both sides (as in a violin plot).

  • scale: What proportion of the region allocated to this geom to use to draw the slab. If scale = 1, slabs that use the maximum range will just touch each other. Default is 0.9 to leave some space.

  • justification: Justification of the interval relative to the slab, where 0 indicates bottom/left justification and 1 indicates top/right justification (depending on orientation). If justification is NULL (the default), then it is set automatically based on the value of side: when side is "top"/"right" justification is set to 0, when side is "bottom"/"left" justification is set to 1, and when side is "both" justification is set to 0.5.

  • datatype: When using composite geoms directly without a stat (e.g. geom_slabinterval()), datatype is used to indicate which part of the geom a row in the data targets: rows with datatype = "slab" target the slab portion of the geometry and rows with datatype = "interval" target the interval portion of the geometry. This is set automatically when using ggdist stats.
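A short sketch of the side and scale settings described above, set as constants on the layer. The data frame df and its columns are simulated for the example, and the particular values (side = "both", scale = 0.6) are arbitrary.

```r
library(ggplot2)
library(ggdist)

# Hypothetical grouped sample data, used only for illustration
set.seed(1234)
df <- data.frame(
  group = rep(c("a", "b"), each = 100),
  value = c(rnorm(100, mean = 0), rnorm(100, mean = 2))
)

# Mirror the dots on both sides of the interval and shrink the slab so
# that neighbouring groups are clearly separated
p_both <- ggplot(df, aes(x = value, y = group)) +
  stat_dotsinterval(quantiles = 50, side = "both", scale = 0.6)

# Draw the dots below the interval instead of above it
p_bottom <- ggplot(df, aes(x = value, y = group)) +
  stat_dotsinterval(quantiles = 50, side = "bottom")
```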

Interval-specific aesthetics

  • xmin: Left end of the interval sub-geometry (if orientation = "horizontal").

  • xmax: Right end of the interval sub-geometry (if orientation = "horizontal").

  • ymin: Lower end of the interval sub-geometry (if orientation = "vertical").

  • ymax: Upper end of the interval sub-geometry (if orientation = "vertical").

Point-specific aesthetics

  • shape: Shape type used to draw the point sub-geometry.

Color aesthetics

  • colour: (or color) The color of the interval and point sub-geometries. Use the slab_color, interval_color, or point_color aesthetics (below) to set sub-geometry colors separately.

  • fill: The fill color of the slab and point sub-geometries. Use the slab_fill or point_fill aesthetics (below) to set sub-geometry colors separately.

  • alpha: The opacity of the slab, interval, and point sub-geometries. Use the slab_alpha, interval_alpha, or point_alpha aesthetics (below) to set sub-geometry opacities separately.

  • colour_ramp: (or color_ramp) A secondary scale that modifies the color scale to "ramp" to another color. See scale_colour_ramp() for examples.

  • fill_ramp: A secondary scale that modifies the fill scale to "ramp" to another color. See scale_fill_ramp() for examples.

Line aesthetics

  • size: Width of the outline around the slab (if visible). Also determines the width of the line used to draw the interval and the size of the point, but raw size values are transformed according to the interval_size_domain, interval_size_range, and fatten_point parameters of the geom (see above). Use the slab_size, interval_size, or point_size aesthetics (below) to set sub-geometry line widths separately (note that when size is set directly using the override aesthetics, interval and point sizes are not affected by interval_size_domain, interval_size_range, and fatten_point).

  • stroke: Width of the outline around the point sub-geometry.

  • linetype: Type of line (e.g., "solid", "dashed", etc) used to draw the interval and the outline of the slab (if it is visible). Use the slab_linetype or interval_linetype aesthetics (below) to set sub-geometry line types separately.

Slab-specific color/line override aesthetics

  • slab_fill: Override for fill: the fill color of the slab.

  • slab_colour: (or slab_color) Override for colour/color: the outline color of the slab.

  • slab_alpha: Override for alpha: the opacity of the slab.

  • slab_size: Override for size: the width of the outline of the slab.

  • slab_linetype: Override for linetype: the line type of the outline of the slab.

  • slab_shape: Override for shape: the shape of the dots used to draw the dotplot slab.

Interval-specific color/line override aesthetics

  • interval_colour: (or interval_color) Override for colour/color: the color of the interval.

  • interval_alpha: Override for alpha: the opacity of the interval.

  • interval_size: Override for size: the line width of the interval.

  • interval_linetype: Override for linetype: the line type of the interval.

Point-specific color/line override aesthetics

  • point_fill: Override for fill: the fill color of the point.

  • point_colour: (or point_color) Override for colour/color: the outline color of the point.

  • point_alpha: Override for alpha: the opacity of the point.

  • point_size: Override for size: the size of the point.
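The override aesthetics listed above can be set as constants to style each sub-geometry separately. The following is a minimal sketch with simulated data; the data frame df and the particular colors and sizes are made up for the example.

```r
library(ggplot2)
library(ggdist)

# Hypothetical sample data, used only for illustration
set.seed(1234)
df <- data.frame(value = rnorm(200))

p_styled <- ggplot(df, aes(x = value)) +
  stat_dotsinterval(
    quantiles = 50,
    slab_fill = "gray75",          # fill of the dots
    slab_colour = "gray40",        # outline of the dots
    interval_colour = "firebrick", # color of the interval line
    point_colour = "firebrick",    # color of the point summary
    point_size = 3                 # not rescaled by fatten_point
  )
```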

Other aesthetics (these work as in standard geoms)

  • width

  • height

  • group

See examples of some of these aesthetics in action in vignette("slabinterval"). Learn more about the sub-geom override aesthetics (like interval_color) in the scales documentation. Learn more about basic ggplot aesthetics in vignette("ggplot2-specs").

References

Kay, M., Kola, T., Hullman, J. R., & Munson, S. A. (2016). When (ish) is My Bus? User-centered Visualizations of Uncertainty in Everyday, Mobile Predictive Systems. Conference on Human Factors in Computing Systems - CHI '16, 5092-5103. doi: 10.1145/2858036.2858558.

Fernandes, M., Walls, L., Munson, S., Hullman, J., & Kay, M. (2018). Uncertainty Displays Using Quantile Dotplots or CDFs Improve Transit Decision-Making. Conference on Human Factors in Computing Systems - CHI '18. doi: 10.1145/3173574.3173718.

See also

See stat_sample_slabinterval() and stat_dist_slabinterval() for families of other stats built on top of geom_slabinterval(). See vignette("slabinterval") for a variety of examples of use.

Author

Matthew Kay

Examples

library(dplyr)
library(ggplot2)
library(ggdist)

data(RankCorr_u_tau, package = "ggdist")

# orientation is detected automatically based on
# which axis is discrete

RankCorr_u_tau %>%
  ggplot(aes(x = u_tau)) +
  geom_dots()

RankCorr_u_tau %>%
  ggplot(aes(y = u_tau)) +
  geom_dots()

# stat_dots can summarize quantiles, creating quantile dotplots

RankCorr_u_tau %>%
  ggplot(aes(x = u_tau, y = factor(i))) +
  stat_dots(quantiles = 100)

# color and fill aesthetics can be mapped within the geom
# dotsinterval adds an interval

RankCorr_u_tau %>%
  ggplot(aes(x = u_tau, y = factor(i), fill = stat(x > 6))) +
  stat_dotsinterval(quantiles = 100)