Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
86 changes: 86 additions & 0 deletions R/ppc-distributions.R
Original file line number Diff line number Diff line change
Expand Up @@ -17,9 +17,44 @@
#' the predictive distributions.
#' @param ... For dot plots, optional additional arguments to pass to [ggdist::stat_dots()].
#'
#' @param binwidth Numeric scalar; passed to the underlying geom/stat used
#' by the plotting function. Behavior by function:
#' * `ppc_hist()`: forwarded to `ggplot2::geom_histogram(binwidth = ...)`.
#' * `ppc_freqpoly()` / `ppc_freqpoly_grouped()`: forwarded to
#' `ggplot2::geom_area(stat = "bin", binwidth = ...)` (i.e. `stat = "bin"`).
#' * `ppc_dots()`: forwarded to `ggdist::stat_dots(binwidth = ...)`.
#' If `NULL`, the default binning behavior of the respective geom/stat is used.
#'
#' @param bins Integer number of bins. Forwarded in the same way as `binwidth`
#' (i.e. to `geom_histogram()`, `geom_area(stat = "bin")`, and `ggdist::stat_dots()`).
#'
#' @param breaks Numeric vector of break positions. Forwarded in the same way
#' as `binwidth`/`bins` (where supported).
#'
#' @param freq Logical. Controls whether the vertical scale shows counts
#' (`freq = TRUE`) or densities/proportions (`freq = FALSE`). This argument
#' is not limited to `ppc_hist()` — it is also used by `ppc_freqpoly()` and
#' `ppc_freqpoly_grouped()` because those functions use `stat="bin"`
#' (implemented via `geom_area(stat = "bin")`). The `freq` value is passed
#' to `set_hist_aes()` and affects the y aesthetic used by `geom_histogram()`
#' and `geom_area(stat="bin")`.
Comment on lines +20 to +40
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've been documenting these using shared roxygen templates (see the files in the man-roxygen directory so that we can use them in different R files and reduce the amount of redundant text across the different files in the package. So I'm not sure that we want to have to document them separate in each R file. That said, I do like the extra detail here! So I have mixed feeling about this.

#'
#' @template details-binomial
#' @template return-ggplot-or-data
#'
#' @section Version & Dependencies:
#'
#' - Requires **ggplot2 (>= 3.4.0)** for the use of modern layering and
#' consistent handling of bin arguments.
#'
#' - Uses **ggdist::stat_dots()** (requires **ggdist >= 3.3.0**).
#'
#' - All `ppc_*` functions depend on **bayesplot** core plotting infrastructure
#' and `ggplot2` grammar of graphics.
#'
#' - Ensure that dependent packages (`ggplot2`, `ggdist`, and `bayesplot`)
#' are loaded or properly imported in your package’s `DESCRIPTION` file.
Comment on lines +47 to +56
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is all true, but it should already be enforced by the DESCRIPTION file we have. That is, you can't install bayesplot without ggplot2 >= 3.4.0. Although I do notice that we're missing the ggdist version number, thanks for pointing that out!

I'm not sure all this text is worth adding to the documentation page since we would need to add a section like this to every R file in the package for consistency and this information is already included in the DESCRIPTION file.

#'
#' @section Plot Descriptions:
#' \describe{
#' \item{`ppc_hist(), ppc_freqpoly(), ppc_dens(), ppc_boxplot()`}{
Expand Down Expand Up @@ -62,6 +97,57 @@
#' See Säilynoja et al. (2021) for more details.}
#' }
#'
#' @section When to Use Each Plot:
#'
#' These plots offer different perspectives on how simulated (`yrep`) and observed (`y`) data compare.
#'
#' \itemize{
#' \item **`ppc_hist()`** — Best for visualizing **overall distribution shapes**.
#' Use this when your data are continuous and you want to see how model predictions align
#' with observed distributions.
#'
#' \item **`ppc_dots()`** — Ideal for **small or discrete datasets**.
#' Displays each simulated dataset as dots, giving an intuitive visual comparison
#' between simulated and observed values.
#'
#' \item **`ppc_freqpoly()`** — Useful for **moderate to large continuous datasets**.
#' Highlights how predicted and observed **frequencies vary** across the range of `y`.
#'
#' \item **`ppc_freqpoly_grouped()`** — When you have a **grouping variable** (e.g., categories),
#' use this to compare frequencies across levels of that variable.
#' }
Comment on lines +102 to +118
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of having a section like this! But I think this is missing some of the plots on the ppc-distributions page.

#'
#' @section Usage Note:
#'
#' - The `binwidth`, `bins`, and `breaks` arguments are passed to multiple
#' underlying `ggplot2` and `ggdist` geoms (`geom_histogram()`, `geom_area()`,
#' and `stat_dots()`), which means they affect both histogram and dot plot outputs.
#'
#' - The `freq` argument controls whether the y-axis represents **counts**
#' (`freq = TRUE`) or **densities** (`freq = FALSE`) for both histograms
#' and frequency polygons.
#'
#' - These plots assume numeric `y` values. For categorical data,
#' consider `ppc_bars()` or similar discrete comparison functions.
#'
#' - If a grouping variable is provided, ensure that `y` and `yrep` have
#' matching group structures; otherwise, plots may not align correctly.
#'
#' - For large replicated datasets (`yrep` with > 50 rows), consider
#' summarizing or subsampling to avoid overplotting.
#'
#' Note:
#' The `binwidth`, `bins`, and `breaks` arguments are forwarded not only
#' to `ggplot2::geom_histogram()` (used by `ppc_hist()`), but also to
#' `ggdist::stat_dots()` (used by `ppc_dots()`) and to the `stat = "bin"`
#' layers used by `ppc_freqpoly()` and `ppc_freqpoly_grouped()`
#' (implemented with `ggplot2::geom_area(stat = "bin", ...)`).
#'
#' Similarly, the `freq` argument affects not only histograms but also
#' frequency polygons created with `ppc_freqpoly()` and
#' `ppc_freqpoly_grouped()`, where it determines whether the y-axis
#' represents counts (`freq = TRUE`) or densities (`freq = FALSE`).
Comment on lines +120 to +149
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is all good info but I think most of it is clear from the existing documentation and it adds a lot of length to already long documentation. For example, it should already be clear that the freq argument affects multiple different plots because they each have the freq argument. I'm not sure we need to write that out in sentence form.

I guess my main concern here is that users are just going get lost in a sea of text. So we should only put things in sentences/paragraphs that are hard to discern from the existing documentation. Does that make sense? If you disagree feel free to let me know! I really appreciate the effort you put into this.

#'
#' @template reference-vis-paper
#' @template reference-uniformity-test
#' @templateVar bdaRef (Ch. 6)
Expand Down
Loading