-
-
Notifications
You must be signed in to change notification settings - Fork 86
fix ppc-distribution.R documentation #376
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -17,9 +17,44 @@ | |
| #' the predictive distributions. | ||
| #' @param ... For dot plots, optional additional arguments to pass to [ggdist::stat_dots()]. | ||
| #' | ||
| #' @param binwidth Numeric scalar; passed to the underlying geom/stat used | ||
| #' by the plotting function. Behavior by function: | ||
| #' * `ppc_hist()`: forwarded to `ggplot2::geom_histogram(binwidth = ...)`. | ||
| #' * `ppc_freqpoly()` / `ppc_freqpoly_grouped()`: forwarded to | ||
| #' `ggplot2::geom_area(stat = "bin", binwidth = ...)` (i.e. `stat = "bin"`). | ||
| #' * `ppc_dots()`: forwarded to `ggdist::stat_dots(binwidth = ...)`. | ||
| #' If `NULL`, the default binning behavior of the respective geom/stat is used. | ||
| #' | ||
| #' @param bins Integer number of bins. Forwarded in the same way as `binwidth` | ||
| #' (i.e. to `geom_histogram()`, `geom_area(stat = "bin")`, and `ggdist::stat_dots()`). | ||
| #' | ||
| #' @param breaks Numeric vector of break positions. Forwarded in the same way | ||
| #' as `binwidth`/`bins` (where supported). | ||
| #' | ||
| #' @param freq Logical. Controls whether the vertical scale shows counts | ||
| #' (`freq = TRUE`) or densities/proportions (`freq = FALSE`). This argument | ||
| #' is not limited to `ppc_hist()` — it is also used by `ppc_freqpoly()` and | ||
| #' `ppc_freqpoly_grouped()` because those functions use `stat="bin"` | ||
| #' (implemented via `geom_area(stat = "bin")`). The `freq` value is passed | ||
| #' to `set_hist_aes()` and affects the y aesthetic used by `geom_histogram()` | ||
| #' and `geom_area(stat="bin")`. | ||
| #' | ||
| #' @template details-binomial | ||
| #' @template return-ggplot-or-data | ||
| #' | ||
| #' @section Version & Dependencies: | ||
| #' | ||
| #' - Requires **ggplot2 (>= 3.4.0)** for the use of modern layering and | ||
| #' consistent handling of bin arguments. | ||
| #' | ||
| #' - Uses **ggdist::stat_dots()** (requires **ggdist >= 3.3.0**). | ||
| #' | ||
| #' - All `ppc_*` functions depend on **bayesplot** core plotting infrastructure | ||
| #' and `ggplot2` grammar of graphics. | ||
| #' | ||
| #' - Ensure that dependent packages (`ggplot2`, `ggdist`, and `bayesplot`) | ||
| #' are loaded or properly imported in your package’s `DESCRIPTION` file. | ||
|
Comment on lines
+47
to
+56
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is all true, but it should already be enforced by the DESCRIPTION file we have. That is, you can't install bayesplot without ggplot2 >= 3.4.0. Although I do notice that we're missing the ggdist version number, thanks for pointing that out! I'm not sure all this text is worth adding to the documentation page since we would need to add a section like this to every R file in the package for consistency and this information is already included in the DESCRIPTION file. |
||
| #' | ||
| #' @section Plot Descriptions: | ||
| #' \describe{ | ||
| #' \item{`ppc_hist(), ppc_freqpoly(), ppc_dens(), ppc_boxplot()`}{ | ||
|
|
@@ -62,6 +97,57 @@ | |
| #' See Säilynoja et al. (2021) for more details.} | ||
| #' } | ||
| #' | ||
| #' @section When to Use Each Plot: | ||
| #' | ||
| #' These plots offer different perspectives on how simulated (`yrep`) and observed (`y`) data compare. | ||
| #' | ||
| #' \itemize{ | ||
| #' \item **`ppc_hist()`** — Best for visualizing **overall distribution shapes**. | ||
| #' Use this when your data are continuous and you want to see how model predictions align | ||
| #' with observed distributions. | ||
| #' | ||
| #' \item **`ppc_dots()`** — Ideal for **small or discrete datasets**. | ||
| #' Displays each simulated dataset as dots, giving an intuitive visual comparison | ||
| #' between simulated and observed values. | ||
| #' | ||
| #' \item **`ppc_freqpoly()`** — Useful for **moderate to large continuous datasets**. | ||
| #' Highlights how predicted and observed **frequencies vary** across the range of `y`. | ||
| #' | ||
| #' \item **`ppc_freqpoly_grouped()`** — When you have a **grouping variable** (e.g., categories), | ||
| #' use this to compare frequencies across levels of that variable. | ||
| #' } | ||
|
Comment on lines
+102
to
+118
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I like the idea of having a section like this! But I think this is missing some of the plots on the ppc-distributions page. |
||
| #' | ||
| #' @section Usage Note: | ||
| #' | ||
| #' - The `binwidth`, `bins`, and `breaks` arguments are passed to multiple | ||
| #' underlying `ggplot2` and `ggdist` geoms (`geom_histogram()`, `geom_area()`, | ||
| #' and `stat_dots()`), which means they affect both histogram and dot plot outputs. | ||
| #' | ||
| #' - The `freq` argument controls whether the y-axis represents **counts** | ||
| #' (`freq = TRUE`) or **densities** (`freq = FALSE`) for both histograms | ||
| #' and frequency polygons. | ||
| #' | ||
| #' - These plots assume numeric `y` values. For categorical data, | ||
| #' consider `ppc_bars()` or similar discrete comparison functions. | ||
| #' | ||
| #' - If a grouping variable is provided, ensure that `y` and `yrep` have | ||
| #' matching group structures; otherwise, plots may not align correctly. | ||
| #' | ||
| #' - For large replicated datasets (`yrep` with > 50 rows), consider | ||
| #' summarizing or subsampling to avoid overplotting. | ||
| #' | ||
| #' Note: | ||
| #' The `binwidth`, `bins`, and `breaks` arguments are forwarded not only | ||
| #' to `ggplot2::geom_histogram()` (used by `ppc_hist()`), but also to | ||
| #' `ggdist::stat_dots()` (used by `ppc_dots()`) and to the `stat = "bin"` | ||
| #' layers used by `ppc_freqpoly()` and `ppc_freqpoly_grouped()` | ||
| #' (implemented with `ggplot2::geom_area(stat = "bin", ...)`). | ||
| #' | ||
| #' Similarly, the `freq` argument affects not only histograms but also | ||
| #' frequency polygons created with `ppc_freqpoly()` and | ||
| #' `ppc_freqpoly_grouped()`, where it determines whether the y-axis | ||
| #' represents counts (`freq = TRUE`) or densities (`freq = FALSE`). | ||
|
Comment on lines
+120
to
+149
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is all good info but I think most of it is clear from the existing documentation and it adds a lot of length to already long documentation. For example, it should already be clear that the I guess my main concern here is that users are just going get lost in a sea of text. So we should only put things in sentences/paragraphs that are hard to discern from the existing documentation. Does that make sense? If you disagree feel free to let me know! I really appreciate the effort you put into this. |
||
| #' | ||
| #' @template reference-vis-paper | ||
| #' @template reference-uniformity-test | ||
| #' @templateVar bdaRef (Ch. 6) | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We've been documenting these using shared roxygen templates (see the files in the
man-roxygendirectory so that we can use them in different R files and reduce the amount of redundant text across the different files in the package. So I'm not sure that we want to have to document them separate in each R file. That said, I do like the extra detail here! So I have mixed feeling about this.