-
-
Notifications
You must be signed in to change notification settings - Fork 86
fix ppc-distribution.R documentation #376
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Documentation done on binwidth, bins, breaks and freq+ added a clarifying sentence. Added @section When to Use Each Plot: This new section gives a quick, practical overview of when and why to use each plot type Also added Usage Note and Version & Dependency. For cleaner, easier-to-maintain, and more transparent for users
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AniPatil101 sorry for the delay in reviewing this. I really appreciate the pull request and the effort you put into this!
I have a few concerns that I put in the review comments because this adds a lot of additional text to an already long documentation page and I think some of the info can already be found elsewhere (but some of it is definitely good to add!). And it would make this documentation page different than the others, so if we end up adding any of these new sections I think it would make sense to add them to all the pages. If you're interested in working on that it could be a project that you chip away at over time (I know there are a lot of doc pages) and then we merge it once it's ready. But no obligation of course!
Anyway, I really appreciate the PR and happy to talk more about this if you end up wanting to keep working on it.
| #' @param binwidth Numeric scalar; passed to the underlying geom/stat used | ||
| #' by the plotting function. Behavior by function: | ||
| #' * `ppc_hist()`: forwarded to `ggplot2::geom_histogram(binwidth = ...)`. | ||
| #' * `ppc_freqpoly()` / `ppc_freqpoly_grouped()`: forwarded to | ||
| #' `ggplot2::geom_area(stat = "bin", binwidth = ...)` (i.e. `stat = "bin"`). | ||
| #' * `ppc_dots()`: forwarded to `ggdist::stat_dots(binwidth = ...)`. | ||
| #' If `NULL`, the default binning behavior of the respective geom/stat is used. | ||
| #' | ||
| #' @param bins Integer number of bins. Forwarded in the same way as `binwidth` | ||
| #' (i.e. to `geom_histogram()`, `geom_area(stat = "bin")`, and `ggdist::stat_dots()`). | ||
| #' | ||
| #' @param breaks Numeric vector of break positions. Forwarded in the same way | ||
| #' as `binwidth`/`bins` (where supported). | ||
| #' | ||
| #' @param freq Logical. Controls whether the vertical scale shows counts | ||
| #' (`freq = TRUE`) or densities/proportions (`freq = FALSE`). This argument | ||
| #' is not limited to `ppc_hist()` — it is also used by `ppc_freqpoly()` and | ||
| #' `ppc_freqpoly_grouped()` because those functions use `stat="bin"` | ||
| #' (implemented via `geom_area(stat = "bin")`). The `freq` value is passed | ||
| #' to `set_hist_aes()` and affects the y aesthetic used by `geom_histogram()` | ||
| #' and `geom_area(stat="bin")`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We've been documenting these using shared roxygen templates (see the files in the man-roxygen directory so that we can use them in different R files and reduce the amount of redundant text across the different files in the package. So I'm not sure that we want to have to document them separate in each R file. That said, I do like the extra detail here! So I have mixed feeling about this.
| #' - Requires **ggplot2 (>= 3.4.0)** for the use of modern layering and | ||
| #' consistent handling of bin arguments. | ||
| #' | ||
| #' - Uses **ggdist::stat_dots()** (requires **ggdist >= 3.3.0**). | ||
| #' | ||
| #' - All `ppc_*` functions depend on **bayesplot** core plotting infrastructure | ||
| #' and `ggplot2` grammar of graphics. | ||
| #' | ||
| #' - Ensure that dependent packages (`ggplot2`, `ggdist`, and `bayesplot`) | ||
| #' are loaded or properly imported in your package’s `DESCRIPTION` file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is all true, but it should already be enforced by the DESCRIPTION file we have. That is, you can't install bayesplot without ggplot2 >= 3.4.0. Although I do notice that we're missing the ggdist version number, thanks for pointing that out!
I'm not sure all this text is worth adding to the documentation page since we would need to add a section like this to every R file in the package for consistency and this information is already included in the DESCRIPTION file.
| #' @section Usage Note: | ||
| #' | ||
| #' - The `binwidth`, `bins`, and `breaks` arguments are passed to multiple | ||
| #' underlying `ggplot2` and `ggdist` geoms (`geom_histogram()`, `geom_area()`, | ||
| #' and `stat_dots()`), which means they affect both histogram and dot plot outputs. | ||
| #' | ||
| #' - The `freq` argument controls whether the y-axis represents **counts** | ||
| #' (`freq = TRUE`) or **densities** (`freq = FALSE`) for both histograms | ||
| #' and frequency polygons. | ||
| #' | ||
| #' - These plots assume numeric `y` values. For categorical data, | ||
| #' consider `ppc_bars()` or similar discrete comparison functions. | ||
| #' | ||
| #' - If a grouping variable is provided, ensure that `y` and `yrep` have | ||
| #' matching group structures; otherwise, plots may not align correctly. | ||
| #' | ||
| #' - For large replicated datasets (`yrep` with > 50 rows), consider | ||
| #' summarizing or subsampling to avoid overplotting. | ||
| #' | ||
| #' Note: | ||
| #' The `binwidth`, `bins`, and `breaks` arguments are forwarded not only | ||
| #' to `ggplot2::geom_histogram()` (used by `ppc_hist()`), but also to | ||
| #' `ggdist::stat_dots()` (used by `ppc_dots()`) and to the `stat = "bin"` | ||
| #' layers used by `ppc_freqpoly()` and `ppc_freqpoly_grouped()` | ||
| #' (implemented with `ggplot2::geom_area(stat = "bin", ...)`). | ||
| #' | ||
| #' Similarly, the `freq` argument affects not only histograms but also | ||
| #' frequency polygons created with `ppc_freqpoly()` and | ||
| #' `ppc_freqpoly_grouped()`, where it determines whether the y-axis | ||
| #' represents counts (`freq = TRUE`) or densities (`freq = FALSE`). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is all good info but I think most of it is clear from the existing documentation and it adds a lot of length to already long documentation. For example, it should already be clear that the freq argument affects multiple different plots because they each have the freq argument. I'm not sure we need to write that out in sentence form.
I guess my main concern here is that users are just going get lost in a sea of text. So we should only put things in sentences/paragraphs that are hard to discern from the existing documentation. Does that make sense? If you disagree feel free to let me know! I really appreciate the effort you put into this.
| #' These plots offer different perspectives on how simulated (`yrep`) and observed (`y`) data compare. | ||
| #' | ||
| #' \itemize{ | ||
| #' \item **`ppc_hist()`** — Best for visualizing **overall distribution shapes**. | ||
| #' Use this when your data are continuous and you want to see how model predictions align | ||
| #' with observed distributions. | ||
| #' | ||
| #' \item **`ppc_dots()`** — Ideal for **small or discrete datasets**. | ||
| #' Displays each simulated dataset as dots, giving an intuitive visual comparison | ||
| #' between simulated and observed values. | ||
| #' | ||
| #' \item **`ppc_freqpoly()`** — Useful for **moderate to large continuous datasets**. | ||
| #' Highlights how predicted and observed **frequencies vary** across the range of `y`. | ||
| #' | ||
| #' \item **`ppc_freqpoly_grouped()`** — When you have a **grouping variable** (e.g., categories), | ||
| #' use this to compare frequencies across levels of that variable. | ||
| #' } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea of having a section like this! But I think this is missing some of the plots on the ppc-distributions page.
Documentation done on binwidth, bins, breaks and freq+ added a clarifying sentence.
Added section When to Use Each Plot:
Also added Usage Note and Version & Dependency.