Skip to content

Conversation

@fweber144
Copy link
Collaborator

@fweber144 fweber144 commented Jul 26, 2025

This adds support for censored observations (i.e., survival or time-to-event analysis) when using the latent projection. Originally, #2 already mentioned that support for time-to-event models would be a nice feature, and the request came up again recently in https://discourse.mc-stan.org/t/using-projpred-latent-projection-with-brms-weibull-family-models/39275. This PR only makes it possible to use the latent projection for time-to-event models. For a more general perspective, see #2.

The solution presented here is not optimal because it would probably be more desirable to allow for an extension of the formula in init_refmodel() (similarly to brms's resp_cens() term), see https://discourse.mc-stan.org/t/using-projpred-latent-projection-with-brms-weibull-family-models/39275/10. Via such an extension of the init_refmodel() formula, we could also handle observation weights in a better way.

In contrast to the less elegant solution suggested in https://discourse.mc-stan.org/t/using-projpred-latent-projection-with-brms-weibull-family-models/39275/10, the solution presented here requires users to add a right-hand side formula as an attribute called cens_var to the latent_ll_oscale function. The variable named in that formula is then retrieved (internally, whenever calling the latent_ll_oscale function) from the original dataset (possibly subsetted to the observations from a given test set), newdata, or element data from varsel()'s argument d_test, whichever is applicable in the specific situation where the latent_ll_oscale function is called. The content of the retrieved variable is then passed to argument cens of the latent_ll_oscale function. In other words, the variable mentioned in the cens_var formula needs to contain the censoring indicators (e.g., 0 = uncensored, 1 = censored) which can then be used (by the user) within the latent_ll_oscale function. On the Stan Forums, I will update the Weibull (and perhaps also the log-normal) example from https://discourse.mc-stan.org/t/using-projpred-latent-projection-with-brms-weibull-family-models/39275 to illustrate this.

The solution presented here is still less elegant than the brms::resp_cens()-like solution, but in my opinion, it is preferable over the less elegant solution I suggested in https://discourse.mc-stan.org/t/using-projpred-latent-projection-with-brms-weibull-family-models/39275/10. Via the cens_var attribute, we also catch the censored-observations case (when using the latent projection) in validate_vsel_object_stats() to throw a warning if a performance statistic other than "elpd", "mlpd", or "gmpd" is used.

This is only a draft PR. Still to do:

@avehtari: A few weeks ago, we discussed how we should make this feature available and we decided to keep this feature in a separate branch. Thinking over this again, I would say we could merge this into master as soon as the to-do's from above are completed. The reason is that I consider the implementation presented here not as "hacky" as the solution we discussed back then (which was the less elegant solution I suggested in https://discourse.mc-stan.org/t/using-projpred-latent-projection-with-brms-weibull-family-models/39275/10). But I'll leave the decision up to you and @aloctavodia what you want to do with this PR. Unfortunately, I was not able to request a review from @aloctavodia for this PR.

@fweber144 fweber144 requested a review from avehtari July 26, 2025 20:09
@fweber144
Copy link
Collaborator Author

On the Stan Forums, I will update the Weibull (and perhaps also the log-normal) example from https://discourse.mc-stan.org/t/using-projpred-latent-projection-with-brms-weibull-family-models/39275 to illustrate this.

Added now at https://discourse.mc-stan.org/t/using-projpred-latent-projection-with-brms-weibull-family-models/39275/23 (for the Weibull case).

(add argument `cens` to internal `latent_ll_oscale` functions)
@fweber144
Copy link
Collaborator Author

fweber144 commented Jul 27, 2025

I forgot to add the new argument cens to internal latent_ll_oscale functions, but this is now done in commit 6b76269. The tests are running through now (i.e., they don't result in errors).

@avehtari
Copy link
Member

Thinking over this again, I would say we could merge this into master as soon as the to-do's from above are completed. The reason is that I consider the implementation presented here not as "hacky" as the solution we discussed back then

OK for me to merge when it's ready

@fweber144
Copy link
Collaborator Author

@avehtari do you know someone who could finish this PR (the four to-do's listed above)?

@fweber144
Copy link
Collaborator Author

@aloctavodia
Copy link
Collaborator

I could give it a try. Do you prefer to merge this and do the rest in a separated PR? Or add everything here?

@fweber144
Copy link
Collaborator Author

I think I would prefer to add everything here so this PR is self-contained.

@fweber144
Copy link
Collaborator Author

@aloctavodia I saw you opened PR #537 and there updated the package version to a release version (i.e., a non-development version). Does that mean you are planning a CRAN release soon? If yes, do you think it would be possible to include this PR in that CRAN release?

@aloctavodia
Copy link
Collaborator

As the main goal of the PR (and release) is to unlock the publication of the package mclogit that our tests are currently blocking, I would prefer to do the release ASAP. I am sorry that I delayed working on this. I will end this PR next week, and then I also want to work on making the test more robust, as I have been having issues running them locally.

@fweber144
Copy link
Collaborator Author

Ok, I understand. The tests are running through on my machine without problems, but we've realized at several occasions in the past that the tests are quite instable as they depend on rstanarm and brms models being fitted at their beginning. That's why I deactivated most of the tests on CRAN (in #537, it looks like you modified tests that are indeed run on CRAN, so I understand that a new version of mclogit might face problems when running our tests). Unfortunately, I haven't had the time to revise our tests to make them more robust during the time when I was working on projpred. Thank you for working on this now.

censored-observations examples

First, we change the censoring times to avoid
problems in the log-normal example (there, it seems we had too many censored
observations).

Second, we change the true `sdlog` parameter to avoid
problems in the log-normal example (there, it seems we had a too right-skewed
distribution).

Third, we remove the `set.seed()` call because we already have one at the
beginning of the vignette.
… `R CMD check`:

```
process chunk of elements [...] is [...] MiB. This exceeds the maximum allowed
size 500.00 MiB per by R option "future.globals.maxSize". This limit is set to
protect against transfering too large objects to parallel workers by mistake,
which may not be intended and could be costly. See
help("future.globals.maxSize", package = "future") for further explainations and
how to adjust or remove this threshold The three largest globals are 'one_obs'
([...] MiB of class 'function'), 'refmodel' ([...] MiB of class 'list') and 'lw'
([...] MiB of class 'numeric')
```
Hence, switch back to the `doParallel` backend.
(this is only a very basic test for `latent_ll_oscale` because we have more
comprehensive examples with simulated data in the latent-projection vignette and
they are run when `R CMD check` is run)
…unctions

(to check that arguments added after the initial latent-projection release
(v2.4.0) have indeed been added (or taken into account via `...`) by users)
@fweber144
Copy link
Collaborator Author

I just pushed commits completing this PR. From my side, it is ready to be merged, but check it if you like.

I refrained from updating the package description in the DESCRIPTION file because I prefer the general wording that we currently have and because Catalina, Bürkner, and Vehtari (2021, https://doi.org/10.48550/arXiv.2109.04702) already deal with time-to-event models, so the existing reference to that preprint should already be a sufficient reference to time-to-event models.

Note that roxygen2 version 7.3.3 seems to have a bug that replaces cross-links to summary() by cross-links to cmdstanr::summary(). I unstaged these changes after each re-documentation, but this is a bit cumbersome. If you prefer, you can always switch back to roxygen2 v7.3.2 (personally, I have installed v7.3.3 and didn't want to downgrade for this PR).

R CMD check runs through on my machine (with R 4.5.1).

@fweber144 fweber144 marked this pull request as ready for review November 1, 2025 05:09
@fweber144
Copy link
Collaborator Author

Thanks for approving, @aloctavodia.

@avehtari, do you want to review this PR, too?

Copy link
Member

@avehtari avehtari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked the documentation and left a couple questions


## Major changes

* Added support for censored observations when using the latent projection (with response-scale analyses). This makes it possible, e.g., to use the latent projection for time-to-event models (also known as models for survival analysis). Due to this new feature, the function passed to argument `latent_ll_oscale` of `extend_family()` now needs to have an argument `cens`. See `?extend_family` (section "Latent projection") as well as the latent-projection vignette (section ["Censored observations (survival analysis)"](https://mc-stan.org/projpred/articles/latent.html#cens)) for more information and examples. Note that only the performance statistics `"elpd"`, `"mlpd"`, and `"gmpd"` take censoring into account (on response scale). (GitHub: #528)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Due to this new feature, the function passed to argument latent_ll_oscale of extend_family() now needs to have an argument cens.

Does this possibly break existing code?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you run reverse-dependency checks?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I did. They succeeded. (brms and BayesERtools both do not specify a latent_ll_oscale function, so they are not affected by this PR.)

#' * `cens` accepts a vector containing censoring indicators for the
#' observations for which to calculate the response-scale log-likelihood values
#' (i.e., for the observations from the second dimension of `ilpreds`). This is
#' only relevant if attribute `cens_var` of `latent_ll_oscale` is not `NULL`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should cens have some default value? So that there is no need to provide it if it's not relevant? Or am I misreading the code and documentation?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does not need to have a default value because all calls to latent_ll_oscale specify cens. Users could assign a default value if they want, but it would not be used.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you clarify the documentation?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean to add something like "Argument cens does not need to have a default value."?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused because first there is

#' The function supplied to argument latent_ll_oscale needs to have the
#' prototype
#' ```{r, eval = FALSE}
#' latent_ll_oscale(ilpreds, dis, y_oscale, wobs = rep(1, ncol(ilpreds)),
#' cens, cl_ref, wdraws_ref = rep(1, length(cl_ref)))

So I have to have it. But then there is

#' * cens accepts a vector containing censoring indicators for the
#' observations for which to calculate the response-scale log-likelihood values
This is
#' only relevant if attribute cens_var of latent_ll_oscale is not NULL

So I don't need to have it? But if I use latent_ll_oscale without and with cens how does it work without default value? Why not have default as NULL?

As I don't understand the current documentation and the logic of the code, I don't know what to add there, but it does need some clarification.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see, the part

This is only relevant if attribute cens_var of latent_ll_oscale is not NULL [...]

might be confusing. Argument cens is always needed (otherwise, R would throw an error because we would be calling a function with an argument that the function does not have), but it does not need to have a default value (because we always specify cens when calling the latent_ll_oscale function). So I would propose to simply omit the sentence "This is only relevant if attribute cens_var of latent_ll_oscale is not NULL [...]". Is that ok for you?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants