survival/ToDo at master · therneau/survival · GitHub

Outside input appreciated:

Deal with variables that have spaces, e.g.
   survfit( Surv(time, status) ~ `bad idea for a name`, data=yourdata)
I never use such names so it has been low on my list, but some others do.  I think that the heart of the issue is in the strata() function, and it may just need smarter use of text manipulation tools when building the strata label.  This overlaps with issue 232.
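A small base-R sketch of the kind of text manipulation involved.  It only shows how a non-syntactic name appears with and without backticks depending on how it is extracted from a formula; it makes no claim about how strata() currently builds its labels, and strip_backticks is a hypothetical helper.

```r
# A formula term whose variable name contains spaces
f <- ~ `bad idea for a name`

# term.labels keeps the backticks, all.vars drops them
lab  <- attr(terms(f), "term.labels")
vars <- all.vars(f)

# Hypothetical helper: remove backticks before pasting a name into a label
strip_backticks <- function(x) gsub("`", "", x, fixed = TRUE)

print(lab)
print(vars)
print(strip_backticks(lab))
```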

Add include.lowest option to tcut(), to make it align with cut().
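For reference, this is the base cut() behavior that tcut() would be aligning with.  The sketch uses only cut(); the data values are made up for illustration.

```r
x      <- c(0, 5, 10)
breaks <- c(0, 5, 10)

# Default: intervals are (0,5] and (5,10], so x = 0 falls outside and is NA
default <- cut(x, breaks)

# include.lowest = TRUE closes the first interval: [0,5] and (5,10]
inclusive <- cut(x, breaks, include.lowest = TRUE)

print(is.na(default))      # TRUE FALSE FALSE
print(levels(inclusive))   # "[0,5]"  "(5,10]"
```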

confint method for coxph objects that uses profile likelihood. Standard +-1.96*se intervals actually work very well for the Cox model, so I don't expect it to matter a lot, but it is a user request.
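A rough sketch of one way such an interval could be computed for a single coefficient, not a proposed implementation.  It assumes a two-covariate model on the lung data, fixes the age coefficient through an offset() term, and uses uniroot() to find where the profile log-likelihood drops by qchisq(0.95, 1)/2.  The search width of 4 standard errors is an arbitrary choice.

```r
library(survival)

fit    <- coxph(Surv(time, status) ~ age + sex, data = lung)
bhat   <- coef(fit)[["age"]]
se     <- sqrt(fit$var[1, 1])
ll_max <- fit$loglik[2]
cutoff <- qchisq(0.95, df = 1) / 2

# Profile log-likelihood: hold the age coefficient at b, maximize over sex
profile_ll <- function(b) {
  pfit <- coxph(Surv(time, status) ~ sex + offset(b * age), data = lung)
  pfit$loglik[2]
}

# Zero where the drop in log-likelihood equals the cutoff
g <- function(b) ll_max - profile_ll(b) - cutoff

lower <- uniroot(g, c(bhat - 4 * se, bhat))$root
upper <- uniroot(g, c(bhat, bhat + 4 * se))$root

# Compare with the usual Wald interval
print(rbind(profile = c(lower, upper),
            wald    = bhat + c(-1, 1) * qnorm(0.975) * se))
```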

Deprecate survConcordance.  The replacement concordance function handles a wider range of models and has a superior variance.  But there are a bunch of packages that use it, and they will need coaxing/help with the conversion.
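A sketch of what the conversion might look like for a typical caller, using the lung data as a stand-in.  Either form below would replace a call of the style survConcordance(Surv(time, status) ~ predict(fit), data); the formula form needs reverse = TRUE because a larger Cox risk score predicts shorter survival.

```r
library(survival)

fit <- coxph(Surv(time, status) ~ age + sex, data = lung)

# Method for fitted models: direction is handled for the coxph fit
c1 <- concordance(fit)

# Formula method with an explicit risk score
c2 <- concordance(Surv(time, status) ~ predict(fit), data = lung,
                  reverse = TRUE)

print(c(concordance = c1$concordance, se = sqrt(c1$var)))
```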

coxph still returns the Splus style assign attribute.  It's overdue to change.

Suppose you fit a coxph model fit <- coxph(Surv(time, status) ~ x1 + x2, data1)
and then call predict(fit, newdata=data2), where x1 is a factor.  If x1 in data2 has levels that do not appear in data1, this should give a clear error message.  This hasn't been checked for all newdata cases, i.e., predict, survfit, and concordance.  Also add test cases for when the levels are valid, but are either a subset or in a different order.
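A minimal sketch of the kind of check this implies, assuming the levels recorded in fit$xlevels are the reference.  check_levels is a hypothetical helper, not part of the package, and data1 and data2 are built from the lung data purely for illustration.

```r
library(survival)

data1 <- data.frame(time = lung$time, status = lung$status,
                    x1 = factor(lung$sex, labels = c("m", "f")),
                    x2 = lung$age)
fit <- coxph(Surv(time, status) ~ x1 + x2, data1)

# Hypothetical helper: stop with a clear message on unseen factor levels
check_levels <- function(fit, newdata) {
  for (v in names(fit$xlevels)) {
    seen  <- unique(as.character(newdata[[v]]))
    extra <- setdiff(seen[!is.na(seen)], fit$xlevels[[v]])
    if (length(extra) > 0)
      stop(sprintf("variable '%s' has levels not present in the fit: %s",
                   v, paste(extra, collapse = ", ")))
  }
  invisible(TRUE)
}

# Valid: a subset of the levels, in a different order
ok <- check_levels(fit, data.frame(x1 = factor("f", levels = c("f", "m")),
                                   x2 = 60))

# Invalid: a level that data1 never contained
data2 <- data.frame(x1 = factor(c("m", "other")), x2 = c(60, 65))
bad <- tryCatch(check_levels(fit, data2), error = function(e) e)
print(conditionMessage(bad))
```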

-----

Bugs:
 cox.zph for a model with robust variance has incorrect confidence intervals on curves.  This is a result of a change over a year ago; the math behind this is still a bit opaque to me.

And see the list of open issues

------

Harder tasks:

Predict method for multi-state coxph model. The computational core is I think mostly simple, i.e., stacker() is needed to rebuild the X matrix.  Less clear is what a user would want the output to look like, which needs some thought.

residuals.survfit for curves from a multi-state Cox model.  This is needed for variance estimates.  (Hard, I think)

Proper variance for yates survival curve.  Think about a discussion with the emmeans package, whether they could inherit some of what yates tries to do.

Reliability models vignette.

Rewrite the penalized models code.

Add the design of the survival library document to vignettes

Add likelihood displacement residuals?
Cook and Lawless, Multistate models for life history data, page 104, just above
section 3.5.2.  States that
  $\hat \theta - \hat\theta_{(i)} \approx V U_i$
where $V$ is the Cox model variance.
This is my formula for the dfbeta residuals.  They also state that
 $LD_i = 2\{loglik(\hat\theta) - loglik(\hat\theta_{(i)})\} = U'_i V U_i$

This appears to be Han Van H's likelihood displacement residual and would lead
to the cross-validated log-likelihood?   They are talking about parametric
models though.
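A sketch of how the quantity $U'_i V U_i$ could be computed from pieces coxph already provides, taking $U_i$ to be the score residual for subject $i$ and $V$ the model variance.  Whether this is the right definition for the Cox model is exactly the open question above, so treat it as illustration only.  The second form uses the dfbeta residuals $d_i = V U_i$, which gives $d'_i V^{-1} d_i$, the same number.

```r
library(survival)

fit <- coxph(Surv(time, status) ~ age + sex, data = lung)

U <- as.matrix(residuals(fit, type = "score"))   # n x p score residuals
V <- fit$var                                     # p x p variance

# U_i' V U_i for every subject, without forming an n x n matrix
ld <- rowSums((U %*% V) * U)

# Same quantity from the dfbeta residuals d_i = V U_i
d   <- as.matrix(residuals(fit, type = "dfbeta"))
ld2 <- rowSums((d %*% solve(V)) * d)

print(summary(ld))
```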


Notes from Pam Shaw about survfit:

I did also experience some confusion about how plot.survfit was working and found at least one inconsistency in how it works. Under the help ?plot.survfit, since there is a conf.type argument, I assumed that meant the user can specify the confidence interval type in the plot statement that would appear on the plot, when in fact the plotted confidence interval for the survfit object will only be the one specified by the conf.type argument given to survfit when it created the survfit object.  I was in fact thinking I was changing it by using this plot option, until I looked carefully and realized I needed survfit to do that.  The other "inconsistency" I noticed is that if you happened to specify "none" in the survfit statement, then what plot.survfit would plot is the default confidence interval, only it isn't calculated the same as when you actually would have specified the default method conf.type="log" in the survfit statement.  I think what is going on is that when you specify conf.type="log" in survfit, the confidence limits will be truncated to be in [0,1]; however, when you specify conf.type="none" the confidence limits can fall outside this range.  At least that is how the plots appear. Anyhoo, I thought I would pass this along in case aesthetically you would want the plot of the default CI method always to be the same.
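A short sketch of the route the note ends up recommending: choose the interval type in the survfit call, so that the plot simply draws what the object already holds.  The lung data is used only as an example.

```r
library(survival)

# Interval type is fixed when the curve is created
fit_log    <- survfit(Surv(time, status) ~ 1, data = lung)   # default "log"
fit_loglog <- survfit(Surv(time, status) ~ 1, data = lung,
                      conf.type = "log-log")

# The two objects carry different limits
print(c(fit_log$conf.type, fit_loglog$conf.type))
print(head(cbind(log = fit_log$lower, loglog = fit_loglog$lower)))

# Plot the stored limits
plot(fit_loglog, conf.int = TRUE)
```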

Finally, in the plot.survfit help, if you read what is written about conf.type, it isn't actually correct. I think this text came from survfit.formula, where the first option listed was "none"; in plot.survfit the first option is "plain", so the description below is off, and it's missing "arcsin".
conf.type	One of "plain", "log" (the default), "log-log" or "logit". Only enough of the string to uniquely identify it is necessary. The first option causes confidence intervals not to be generated. The second causes the standard intervals curve +- k *se(curve), where k is determined from conf.int. The log option calculates intervals based on the cumulative hazard or log(survival). The log-log option bases the intervals on the log hazard or log(-log(survival)), and the logit option on log(survival/(1-survival)).