RMSE computation in glm uses link instead of response scale #125

@jhorzek

Hi,

Thank you for the great package, it was very helpful in a recent modeling endeavor of mine! I know this package is superseded, but I was surprised by one of the outputs for a Poisson regression and wanted to check if this is the desired behavior.

I was following the cross-validation example, but for a glm:

library(modelr)
library(purrr)
set.seed(123)
cv2 <- crossv_kfold(mtcars, 5)
models <- map(cv2$train, ~ glm(cyl ~ wt, 
                               data = .,
                               family = poisson()))
map2_dbl(models, cv2$test, rmse)
# 1        2        3        4        5 
# 4.180961 5.337407 4.895667 4.555588 4.186519

It seems that rmse() here takes the difference between the observed values and the predictions on the link scale (for a Poisson model with the default log link, the log of the expected count) rather than on the response scale:

map2_dbl(models, cv2$test,
         ~ sqrt(mean((modelr:::response(.x, .y) -
                        stats::predict(.x, .y, type = "link"))^2)))

# 1        2        3        4        5 
# 4.180961 5.337407 4.895667 4.555588 4.186519 
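
To make the two scales concrete, here is a minimal sketch using only base R on the full mtcars data (not the cross-validation folds above). It assumes the default log link of poisson(), where predict() without a type argument returns the linear predictor and exp() maps it back to expected counts:

```r
# Fit the same Poisson model on the full data set (illustration only).
fit <- glm(cyl ~ wt, data = mtcars, family = poisson())

# predict.glm() defaults to type = "link", i.e. the linear predictor.
eta <- predict(fit)

# type = "response" gives the expected count on the scale of cyl.
mu <- predict(fit, type = "response")

# With a log link, exponentiating the link-scale values recovers the response scale.
all.equal(exp(eta), mu)

# The link-scale values are far smaller than the observed counts,
# so residuals computed against them are inflated.
range(eta)
range(mu)
range(mtcars$cyl)
```

Because the observed cyl values are counts while the link-scale predictions are logs of counts, subtracting one from the other mixes two different scales.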

I get different results if I use the response scale:

map2_dbl(models, cv2$test,
         ~ sqrt(mean((modelr:::response(.x, .y) -
                        stats::predict(.x, .y, type = "response"))^2)))

# 1         2         3         4         5 
# 0.8622326 1.3480643 1.4131760 2.2894338 1.0397307

If I am not mistaken, the RMSE should always be computed on the response scale, since that is the scale of the observed values.
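
As a possible workaround in the meantime, here is a rough sketch of a helper that computes the RMSE on the response scale. The name rmse_response is hypothetical and not part of modelr. It assumes the left-hand side of the model formula is a plain, untransformed column, and that the data argument can be converted with as.data.frame():

```r
# Hypothetical helper (not part of modelr): RMSE on the response scale.
rmse_response <- function(model, data) {
  data <- as.data.frame(data)

  # Name of the outcome column, taken from the model formula.
  # Assumes an untransformed left-hand side such as `cyl ~ wt`.
  outcome <- all.vars(stats::formula(model))[1]
  observed <- data[[outcome]]

  # Predictions on the same scale as the observed values.
  predicted <- stats::predict(model, newdata = data, type = "response")

  sqrt(mean((observed - predicted)^2))
}

# Usage on the full data set (illustration only, no cross-validation).
fit <- glm(cyl ~ wt, data = mtcars, family = poisson())
rmse_response(fit, mtcars)
```

With the objects from the example above, map2_dbl(models, cv2$test, rmse_response) should give the response-scale values, assuming as.data.frame() works on the resample objects in cv2$test.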

Best,
Jannik
