Hi,
Thank you for the great package; it was very helpful in a recent modeling endeavor of mine! I know this package is superseded, but I was surprised by one of the outputs for a Poisson regression and wanted to check whether this is the intended behavior.
I was following the cross-validation example, but for a glm:
library(modelr)
library(purrr)
set.seed(123)
cv2 <- crossv_kfold(mtcars, 5)
models <- map(cv2$train, ~ glm(cyl ~ wt,
                               data = .,
                               family = poisson()))
map2_dbl(models, cv2$test, rmse)
#        1        2        3        4        5
# 4.180961 5.337407 4.895667 4.555588 4.186519
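It seems that the RMSE reported here is the difference between the observed values and the predictions on the link scale rather than the response scale (predict() defaults to type = "link" for glm objects). I can reproduce the same numbers by requesting the link scale explicitly: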
map2_dbl(models, cv2$test,
         ~ sqrt(mean((modelr:::response(.x, .y) -
                        stats::predict(.x, .y, type = "link"))^2)))
#        1        2        3        4        5
# 4.180961 5.337407 4.895667 4.555588 4.186519
I get different results if I use the response scale:
map2_dbl(models, cv2$test,
         ~ sqrt(mean((modelr:::response(.x, .y) -
                        stats::predict(.x, .y, type = "response"))^2)))
#         1         2         3         4         5
# 0.8622326 1.3480643 1.4131760 2.2894338 1.0397307
If I am not mistaken, the RMSE should always be computed on the response scale, since that is the scale of the observed values.
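In case it is useful to others, here is a minimal sketch of the workaround I have in mind. Note that rmse_response() is a hypothetical helper of my own and not part of modelr; it assumes the model has a formula with a left-hand side and that predict() for the model class accepts type = "response" (true for glm, not necessarily for other classes).

```r
# Hypothetical helper (not part of modelr): RMSE on the response scale.
rmse_response <- function(model, data) {
  # as.data.frame() also converts modelr resample objects when modelr is loaded
  data <- as.data.frame(data)
  # Evaluate the left-hand side of the model formula to get observed values
  observed <- eval(stats::formula(model)[[2L]], data)
  # Ask for predictions on the response scale explicitly
  predicted <- stats::predict(model, newdata = data, type = "response")
  sqrt(mean((observed - predicted)^2, na.rm = TRUE))
}

# Small self-contained check on the full data set
fit <- glm(cyl ~ wt, data = mtcars, family = poisson())
rmse_response(fit, mtcars)
```

With the objects from above, map2_dbl(models, cv2$test, rmse_response) should then give the response-scale values rather than the link-scale ones.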
Best,
Jannik