REML vs. ML and a small number of groups in mixed models comparisons

Interesting post on mixed models in different languages! 

One thing to note is that 3 groups is a very small number of levels for a blocking variable, see e.g. [here](https://www.muscardinus.be/2018/09/number-random-effect-levels/) and [here](https://nextjournal.com/dmbates/complexity-in-fitting-linear-mixed-models/). There are cases where you would use that construction, but more for shrinkage / ["new style" random effects](http://www.biostat.umn.edu/~hodges/PubH8492/Hodges-ClaytonREONsubToStatSci.pdf) than estimates of the between-group variance. In other words, the estimate for between-group variance is going to be not great so I would advise against focusing on it.

Three groups is also so small that the difference between the REML and ML criterion is going to be quite large. You think of this a bit like Bessel's correction (n-1 instead of n in the denominator) for variances. For small samples, the effect is going to be huge, and that's exactly what's happening here -- lme4 and Python's statsmodels both default to REML, while MixedModels defaults to ML. (I'm with @dmbates here and have grown somewhat opposed to REML on purely mathematical-philisophical grounds and think that if you're looking at a sample small enough for the difference in the estimated variance to be substantive, then you should probably avoid overinterpreting that estimate.) MixedModels.jl can nonetheless use the REML criterion with the `REML=true` keyword argument.

For the GLMM, there's no REML option to worry about, but setting `nAGQ=0` in R is indeed essentially equivalent to MixedModels.jl's `fast=true` option. To be more specific, the fixed effects coefficients will be estimated as part of the PIRLS step instead of via the nonlinear optimizer, which reduces the size of the space the nonlinear optimizer needs to explore at the expensive of being slightly less accurate. (Only "essentially" equivalent because there are other unrelated algorithmic improvements in MixedModels.jl which mean that it's not exactly equivalent.)

I think it's kinda cool that MixedModels.jl with `fast=false` is nearly as fast as R with `nAQG=0`. :wink: 

Finally, if you want take advantage of MixedModels.jl but still have access to the whole R ecosystem of extra packages for examining and plotting mixed models, I'll do some shameless self advertising and suggest taking a peek at [JellyMe4.jl](https://github.com/palday/JellyMe4.jl).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

REML vs. ML and a small number of groups in mixed models comparisons #9

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

REML vs. ML and a small number of groups in mixed models comparisons #9

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions