-
Notifications
You must be signed in to change notification settings - Fork 12
07. Chapter 7
Chapter 7 – Uncertainties Due to Within-Species Variation in Comparative Studies: Measurement Errors and Statistical Weights
brain size and body size on primates ("primate_ind.txt", a tab separated text file), with more than one observation for the majority of species, individual-specific data [download] - right click, Save as…
brain size and body size on primates ("primate_spec.txt", a tab separated text file), species-specific data obtained from the exercises above, includes information on within-species sampling (sample sizes, variances and standard errors [download] - right click, Save as…
phylogeny ("primate_tree.phy", in phylip format) is taken from 10kTrees (Arnold et al 2010) and tailored to the data at hand [download] - right click, Save as…
GLSME function ("GLSME.R", source file), GLSME function by K. Bartoszek [download] - right click, Save as…
"lme4" (Bates et al 2011)
"rptR" (Nakagawa and Schielzeth 2010)
"MCMCglmm" (Hadfield 2010)
brain size and body size on primates ("primate_ind.txt", a tab separated text file), with more than one observation for the majority of species, individual-specific data
Import, transform and check individual specific data:
data.ind = read.table("primate_ind.txt", sep = "\t", header = TRUE)
data.ind$lg_body = log(data.ind$body)
data.ind$lg_brain = log(data.ind$brain)
str(data.ind)
## 'data.frame': 894 obs. of 5 variables:
## $ species : Factor w/ 86 levels "Allenopithecus_nigroviridis",..: 1 2 3 3 3 3 3 3 3 3 ...
## $ body : num 2920 826 777 2310 4309 ...
## $ brain : num 57 45 42.5 52 54.2 ...
## $ lg_body : num 7.98 6.72 6.66 7.75 8.37 ...
## $ lg_brain: num 4.04 3.81 3.75 3.95 3.99 ...
May not compatible with the most recent version of R (need earlier releases to run the scripts below)
library(rptR, quietly = T)
rpt(data.ind$lg_brain, data.ind$species, datatype = "Gaussian", method = "REML")
##
## Repeatability calculation using the LMM.REML method
##
## R = 0.985
## SE = 0.002
## CI = [0.979, 0.989]
## P = 0 []
## 0.001 []
rpt(data.ind$lg_body, data.ind$species, datatype = "Gaussian", method = "ANOVA")
##
## Repeatability calculation using the ANOVA method
##
## R = 0.884
## SE = 0.017
## CI = [0.851, 0.917]
## P = 0 []
## 0.001 []
To estimate repeatability directly from a mixed model
library(lme4, quietly = T)
mod = lmer(lg_brain ~ 1 + (1 | species), data.ind)
attr(lme4::VarCorr(mod)$species, "stddev")^2/(attr(lme4::VarCorr(mod)$species,
"stddev")^2 + attr(lme4::VarCorr(mod), "sc")^2)
## (Intercept)
## 0.9854
To obtain P value based on randomization
null = c()
for (i in 1:1000) {
species_random = sample(data.ind$species, length(data.ind$species))
null[i] = anova(lmer(lg_brain ~ 1 + (1 | species_random), data.ind), mod,
test = "Chisq")$Pr[2]
}
sum(null > 0.05)/length(null)
## [1] 0
To get 95% confidence intervals based on parametric bootstrap
rvalues <- numeric()
for (i in 1:1000) {
y = simulate(mod)
mboot <- lmer(y[, 1] ~ 1 + (1 | species), data.ind)
rvalues[i] = attr(lme4::VarCorr(mboot)$species, "stddev")^2/(attr(lme4::VarCorr(mboot)$species,
"stddev")^2 + attr(lme4::VarCorr(mboot), "sc")^2)
}
quantile(rvalues, c(0.025, 0.975))
## 2.5% 97.5%
## 0.9798 0.9891
library(MCMCglmm, quietly = T)
prior <- list(R = list(V = 1, nu = 0.002), G = list(G1 = list(V = 1, nu = 0.002)))
mod.mcmc = MCMCglmm(lg_body ~ 1, random = ~species, data = data.ind, prior = prior,
verbose = F)
Rvalue = mod.mcmc$VCV[, 1]/(mod.mcmc$VCV[, 1] + mod.mcmc$VCV[, 2])
mean(Rvalue)
## [1] 0.8513
quantile(Rvalue, c(0.025, 0.975))
## 2.5% 97.5%
## 0.8086 0.8896
- Bates D, Maechler M, Bolker B (2011) lme4: Linear mixed-effects models using S4 classes. R package version 0.999375-42/r1414.
- Hadfield JD (2010) MCMC methods for multi-response generalized linear mixed models: the MCMCglmm R package. Journal of Statistical Software 33: 1-22
- Nakagawa S, Schielzeth H (2010) Repeatability for Gaussian and non-Gaussian data: a practical guide for biologists. Biological Reviews 85: 935-956
"ape" (Paradis et al 2004)
brain size and body size on primates ("primate_ind.txt", a tab separated text file), with more than one observation for the majority of species, individual-specific data
phylogeny ("primate_tree.phy", in phylip format) is taken from 10kTrees (Arnold et al 2010) and tailored to the data at hand
To obtain species-specific data (means and variances) from individual-specific data
data.ind = read.table("primate_ind.txt", sep = "\t", header = TRUE)
body_mean = tapply(data.ind$body, data.ind$species, mean)
brain_mean = tapply(data.ind$brain, data.ind$species, mean)
body_N = tapply(data.ind$body, data.ind$species, length)
brain_N = tapply(data.ind$brain, data.ind$species, length)
body_var = tapply(data.ind$body, data.ind$species, var)
brain_var = tapply(data.ind$brain, data.ind$species, var)
se <- function(x) sqrt(var(x)/length(x))
body_se <- tapply(data.ind$body, data.ind$species, se)
brain_se <- tapply(data.ind$brain, data.ind$species, se)
To calculate measurement errors based on pooled CV
body_cv = sqrt(body_var)/body_mean
body_cv.pool = sum(body_cv * body_N, na.rm = T)/sum(body_N, na.rm = T)
body_se.adj = (body_cv.pool * body_mean)/sqrt(body_N)
body_var.adj = (body_cv.pool * body_mean)^2
brain_cv = sqrt(brain_var)/brain_mean
brain_cv.pool = sum(brain_cv * brain_N, na.rm = T)/sum(brain_N, na.rm = T)
brain_se.adj = (brain_cv.pool * brain_mean)/sqrt(brain_N)
brain_var.adj = (brain_cv.pool * brain_mean)^2
To log-transform data
lg.body_var.adj = log(body_var.adj/(body_mean^2) + 1)
# note that this gives the same value for all species use the following
# instead:
lg.body_var.adj = log(body_se.adj^2/(body_mean^2) + 1)
lg.body = log(body_mean) - 0.5 * lg.body_var.adj
lg.brain_var.adj = log(brain_se.adj^2/(brain_mean^2) + 1)
lg.brain = log(brain_mean) - 0.5 * lg.brain_var.adj
To create data frame at the species level
data.spec = data.frame(row.names = 1:length(body_mean), species = row.names(body_mean),
cbind(body_mean, body_N, body_var, lg.body, lg.body_var.adj, brain_mean,
brain_N, brain_var, lg.brain, lg.brain_var.adj), check.rows = TRUE,
check.names = TRUE)
To import and check phylogeny:
library(ape)
tree = read.tree("primate_tree.phy")
str(tree)
## List of 4
## $ edge : int [1:170, 1:2] 87 88 89 90 91 92 92 93 94 95 ...
## $ Nnode : int 85
## $ tip.label : chr [1:86] "Allenopithecus_nigroviridis" "Cercopithecus_albogularis" "Cercopithecus_mitis" "Cercopithecus_ascanius" ...
## $ edge.length: num [1:170] 26.19 16.81 8.59 6.54 2.98 ...
## - attr(*, "class")= chr "phylo"
## - attr(*, "order")= chr "cladewise"
To plot phylogeny:
plot(tree, cex = 0.6, no.margin = T)
- Arnold C, Matthews LJ, Nunn CL (2010) The 10kTrees website: a new online resource for primate hylogeny. Evol Anthropol 19: 114-118.
- Paradis E, Claude J, Strimmer K (2004) APE: analyses of phylogenetics and evolution in R language. Bioinformatics 20: 289-290.
"ape" (Paradis et al 2004)
"nlme" (Pinheiro et al 2012)
brain size and body size on primates ("primate_spec.txt", a tab separated text file), species-specific data obtained from the exercises above, includes information on within-species sampling (sample sizes, variances and standard errors
phylogeny ("primate_tree.phy", in phylip format) is taken from 10kTrees (Arnold et al 2010) and tailored to the data at hand
data.spec = read.table("primate_spec.txt", sep = "\t", header = TRUE)
w.ols = lm(lg.brain ~ lg.body, data.spec, weights = brain_N)
or
library(nlme, quietly = T)
w.ols = gls(lg.brain ~ lg.body, weights = ~1/brain_N, data.spec, method = "ML")
summary(w.ols)
## Generalized least squares fit by maximum likelihood
## Model: lg.brain ~ lg.body
## Data: data.spec
## AIC BIC logLik
## 173.7 181 -83.83
##
## Variance function:
## Structure: fixed weights
## Formula: ~1/brain_N
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) -2.0730 0.27967 -7.412 0
## lg.body 0.7868 0.03262 24.119 0
##
## Correlation:
## (Intr)
## lg.body -0.985
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -2.5473 -0.6364 -0.2264 0.3108 3.3693
##
## Residual standard error: 1.429
## Degrees of freedom: 86 total; 84 residual
library(ape)
tree = read.tree("primate_tree.phy")
bm.tree <- corBrownian(phy = tree)
# need species as row names:
row.names(data.spec) = data.spec$species
# pgls without weight:
pgls = gls(lg.brain ~ lg.body, correlation = bm.tree, data.spec)
# pgls with weight:
w.pgls = gls(lg.brain ~ lg.body, weights = ~1/brain_N, correlation = bm.tree,
data.spec)
summary(w.pgls)
## Generalized least squares fit by REML
## Model: lg.brain ~ lg.body
## Data: data.spec
## AIC BIC logLik
## 184.1 191.4 -89.05
##
## Correlation Structure: corBrownian
## Formula: ~1
## Parameter estimate(s):
## numeric(0)
## Variance function:
## Structure: fixed weights
## Formula: ~1/brain_N
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) -1.3387 0.3839 -3.487 8e-04
## lg.body 0.7095 0.0482 14.734 0e+00
##
## Correlation:
## (Intr)
## lg.body -0.983
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -0.94340 -0.35254 -0.13766 0.05078 1.53642
##
## Residual standard error: 3.727
## Degrees of freedom: 86 total; 84 residual
plot(data.spec$lg.body, data.spec$lg.brain, pch = 1, cex = 0.6 * sqrt(data.spec$brain_N),
xlab = "log(body size)", ylab = "log(brain size)")
abline(w.ols, col = "red")
abline(pgls, col = "blue")
abline(w.pgls, col = "green")
legend(x = "bottomright", legend = c(paste("w.ols, AIC = ", round(AIC(w.ols),
2)), paste("pgls, AIC = ", round(AIC(pgls), 2)), paste("w.pgls, AIC = ",
round(AIC(w.pgls), 2))), lty = 1, col = c("red", "blue", "green"))
The best model is the phylogenetic model that does not consider weights. The weighted model overemphasizes the influence of the difference in sample sizes. Note the high repeatability of both traits, which already indicate that adjusting for unequal sampling is not important.
- Arnold C, Matthews LJ, Nunn CL (2010) The 10kTrees website: a new online resource for primate hylogeny. Evol Anthropol 19: 114-118.
- Paradis E, Claude J, Strimmer K (2004) APE: analyses of phylogenetics and evolution in R language. Bioinformatics 20: 289-290.
- Pinheiro J, Bates D, DebRoy S, Sarkar D, R Development Core Team (2012) nlme: Linear and Nonlinear Mixed Effects Models. R package version 3.1-104.
4) Fitting univariate evolutionary models with intraspecific varaince to estimate species-specific traits (means and variances), ancestral states, evolutionary rate and phylogenetic signal
"ape" (Paradis et al 2004)
"phytools" (Revell 2012)
brain size and body size on primates ("primate_ind.txt", a tab separated text file), with more than one observation for the majority of species, individual-specific data
brain size and body size on primates ("primate_spec.txt", a tab separated text file), species-specific data obtained from the exercises above, includes information on within-species sampling (sample sizes, variances and standard errors
phylogeny ("primate_tree.phy", in phylip format) is taken from 10kTrees (Arnold et al 2010) and tailored to the data at hand
Estimating species-specific means and variances, ancestral states and evolutionary rates by using Bayesian MCMC
Model assuming common variance
data.ind = read.table("primate_ind.txt", sep = "\t", header = TRUE)
library(ape)
tree = read.tree("primate_tree.phy")
brain = c(data.ind$brain)
names(brain) = data.ind$species
library(phytools, quietly = T)
fittedB = fitBayes(tree, brain, ngen = 10000, method = "reduced")
## Control parameters (set by user or default):
## List of 8
## $ sig2 : num 1249
## $ a : num 93.7
## $ xbar : Named num [1:86] 57 55.7 69.1 64.1 57.5 ...
## ..- attr(*, "names")= chr [1:86] "Allenopithecus_nigroviridis" "Cercopithecus_albogularis" "Cercopithecus_mitis" "Cercopithecus_ascanius" ...
## $ intV : num 455
## $ pr.mean: num [1:89] 1000 0 0 0 0 0 0 0 0 0 ...
## $ pr.var : num [1:89] 1e+06 1e+03 1e+03 1e+03 1e+03 1e+03 1e+03 1e+03 1e+03 1e+03 ...
## $ prop : num [1:89] 12.5 12.5 912 912 912 ...
## $ sample : num 100
## Starting MCMC...
## Done MCMC.
est.means = colMeans(fittedB[21:101, c(-1, -2, -3, -(dim(fittedB)[2] - 1), -dim(fittedB)[2])])
art.means = tapply(data.ind$brain, data.ind$species, mean)
plot(art.means, est.means[names(art.means)], xlab = "Arithmetic means", ylab = "Bayesian estimates of means")
abline(a = 0, b = 1, col = "red")
legend(x = "bottomright", legend = c(paste("var = ", round(mean(fittedB[21:101,
(dim(fittedB)[2] - 1)]), 2)), paste("MCRA = ", round(mean(fittedB[21:101,
3]), 2)), paste("sigma^2 = ", round(mean(fittedB[21:101, 2]), 2))))
The cicles are species-specific means as can be calculated from the raw individual-specific data by using their arithemtic means or Bayesian MCMC-based estimates that incorporate phylogenetic constraints. Legend on the bottom right shows the assumed common intraspecific variance, trait value at the root of the phylogeny and evolutionary rate.
Model assuming heterogeneous variance
fittedB = fitBayes(tree, brain, ngen = 10000, method = "full")
## Control parameters (set by user or default):
## List of 8
## $ sig2 : num 1249
## $ a : num 93.7
## $ xbar : Named num [1:86] 57 55.7 69.1 64.1 57.5 ...
## ..- attr(*, "names")= chr [1:86] "Allenopithecus_nigroviridis" "Cercopithecus_albogularis" "Cercopithecus_mitis" "Cercopithecus_ascanius" ...
## $ v : Named num [1:86] 455 455 455 455 455 ...
## ..- attr(*, "names")= chr [1:86] "Allenopithecus_nigroviridis" "Cercopithecus_albogularis" "Cercopithecus_mitis" "Cercopithecus_ascanius" ...
## $ pr.mean: num [1:174] 1000 0 0 0 0 0 0 0 0 0 ...
## $ pr.var : num [1:174] 1e+06 1e+03 1e+03 1e+03 1e+03 1e+03 1e+03 1e+03 1e+03 1e+03 ...
## $ prop : Named num [1:174] 12.5 12.5 912 912 912 ...
## ..- attr(*, "names")= chr [1:174] "" "" "" "" ...
## $ sample : num 100
## Starting MCMC...
## Done MCMC.
est.means = colMeans(fittedB[21:101, 4:((dim(fittedB)[2] - 4)/2 + 3)])
plot(art.means, est.means[names(art.means)], xlab = "Arithmetic means", ylab = "Bayesian estimates of means")
abline(a = 0, b = 1, col = "red")
legend(x = "bottomright", legend = c(paste("MCRA = ", round(mean(fittedB[21:101,
3]), 2)), paste("sigma^2 = ", round(mean(fittedB[21:101, 2]), 2))))
The cicles are species-specific means as can be calculated from the raw individual-specific data by using their arithemtic means or Bayesian MCMC-based estimates that incorporate phylogenetic constraints. Legend on the bottom right shows trait value at the root of the phylogeny and evolutionary rate. Within-species variance is assumed to be different across species.
Without implementing measurement errors
Blomberg’s K
data.spec = read.table("primate_spec.txt", sep = "\t", header = TRUE)
library(ape)
tree = read.tree("primate_tree.phy")
phylosig(tree, brain, test = T)
## $K
## [1] 0.3145
##
## $P
## [1] 0.045
Pagel’s lambda
phylosig(tree, brain, method = "lambda", test = T)
## $lambda
## [1] 1.012
##
## $logL
## [1] -534.4
##
## $logL0
## [1] -553.3
##
## $P
## [1] 7.47e-10
By implementing measurement errors
Blomberg’ K
brain_se = c(data.spec$brain_se.adj)
names(brain_se) = data.spec$species
phylosig(tree, brain, test = T, se = brain_se)
## $K
## [1] 0.3237
##
## $P
## [1] 0.045
##
## $sig2
## [1] 1331
##
## $logL
## [1] -536.6
Pagel’s lambda
phylosig(tree, brain, method = "lambda", test = T, se = brain_se)
## $lambda
## [1] 1.012
##
## $sig2
## [1] 1571
##
## $logL
## [1] -531.9
##
## $convergence
## [1] 0
##
## $message
## [1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"
##
## $logL0
## [1] -552.7
##
## $P
## [1] 1.037e-10
- Arnold C, Matthews LJ, Nunn CL (2010) The 10kTrees website: a new online resource for primate hylogeny. Evol Anthropol 19: 114-118.
- Paradis E, Claude J, Strimmer K (2004) APE: analyses of phylogenetics and evolution in R language. Bioinformatics 20: 289-290.
- Revell LJ (2012) Phytools: an R package for phylogenetic comparative biology (and other things). Methods Ecol Evol 3: 217-223.
5) Studying correlated evolution in multivariate phylogenetic regression models by incorporating within-species variance
"ape" (Paradis et al 2004)
"nlme" (Pinheiro et al 2012)
"GLSME" (Hansen and Bartoszek 2012)
"MCMCglmm" (Hadfield 2010)
"phytools" (Revell 2012)
"MERegPHYSIG" (Ives et al 2007)
brain size and body size on primates ("primate_ind.txt", a tab separated text file), with more than one observation for the majority of species, individual-specific data
brain size and body size on primates ("primate_spec.txt", a tab separated text file), species-specific data obtained from the exercises above, includes information on within-species sampling (sample sizes, variances and standard errors
phylogeny ("primate_tree.phy", in phylip format) is taken from 10kTrees (Arnold et al 2010) and tailored to the data at hand
When measurement error is ignored
library(ape)
library(nlme)
data.spec = read.table("primate_spec.txt", sep = "\t", header = TRUE)
tree = read.tree("primate_tree.phy")
row.names(data.spec) = data.spec$species
pgls.nlme = gls(lg.brain ~ lg.body, data.spec, correlation = corBrownian(phy = tree))
summary(pgls.nlme)
## Generalized least squares fit by REML
## Model: lg.brain ~ lg.body
## Data: data.spec
## AIC BIC logLik
## 60.48 67.77 -27.24
##
## Correlation Structure: corBrownian
## Formula: ~1
## Parameter estimate(s):
## numeric(0)
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) 1.218 0.5333 2.284 0.0249
## lg.body 0.287 0.0389 7.375 0.0000
##
## Correlation:
## (Intr)
## lg.body -0.544
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -2.1662 0.1224 0.7819 1.1608 3.3928
##
## Residual standard error: 0.8308
## Degrees of freedom: 86 total; 84 residual
When considering measurement error (on the predictor)
pglsME.nlme = gls(lg.brain ~ lg.body, data.spec, correlation = corBrownian(phy = tree),
weights = varFixed(~lg.body_var.adj))
summary(pglsME.nlme)
## Generalized least squares fit by REML
## Model: lg.brain ~ lg.body
## Data: data.spec
## AIC BIC logLik
## 181.9 189.2 -87.97
##
## Correlation Structure: corBrownian
## Formula: ~1
## Parameter estimate(s):
## numeric(0)
## Variance function:
## Structure: fixed weights
## Formula: ~lg.body_var.adj
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) -1.3018 0.3866 -3.367 0.0011
## lg.body 0.7049 0.0485 14.549 0.0000
##
## Correlation:
## (Intr)
## lg.body -0.983
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -0.95067 -0.35625 -0.13744 0.05117 1.55863
##
## Residual standard error: 8.399
## Degrees of freedom: 86 total; 84 residual
When measurement error is ignored
%METHOD = GLS
%
%Data file and ME file saved as 'workingDataFile.txt' and 'workingMEfile.txt'
%
%Coefficients
%b0 (intercept) = 1.2179
%b1 = 0.28701
%
%sigma2 = 0.10725
%
%Independent variable means, variances, and correlations
%aX1 = 7.4524
%s2X1 = 0.83315
%
%LnLikelihood = -25.028
%-2LL = 50.056
%AIC(par=3) = 56.056
When considering measurement error (on both the response and predictor, but without within-species covariance)
%METHOD = EGLS
%
%Data file and ME file saved as 'workingDataFile.txt' and 'workingMEfile.txt'
%
%Coefficients with approximate SE
%b0 (intercept) = 1.1929 +- 0.53329
%b1 = 0.29033 +- 0.038919
%
%sigma2 = 0.12494
%
%Independent variable means, variances, and correlations
%aX1 = 7.4533
%s2X1 = 0.82363
%METHOD = ML
%
%Data file and ME file saved as 'workingDataFile.txt' and 'workingMEfile.txt'
%
%Coefficients with approximate SE
%b0 (intercept) = 1.1659 +- 0.52755
%b1 = 0.29394 +- 0.039164
%
%correlation matrix of estimates of b =
% 1 -0.55338
% -0.55338 1
%
%sigma2 = 0.1046
%
%Independent variable means, variances, and correlations
%aX1 = 7.4534
%s2X1 = 0.81061
%
%LnLikelihood = -138.0037
%-2LL = 276.0074
%AIC(par=5) = 286.0074
%METHOD = REML
%
%Data file and ME file saved as 'workingDataFile.txt' and 'workingMEfile.txt'
%
%Coefficients
%b0 (intercept) = 1.1648
%b1 = 0.29384
%
%sigma2 = 0.10462
%
%Independent variable means, variances, and correlations
%aX1 = 7.4597
%s2X1 = 0.020603
%
%LnLikelihood = -132.3263
%-2LL = 264.6526
%AIC(par=2) = 268.6526
When measurement error is ignored
library(phytools)
data.spec = read.table("primate_spec.txt", sep = "\t", header = TRUE)
tree = read.tree("primate_tree.phy")
pgls.Ives(tree = tree, X = setNames(data.spec$lg.body, data.spec$species), y = setNames(data.spec$lg.brain,
data.spec$species), Vx = setNames(rep(0, length(tree$tip)), data.spec$species),
Vy = setNames(rep(0, length(tree$tip)), data.spec$species), Cxy = setNames(rep(0,
length(tree$tip)), data.spec$species))
## $beta
## [1] 0.6108 0.3123
##
## $sig2x
## [1] 0.108
##
## $sig2y
## [1] 0.009439
##
## $a
## [1] -0.5942 0.4252
##
## $logL
## [1] -156.5
##
## $convergence
## [1] 0
##
## $message
## [1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"
The convergence is often sensitive to starting conditions. It may be more straightforward to repeat the above procedure several times (10-100) and select the results with the highest likelihood:
res = c()
for (k in 1:100) {
mod = try(pgls.Ives(tree = tree, X = setNames(data.spec$lg.body, data.spec$species),
y = setNames(data.spec$lg.brain, data.spec$species), Vx = setNames(rep(0,
length(tree$tip)), data.spec$species), Vy = setNames(rep(0, length(tree$tip)),
data.spec$species), Cxy = setNames(rep(0, length(tree$tip)), data.spec$species)))
if (class(mod) != "try-error") {
int = mod$beta[1]
slope = mod$beta[2]
lik = mod$logL
anc_a = mod$a[1]
anc_b = mod$a[2]
conv = mod$convergence
}
res = rbind(res, data.frame(k, int, slope, anc_a, anc_b, lik, conv))
}
res[which(res[, 6] == max(subset(res, conv == 0)[, 6])), ]
## k int slope anc_a anc_b lik conv
## 56 56 1.215 0.2872 7.434 3.35 -138.7 0
When considering measurement error (on both the response and predictor, but without within-species covariance)
res = c()
for (k in 1:100) {
mod = try(pgls.Ives(tree = tree, X = setNames(data.spec$lg.body, data.spec$species),
y = setNames(data.spec$lg.brain, data.spec$species), Vx = setNames(data.spec$lg.body_var.adj,
data.spec$species), Vy = setNames(data.spec$lg.brain_var.adj, data.spec$species),
Cxy = setNames(rep(0, length(tree$tip)), data.spec$species)))
if (class(mod) != "try-error") {
int = mod$beta[1]
slope = mod$beta[2]
lik = mod$logL
anc_a = mod$a[1]
anc_b = mod$a[2]
conv = mod$convergence
}
res = rbind(res, data.frame(k, int, slope, anc_a, anc_b, lik, conv))
}
res[which(res[, 6] == max(subset(res, conv == 0)[, 6])), ]
## k int slope anc_a anc_b lik conv
## 59 59 -0.2379 0.4826 7.46 3.362 -125.7 0
When both measurement error and phylogeny are ignored (GLS)
# functions are stored in a source file (available from K. Bartoszek's
# website)
source("GLSME.R")
data.spec = read.table("primate_spec.txt", sep = "\t", header = TRUE)
row.names(data.spec) = data.spec$species
tree = read.tree("primate_tree.phy")
gls.glsme = GLSME(y = data.spec$lg.brain, D = cbind(rep(1, length(data.spec$lg.body)),
data.spec$lg.body), Vt = 1, Ve = 0, Vd = list("F", "F"), Vu = 0)
gls.glsme
## $GLSestimate
## [,1]
## [1,] -1.6429
## [2,] 0.7128
##
## $errorGLSestim
## [,1]
## [1,] 0.32506
## [2,] 0.04057
##
## $BiasCorrectedGLSestimate
## [,1]
## [1,] -1.6429
## [2,] 0.7128
##
## $errorBiasCorrectedGLSestim
## [,1]
## [1,] 0.32506
## [2,] 0.04057
##
## $K
## [,1] [,2]
## [1,] 1 0
## [2,] 0 1
##
## $R2
## [,1]
## [1,] 0.7821
##
## $R2BiasCorrectedModel
## [,1]
## [1,] 0.7821
##
## $LogLikelihood
## [1] -63.45
##
## $LogLikelihoodBiasCorrectedModel
## [1] -63.45
##
## $PredictorVarianceConstantEstimate
## [1] 0 0
##
## $ResponseVarianceConstantEstimate
## [1] 0.2561
When measurement error is ignored but phylogeny is accounted for (PGLS)
# the order of species in the data should be the same as the order of
# species in the VCV matrix, thus the dataset should be resorted as
# data.spec[tree$tip.label,]!
pgls.glsme = GLSME(y = data.spec[tree$tip.label, ]$lg.brain, D = cbind(rep(1,
length(data.spec$lg.body)), data.spec[tree$tip.label, ]$lg.body), Vt = vcv(tree),
Ve = 0, Vd = list("F", "F"), Vu = 0)
pgls.glsme
## $GLSestimate
## [,1]
## [1,] 1.218
## [2,] 0.287
##
## $errorGLSestim
## [,1]
## [1,] 0.52707
## [2,] 0.03846
##
## $BiasCorrectedGLSestimate
## [,1]
## [1,] 1.218
## [2,] 0.287
##
## $errorBiasCorrectedGLSestim
## [,1]
## [1,] 0.52707
## [2,] 0.03846
##
## $K
## [,1] [,2]
## [1,] 1 0
## [2,] 0 1
##
## $R2
## [,1]
## [1,] 0.393
##
## $R2BiasCorrectedModel
## [,1]
## [1,] 0.393
##
## $LogLikelihood
## [1] -25.02
##
## $LogLikelihoodBiasCorrectedModel
## [1] -25.02
##
## $PredictorVarianceConstantEstimate
## [1] 0 0
##
## $ResponseVarianceConstantEstimate
## [1] 0.009235
When both measurement error and phylogeny are accounted for (considering measurement error (on both the response and predictor, but without within-species covariance)
# again, the order of species in the data should be the same as the order of
# species in the VCV matrix, thus the dataset should be resorted as
# data.spec[tree$tip.label,]! in addition, note that when there is a
# measurement error in the predictors, these are treated as 'random effects'
# (as specified by the non zero entry in the Vd matrix), and they will be
# centered on their means by default (it can be switched off by setting
# CenterPredictor=FALSE)
pglsME.glsme = GLSME(y = data.spec[tree$tip.label, ]$lg.brain, D = cbind(rep(1,
length(data.spec$lg.body)), data.spec[tree$tip.label, ]$lg.body), Vt = vcv(tree),
Ve = diag(data.spec[tree$tip.label, ]$lg.brain_var.adj), Vd = list("F",
vcv(tree)), Vu = list("F", diag(data.spec[tree$tip.label, ]$lg.body_var.adj)))
pglsME.glsme
## $GLSestimate
## [,1]
## [1,] 3.3610
## [2,] 0.3209
##
## $errorGLSestim
## [,1]
## [1,] 0.48730
## [2,] 0.04689
##
## $BiasCorrectedGLSestimate
## [,1]
## [1,] 3.3582
## [2,] 0.4173
##
## $errorBiasCorrectedGLSestim
## [,1]
## [1,] 0.48730
## [2,] 0.06111
##
## $K
## [,1] [,2]
## [1,] 1 0.006861
## [2,] 0 0.769192
##
## $R2
## [,1]
## [1,] 0.4894
##
## $R2BiasCorrectedModel
## [,1]
## [1,] 0.5328
##
## $LogLikelihood
## [1] -23.66
##
## $LogLikelihoodBiasCorrectedModel
## [1] -21.25
##
## $PredictorVarianceConstantEstimate
## [1] 0.00000 0.04932
##
## $ResponseVarianceConstantEstimate
## [1] 0.0112
# as a consequence of centering, the estimated intercept corresponds to
# b0+b1*mean(data.spec$lg.body) from model data.spec$lg.brain =
# b0+b1*mean(data.spec$lg.body) + b1*(data.spec$lg.body-
# mean(data.spec$lg.body)). Therefore, the intercept of interests can be
# obtained as:
pglsME.glsme$BiasCorrectedGLSestimate[1] - pglsME.glsme$BiasCorrectedGLSestimate[2] *
mean(data.spec$lg.body)
## [1] 0.06225
# from which, the slope of interest is:
pglsME.glsme$BiasCorrectedGLSestimate[2]
## [1] 0.4173
# taking into account estimation error after the bias correction, the slope
# falls within the error range of:
slope_range = rbind(pglsME.glsme$BiasCorrectedGLSestimate[2] - pglsME.glsme$errorBiasCorrectedGLSestim[2],
pglsME.glsme$BiasCorrectedGLSestimate[2] + pglsME.glsme$errorBiasCorrectedGLSestim[2])
slope_range
## [,1]
## [1,] 0.3561
## [2,] 0.4784
# the standard error of the intercept can be computed based on the variance
# of the noncentered intercept estimate (T. Hansen and K. Bartoszek pers.
# comm.):
int_var = pglsME.glsme$errorBiasCorrectedGLSestim[1]^2 + pglsME.glsme$errorBiasCorrectedGLSestim[2]^2 *
mean(data.spec$lg.body)^2
sqrt(int_var)
## [1] 0.6859
# from this the error range around the non-centred intercept can be given:
intercept_range = rbind(pglsME.glsme$BiasCorrectedGLSestimate[1] - pglsME.glsme$BiasCorrectedGLSestimate[2] *
mean(data.spec$lg.body) - sqrt(int_var), pglsME.glsme$BiasCorrectedGLSestimate[1] -
pglsME.glsme$BiasCorrectedGLSestimate[2] * mean(data.spec$lg.body) + sqrt(int_var))
intercept_range
## [,1]
## [1,] -0.6237
## [2,] 0.7481
To prepare the dataset, we need to generate a column named ‘animal’
library(MCMCglmm)
data.spec = read.table("primate_spec.txt", sep = "\t", header = TRUE)
data.ind = read.table("primate_ind.txt", sep = "\t", header = TRUE)
tree = read.tree("primate_tree.phy")
data.spec$animal = data.spec$species
data.ind$animal = data.ind$species
When measurement error is ignored, and species-specific data are used (equivalent with PGLS)
prior <- list(R = list(V = 1, nu = 0.002), G = list(G1 = list(V = 1, nu = 0.002)))
mcmc = MCMCglmm(lg.brain ~ lg.body, random = ~animal, data = data.spec, pedigree = tree,
prior = prior, verbose = F, family = "gaussian", nodes = "TIPS")
summary(mcmc)
##
## Iterations = 3001:12991
## Thinning interval = 10
## Sample size = 1000
##
## DIC: -238.3
##
## G-structure: ~animal
##
## post.mean l-95% CI u-95% CI eff.samp
## animal 0.69 0.47 0.904 1000
##
## R-structure: ~units
##
## post.mean l-95% CI u-95% CI eff.samp
## units 0.00327 0.000199 0.00967 226
##
## Location effects: lg.brain ~ lg.body
##
## post.mean l-95% CI u-95% CI eff.samp pMCMC
## (Intercept) 1.0990 0.0235 2.0715 1000 0.024 *
## lg.body 0.2996 0.2282 0.3870 1000 <0.001 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
When measurement error is considered on the predictor variable, and individual-specific data are used (phylogenetic mixed model)
# we need within-subject centering to do this (van de Pol and Wright 2009),
# the approprate data transformation:
data.ind$lg.body = log(data.ind$body)
data.ind$lg.brain = log(data.ind$brain)
means = tapply(X = data.ind$lg.body, INDEX = data.ind$species, FUN = mean)
N = tapply(X = data.ind$lg.body, INDEX = data.ind$species, FUN = length)
mean.vector = vector()
for (k in 1:length(means)) {
mean.set = rep(means[k], N[k])
mean.vector = rbind(mean.vector, as.data.frame(mean.set))
}
data.ind$body_av = mean.vector[, 1]
data.ind$body_centr <- data.ind$lg.body - data.ind$body_av
# the model based on individual-specific data
prior2 = list(G = list(G1 = list(V = 1, nu = 0.02), G2 = list(V = 1, nu = 0.02)),
R = list(V = 1, nu = 0.02))
mcmc.ind.centr = MCMCglmm(lg.brain ~ body_centr + body_av, random = ~animal +
species, data = data.ind, pedigree = tree, prior = prior2, verbose = F,
family = "gaussian", nodes = "TIPS")
summary(mcmc.ind.centr)
##
## Iterations = 3001:12991
## Thinning interval = 10
## Sample size = 1000
##
## DIC: -1236
##
## G-structure: ~animal
##
## post.mean l-95% CI u-95% CI eff.samp
## animal 0.649 0.44 0.899 1000
##
## ~species
##
## post.mean l-95% CI u-95% CI eff.samp
## species 0.00642 0.0017 0.0134 469
##
## R-structure: ~units
##
## post.mean l-95% CI u-95% CI eff.samp
## units 0.0134 0.012 0.0147 911
##
## Location effects: lg.brain ~ body_centr + body_av
##
## post.mean l-95% CI u-95% CI eff.samp pMCMC
## (Intercept) 1.445 0.542 2.492 1000 0.002 **
## body_centr 0.129 0.114 0.144 1000 <0.001 ***
## body_av 0.258 0.184 0.318 1000 <0.001 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The slope for ‘body_av’ that gives the interspecific slope that is controlled for phylogeny and within-species variance in body mass (predictor).
See more exercises with "MCMCglmm" in the OPM for chapter 11
- Arnold C, Matthews LJ, Nunn CL (2010) The 10kTrees website: a new online resource for primate hylogeny. Evol Anthropol 19: 114-118.
- Hadfield JD (2010) MCMC methods for multi-response generalized linear mixed models: the MCMCglmm R package. Journal of Statistical Software 33: 1-22.
- Hansen TF, Bartoszek K (2012) Interpreting the evolutionary regression: the interplay between observational and biological errors in phylogenetic comparative studies. Syst Biol 61: 413-425.
- Ives AR, Midford PE, Garland T (2007) Within-species variation and measurement error in phylogenetic comparative methods. Syst Biol 56: 252-270.
- Paradis E, Claude J, Strimmer K (2004) APE: analyses of phylogenetics and evolution in R language. Bioinformatics 20: 289-290.
- Pinheiro J, Bates D, DebRoy S, Sarkar D, R Development Core Team (2012) nlme: Linear and Nonlinear Mixed Effects Models. R package version 3.1-104.
- Revell LJ (2012) Phytools: an R package for phylogenetic comparative biology (and other things). Methods Ecol Evol 3: 217-223.
- van de Pol MV, Wright J (2009) A simple method for distinguishing within- versus between-subject effects using mixed models. Anim Behav 77: 753-758.