07. Chapter 7

Chapter 7 – Uncertainties Due to Within-Species Variation in Comparative Studies: Measurement Errors and Statistical Weights

Data

brain size and body size on primates ("primate_ind.txt", a tab separated text file), with more than one observation for the majority of species, individual-specific data [download] - right click, Save as…

brain size and body size on primates ("primate_spec.txt", a tab separated text file), species-specific data obtained from the exercises above, includes information on within-species sampling (sample sizes, variances and standard errors [download] - right click, Save as…

phylogeny ("primate_tree.phy", in phylip format) is taken from 10kTrees (Arnold et al 2010) and tailored to the data at hand [download] - right click, Save as…

Function

GLSME function ("GLSME.R", source file), GLSME function by K. Bartoszek [download] - right click, Save as…

1) Repeatability

Sources

R packages

"lme4" (Bates et al 2011)

"rptR" (Nakagawa and Schielzeth 2010)

"MCMCglmm" (Hadfield 2010)

Data

brain size and body size on primates ("primate_ind.txt", a tab separated text file), with more than one observation for the majority of species, individual-specific data

Codes

Data input

Import, transform and check individual specific data:

data.ind = read.table("primate_ind.txt", sep = "\t", header = TRUE)
data.ind$lg_body = log(data.ind$body)
data.ind$lg_brain = log(data.ind$brain)
str(data.ind)

## 'data.frame':	894 obs. of  5 variables:
##  $ species : Factor w/ 86 levels "Allenopithecus_nigroviridis",..: 1 2 3 3 3 3 3 3 3 3 ...
##  $ body    : num  2920 826 777 2310 4309 ...
##  $ brain   : num  57 45 42.5 52 54.2 ...
##  $ lg_body : num  7.98 6.72 6.66 7.75 8.37 ...
##  $ lg_brain: num  4.04 3.81 3.75 3.95 3.99 ...

rptR

May not compatible with the most recent version of R (need earlier releases to run the scripts below)

library(rptR, quietly = T)
rpt(data.ind$lg_brain, data.ind$species, datatype = "Gaussian", method = "REML")

## 
## Repeatability calculation using the LMM.REML method
## 
## R  = 0.985
## SE = 0.002
## CI = [0.979, 0.989]
## P  = 0 []
##      0.001 []

rpt(data.ind$lg_body, data.ind$species, datatype = "Gaussian", method = "ANOVA")

## 
## Repeatability calculation using the ANOVA method
## 
## R  = 0.884
## SE = 0.017
## CI = [0.851, 0.917]
## P  = 0 []
##      0.001 []

lme4

To estimate repeatability directly from a mixed model

library(lme4, quietly = T)
mod = lmer(lg_brain ~ 1 + (1 | species), data.ind)
attr(lme4::VarCorr(mod)$species, "stddev")^2/(attr(lme4::VarCorr(mod)$species, 
    "stddev")^2 + attr(lme4::VarCorr(mod), "sc")^2)

## (Intercept) 
##      0.9854

To obtain P value based on randomization

null = c()
for (i in 1:1000) {
    species_random = sample(data.ind$species, length(data.ind$species))
    null[i] = anova(lmer(lg_brain ~ 1 + (1 | species_random), data.ind), mod, 
        test = "Chisq")$Pr[2]
}
sum(null > 0.05)/length(null)

## [1] 0

To get 95% confidence intervals based on parametric bootstrap

rvalues <- numeric()
for (i in 1:1000) {
    y = simulate(mod)
    mboot <- lmer(y[, 1] ~ 1 + (1 | species), data.ind)
    rvalues[i] = attr(lme4::VarCorr(mboot)$species, "stddev")^2/(attr(lme4::VarCorr(mboot)$species, 
        "stddev")^2 + attr(lme4::VarCorr(mboot), "sc")^2)
}
quantile(rvalues, c(0.025, 0.975))

##   2.5%  97.5% 
## 0.9798 0.9891

MCMCglmm

library(MCMCglmm, quietly = T)
prior <- list(R = list(V = 1, nu = 0.002), G = list(G1 = list(V = 1, nu = 0.002)))
mod.mcmc = MCMCglmm(lg_body ~ 1, random = ~species, data = data.ind, prior = prior, 
    verbose = F)
Rvalue = mod.mcmc$VCV[, 1]/(mod.mcmc$VCV[, 1] + mod.mcmc$VCV[, 2])
mean(Rvalue)

## [1] 0.8513

quantile(Rvalue, c(0.025, 0.975))

##   2.5%  97.5% 
## 0.8086 0.8896

References

Bates D, Maechler M, Bolker B (2011) lme4: Linear mixed-effects models using S4 classes. R package version 0.999375-42/r1414.
Hadfield JD (2010) MCMC methods for multi-response generalized linear mixed models: the MCMCglmm R package. Journal of Statistical Software 33: 1-22
Nakagawa S, Schielzeth H (2010) Repeatability for Gaussian and non-Gaussian data: a practical guide for biologists. Biological Reviews 85: 935-956

2) Data Preparation

Sources

R packages

"ape" (Paradis et al 2004)

Data

brain size and body size on primates ("primate_ind.txt", a tab separated text file), with more than one observation for the majority of species, individual-specific data

phylogeny ("primate_tree.phy", in phylip format) is taken from 10kTrees (Arnold et al 2010) and tailored to the data at hand

Codes

To obtain species-specific data (means and variances) from individual-specific data

data.ind = read.table("primate_ind.txt", sep = "\t", header = TRUE)
body_mean = tapply(data.ind$body, data.ind$species, mean)
brain_mean = tapply(data.ind$brain, data.ind$species, mean)
body_N = tapply(data.ind$body, data.ind$species, length)
brain_N = tapply(data.ind$brain, data.ind$species, length)
body_var = tapply(data.ind$body, data.ind$species, var)
brain_var = tapply(data.ind$brain, data.ind$species, var)
se <- function(x) sqrt(var(x)/length(x))
body_se <- tapply(data.ind$body, data.ind$species, se)
brain_se <- tapply(data.ind$brain, data.ind$species, se)

To calculate measurement errors based on pooled CV

body_cv = sqrt(body_var)/body_mean
body_cv.pool = sum(body_cv * body_N, na.rm = T)/sum(body_N, na.rm = T)
body_se.adj = (body_cv.pool * body_mean)/sqrt(body_N)
body_var.adj = (body_cv.pool * body_mean)^2
brain_cv = sqrt(brain_var)/brain_mean
brain_cv.pool = sum(brain_cv * brain_N, na.rm = T)/sum(brain_N, na.rm = T)
brain_se.adj = (brain_cv.pool * brain_mean)/sqrt(brain_N)
brain_var.adj = (brain_cv.pool * brain_mean)^2

To log-transform data

lg.body_var.adj = log(body_var.adj/(body_mean^2) + 1)
# note that this gives the same value for all species use the following
# instead:
lg.body_var.adj = log(body_se.adj^2/(body_mean^2) + 1)
lg.body = log(body_mean) - 0.5 * lg.body_var.adj
lg.brain_var.adj = log(brain_se.adj^2/(brain_mean^2) + 1)
lg.brain = log(brain_mean) - 0.5 * lg.brain_var.adj

To create data frame at the species level

data.spec = data.frame(row.names = 1:length(body_mean), species = row.names(body_mean),
    cbind(body_mean, body_N, body_var, lg.body, lg.body_var.adj, brain_mean,
        brain_N, brain_var, lg.brain, lg.brain_var.adj), check.rows = TRUE,
    check.names = TRUE)

To import and check phylogeny:

library(ape)
tree = read.tree("primate_tree.phy")
str(tree)

## List of 4
##  $ edge       : int [1:170, 1:2] 87 88 89 90 91 92 92 93 94 95 ...
##  $ Nnode      : int 85
##  $ tip.label  : chr [1:86] "Allenopithecus_nigroviridis" "Cercopithecus_albogularis" "Cercopithecus_mitis" "Cercopithecus_ascanius" ...
##  $ edge.length: num [1:170] 26.19 16.81 8.59 6.54 2.98 ...
##  - attr(*, "class")= chr "phylo"
##  - attr(*, "order")= chr "cladewise"

To plot phylogeny:

plot(tree, cex = 0.6, no.margin = T)

References

Arnold C, Matthews LJ, Nunn CL (2010) The 10kTrees website: a new online resource for primate hylogeny. Evol Anthropol 19: 114-118.
Paradis E, Claude J, Strimmer K (2004) APE: analyses of phylogenetics and evolution in R language. Bioinformatics 20: 289-290.

3) Weighted regression

Sources

R packages

"ape" (Paradis et al 2004)

"nlme" (Pinheiro et al 2012)

Data

brain size and body size on primates ("primate_spec.txt", a tab separated text file), species-specific data obtained from the exercises above, includes information on within-species sampling (sample sizes, variances and standard errors

phylogeny ("primate_tree.phy", in phylip format) is taken from 10kTrees (Arnold et al 2010) and tailored to the data at hand

Codes

Ordinary least-square models without accounting for phylogeny

data.spec = read.table("primate_spec.txt", sep = "\t", header = TRUE)
w.ols = lm(lg.brain ~ lg.body, data.spec, weights = brain_N)

or

library(nlme, quietly = T)
w.ols = gls(lg.brain ~ lg.body, weights = ~1/brain_N, data.spec, method = "ML")
summary(w.ols)

## Generalized least squares fit by maximum likelihood
##   Model: lg.brain ~ lg.body 
##   Data: data.spec 
##     AIC BIC logLik
##   173.7 181 -83.83
## 
## Variance function:
##  Structure: fixed weights
##  Formula: ~1/brain_N 
## 
## Coefficients:
##               Value Std.Error t-value p-value
## (Intercept) -2.0730   0.27967  -7.412       0
## lg.body      0.7868   0.03262  24.119       0
## 
##  Correlation: 
##         (Intr)
## lg.body -0.985
## 
## Standardized residuals:
##     Min      Q1     Med      Q3     Max 
## -2.5473 -0.6364 -0.2264  0.3108  3.3693 
## 
## Residual standard error: 1.429 
## Degrees of freedom: 86 total; 84 residual

Phylogenetic least-square models with weights

library(ape)
tree = read.tree("primate_tree.phy")
bm.tree <- corBrownian(phy = tree)
# need species as row names:
row.names(data.spec) = data.spec$species
# pgls without weight:
pgls = gls(lg.brain ~ lg.body, correlation = bm.tree, data.spec)
# pgls with weight:
w.pgls = gls(lg.brain ~ lg.body, weights = ~1/brain_N, correlation = bm.tree,
    data.spec)
summary(w.pgls)

## Generalized least squares fit by REML
##   Model: lg.brain ~ lg.body 
##   Data: data.spec 
##     AIC   BIC logLik
##   184.1 191.4 -89.05
## 
## Correlation Structure: corBrownian
##  Formula: ~1 
##  Parameter estimate(s):
## numeric(0)
## Variance function:
##  Structure: fixed weights
##  Formula: ~1/brain_N 
## 
## Coefficients:
##               Value Std.Error t-value p-value
## (Intercept) -1.3387    0.3839  -3.487   8e-04
## lg.body      0.7095    0.0482  14.734   0e+00
## 
##  Correlation: 
##         (Intr)
## lg.body -0.983
## 
## Standardized residuals:
##      Min       Q1      Med       Q3      Max 
## -0.94340 -0.35254 -0.13766  0.05078  1.53642 
## 
## Residual standard error: 3.727 
## Degrees of freedom: 86 total; 84 residual

Plotting the results

plot(data.spec$lg.body, data.spec$lg.brain, pch = 1, cex = 0.6 * sqrt(data.spec$brain_N),
    xlab = "log(body size)", ylab = "log(brain size)")
abline(w.ols, col = "red")
abline(pgls, col = "blue")
abline(w.pgls, col = "green")
legend(x = "bottomright", legend = c(paste("w.ols, AIC = ", round(AIC(w.ols),
    2)), paste("pgls, AIC = ", round(AIC(pgls), 2)), paste("w.pgls, AIC = ",
    round(AIC(w.pgls), 2))), lty = 1, col = c("red", "blue", "green"))

The best model is the phylogenetic model that does not consider weights. The weighted model overemphasizes the influence of the difference in sample sizes. Note the high repeatability of both traits, which already indicate that adjusting for unequal sampling is not important.

References

Arnold C, Matthews LJ, Nunn CL (2010) The 10kTrees website: a new online resource for primate hylogeny. Evol Anthropol 19: 114-118.
Paradis E, Claude J, Strimmer K (2004) APE: analyses of phylogenetics and evolution in R language. Bioinformatics 20: 289-290.
Pinheiro J, Bates D, DebRoy S, Sarkar D, R Development Core Team (2012) nlme: Linear and Nonlinear Mixed Effects Models. R package version 3.1-104.

4) Fitting univariate evolutionary models with intraspecific varaince to estimate species-specific traits (means and variances), ancestral states, evolutionary rate and phylogenetic signal

Sources

R packages

"ape" (Paradis et al 2004)

"phytools" (Revell 2012)

Data

brain size and body size on primates ("primate_ind.txt", a tab separated text file), with more than one observation for the majority of species, individual-specific data

brain size and body size on primates ("primate_spec.txt", a tab separated text file), species-specific data obtained from the exercises above, includes information on within-species sampling (sample sizes, variances and standard errors

phylogeny ("primate_tree.phy", in phylip format) is taken from 10kTrees (Arnold et al 2010) and tailored to the data at hand

Codes

Estimating species-specific means and variances, ancestral states and evolutionary rates by using Bayesian MCMC

Model assuming common variance

data.ind = read.table("primate_ind.txt", sep = "\t", header = TRUE)
library(ape)
tree = read.tree("primate_tree.phy")
brain = c(data.ind$brain)
names(brain) = data.ind$species
library(phytools, quietly = T)
fittedB = fitBayes(tree, brain, ngen = 10000, method = "reduced")

## Control parameters (set by user or default):

## List of 8
##  $ sig2   : num 1249
##  $ a      : num 93.7
##  $ xbar   : Named num [1:86] 57 55.7 69.1 64.1 57.5 ...
##   ..- attr(*, "names")= chr [1:86] "Allenopithecus_nigroviridis" "Cercopithecus_albogularis" "Cercopithecus_mitis" "Cercopithecus_ascanius" ...
##  $ intV   : num 455
##  $ pr.mean: num [1:89] 1000 0 0 0 0 0 0 0 0 0 ...
##  $ pr.var : num [1:89] 1e+06 1e+03 1e+03 1e+03 1e+03 1e+03 1e+03 1e+03 1e+03 1e+03 ...
##  $ prop   : num [1:89] 12.5 12.5 912 912 912 ...
##  $ sample : num 100

## Starting MCMC...
## Done MCMC.

est.means = colMeans(fittedB[21:101, c(-1, -2, -3, -(dim(fittedB)[2] - 1), -dim(fittedB)[2])])
art.means = tapply(data.ind$brain, data.ind$species, mean)
plot(art.means, est.means[names(art.means)], xlab = "Arithmetic means", ylab = "Bayesian estimates of means")
abline(a = 0, b = 1, col = "red")
legend(x = "bottomright", legend = c(paste("var = ", round(mean(fittedB[21:101,
    (dim(fittedB)[2] - 1)]), 2)), paste("MCRA = ", round(mean(fittedB[21:101,
    3]), 2)), paste("sigma^2 = ", round(mean(fittedB[21:101, 2]), 2))))

The cicles are species-specific means as can be calculated from the raw individual-specific data by using their arithemtic means or Bayesian MCMC-based estimates that incorporate phylogenetic constraints. Legend on the bottom right shows the assumed common intraspecific variance, trait value at the root of the phylogeny and evolutionary rate.

Model assuming heterogeneous variance

fittedB = fitBayes(tree, brain, ngen = 10000, method = "full")

## Control parameters (set by user or default):

## List of 8
##  $ sig2   : num 1249
##  $ a      : num 93.7
##  $ xbar   : Named num [1:86] 57 55.7 69.1 64.1 57.5 ...
##   ..- attr(*, "names")= chr [1:86] "Allenopithecus_nigroviridis" "Cercopithecus_albogularis" "Cercopithecus_mitis" "Cercopithecus_ascanius" ...
##  $ v      : Named num [1:86] 455 455 455 455 455 ...
##   ..- attr(*, "names")= chr [1:86] "Allenopithecus_nigroviridis" "Cercopithecus_albogularis" "Cercopithecus_mitis" "Cercopithecus_ascanius" ...
##  $ pr.mean: num [1:174] 1000 0 0 0 0 0 0 0 0 0 ...
##  $ pr.var : num [1:174] 1e+06 1e+03 1e+03 1e+03 1e+03 1e+03 1e+03 1e+03 1e+03 1e+03 ...
##  $ prop   : Named num [1:174] 12.5 12.5 912 912 912 ...
##   ..- attr(*, "names")= chr [1:174] "" "" "" "" ...
##  $ sample : num 100

## Starting MCMC...
## Done MCMC.

est.means = colMeans(fittedB[21:101, 4:((dim(fittedB)[2] - 4)/2 + 3)])
plot(art.means, est.means[names(art.means)], xlab = "Arithmetic means", ylab = "Bayesian estimates of means")
abline(a = 0, b = 1, col = "red")
legend(x = "bottomright", legend = c(paste("MCRA = ", round(mean(fittedB[21:101,
    3]), 2)), paste("sigma^2 = ", round(mean(fittedB[21:101, 2]), 2))))

The cicles are species-specific means as can be calculated from the raw individual-specific data by using their arithemtic means or Bayesian MCMC-based estimates that incorporate phylogenetic constraints. Legend on the bottom right shows trait value at the root of the phylogeny and evolutionary rate. Within-species variance is assumed to be different across species.

Estimating phylogenetic signal

Without implementing measurement errors

Blomberg’s K

data.spec = read.table("primate_spec.txt", sep = "\t", header = TRUE)
library(ape)
tree = read.tree("primate_tree.phy")
phylosig(tree, brain, test = T)

## $K
## [1] 0.3145
## 
## $P
## [1] 0.045

Pagel’s lambda

phylosig(tree, brain, method = "lambda", test = T)

## $lambda
## [1] 1.012
## 
## $logL
## [1] -534.4
## 
## $logL0
## [1] -553.3
## 
## $P
## [1] 7.47e-10

By implementing measurement errors

Blomberg’ K

brain_se = c(data.spec$brain_se.adj)
names(brain_se) = data.spec$species
phylosig(tree, brain, test = T, se = brain_se)

## $K
## [1] 0.3237
## 
## $P
## [1] 0.045
## 
## $sig2
## [1] 1331
## 
## $logL
## [1] -536.6

Pagel’s lambda

phylosig(tree, brain, method = "lambda", test = T, se = brain_se)

## $lambda
## [1] 1.012
## 
## $sig2
## [1] 1571
## 
## $logL
## [1] -531.9
## 
## $convergence
## [1] 0
## 
## $message
## [1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"
## 
## $logL0
## [1] -552.7
## 
## $P
## [1] 1.037e-10

References

Arnold C, Matthews LJ, Nunn CL (2010) The 10kTrees website: a new online resource for primate hylogeny. Evol Anthropol 19: 114-118.
Paradis E, Claude J, Strimmer K (2004) APE: analyses of phylogenetics and evolution in R language. Bioinformatics 20: 289-290.
Revell LJ (2012) Phytools: an R package for phylogenetic comparative biology (and other things). Methods Ecol Evol 3: 217-223.

5) Studying correlated evolution in multivariate phylogenetic regression models by incorporating within-species variance

Sources

R packages

"ape" (Paradis et al 2004)

"nlme" (Pinheiro et al 2012)

"GLSME" (Hansen and Bartoszek 2012)

"MCMCglmm" (Hadfield 2010)

"phytools" (Revell 2012)

Matlab packages

"MERegPHYSIG" (Ives et al 2007)

Data

brain size and body size on primates ("primate_ind.txt", a tab separated text file), with more than one observation for the majority of species, individual-specific data

brain size and body size on primates ("primate_spec.txt", a tab separated text file), species-specific data obtained from the exercises above, includes information on within-species sampling (sample sizes, variances and standard errors

phylogeny ("primate_tree.phy", in phylip format) is taken from 10kTrees (Arnold et al 2010) and tailored to the data at hand

Codes

ape & nlme (PGLS)

When measurement error is ignored

library(ape)
library(nlme)
data.spec = read.table("primate_spec.txt", sep = "\t", header = TRUE)
tree = read.tree("primate_tree.phy")
row.names(data.spec) = data.spec$species
pgls.nlme = gls(lg.brain ~ lg.body, data.spec, correlation = corBrownian(phy = tree))
summary(pgls.nlme)

## Generalized least squares fit by REML
##   Model: lg.brain ~ lg.body 
##   Data: data.spec 
##     AIC   BIC logLik
##   60.48 67.77 -27.24
## 
## Correlation Structure: corBrownian
##  Formula: ~1 
##  Parameter estimate(s):
## numeric(0)
## 
## Coefficients:
##             Value Std.Error t-value p-value
## (Intercept) 1.218    0.5333   2.284  0.0249
## lg.body     0.287    0.0389   7.375  0.0000
## 
##  Correlation: 
##         (Intr)
## lg.body -0.544
## 
## Standardized residuals:
##     Min      Q1     Med      Q3     Max 
## -2.1662  0.1224  0.7819  1.1608  3.3928 
## 
## Residual standard error: 0.8308 
## Degrees of freedom: 86 total; 84 residual

When considering measurement error (on the predictor)

pglsME.nlme = gls(lg.brain ~ lg.body, data.spec, correlation = corBrownian(phy = tree),
    weights = varFixed(~lg.body_var.adj))
summary(pglsME.nlme)

## Generalized least squares fit by REML
##   Model: lg.brain ~ lg.body 
##   Data: data.spec 
##     AIC   BIC logLik
##   181.9 189.2 -87.97
## 
## Correlation Structure: corBrownian
##  Formula: ~1 
##  Parameter estimate(s):
## numeric(0)
## Variance function:
##  Structure: fixed weights
##  Formula: ~lg.body_var.adj 
## 
## Coefficients:
##               Value Std.Error t-value p-value
## (Intercept) -1.3018    0.3866  -3.367  0.0011
## lg.body      0.7049    0.0485  14.549  0.0000
## 
##  Correlation: 
##         (Intr)
## lg.body -0.983
## 
## Standardized residuals:
##      Min       Q1      Med       Q3      Max 
## -0.95067 -0.35625 -0.13744  0.05117  1.55863 
## 
## Residual standard error: 8.399 
## Degrees of freedom: 86 total; 84 residual

MERegPHYSIG (outputs from the program)

When measurement error is ignored


%METHOD = GLS
%
%Data file and ME file saved as 'workingDataFile.txt' and 'workingMEfile.txt'
%
%Coefficients
%b0 (intercept) = 1.2179
%b1 = 0.28701
%
%sigma2 = 0.10725
%
%Independent variable means, variances, and correlations
%aX1 = 7.4524
%s2X1 = 0.83315
%
%LnLikelihood = -25.028
%-2LL = 50.056
%AIC(par=3) = 56.056

When considering measurement error (on both the response and predictor, but without within-species covariance)


%METHOD = EGLS
%
%Data file and ME file saved as 'workingDataFile.txt' and 'workingMEfile.txt'
%
%Coefficients with approximate SE
%b0 (intercept) = 1.1929 +- 0.53329
%b1 = 0.29033 +- 0.038919
%
%sigma2 = 0.12494
%
%Independent variable means, variances, and correlations
%aX1 = 7.4533
%s2X1 = 0.82363


%METHOD = ML
%
%Data file and ME file saved as 'workingDataFile.txt' and 'workingMEfile.txt'
%
%Coefficients with approximate SE
%b0 (intercept) = 1.1659 +- 0.52755
%b1 = 0.29394 +- 0.039164
%
%correlation matrix of estimates of b = 
%    1    -0.55338
%    -0.55338           1
%
%sigma2 = 0.1046
%
%Independent variable means, variances, and correlations
%aX1 = 7.4534
%s2X1 = 0.81061
%
%LnLikelihood = -138.0037
%-2LL = 276.0074
%AIC(par=5) = 286.0074


%METHOD = REML
%
%Data file and ME file saved as 'workingDataFile.txt' and 'workingMEfile.txt'
%
%Coefficients
%b0 (intercept) = 1.1648
%b1 = 0.29384
%
%sigma2 = 0.10462
%
%Independent variable means, variances, and correlations
%aX1 = 7.4597
%s2X1 = 0.020603
%
%LnLikelihood = -132.3263
%-2LL = 264.6526
%AIC(par=2) = 268.6526

phytools

When measurement error is ignored

library(phytools)
data.spec = read.table("primate_spec.txt", sep = "\t", header = TRUE)
tree = read.tree("primate_tree.phy")
pgls.Ives(tree = tree, X = setNames(data.spec$lg.body, data.spec$species), y = setNames(data.spec$lg.brain,
    data.spec$species), Vx = setNames(rep(0, length(tree$tip)), data.spec$species),
    Vy = setNames(rep(0, length(tree$tip)), data.spec$species), Cxy = setNames(rep(0,
        length(tree$tip)), data.spec$species))

## $beta
## [1] 0.6108 0.3123
## 
## $sig2x
## [1] 0.108
## 
## $sig2y
## [1] 0.009439
## 
## $a
## [1] -0.5942  0.4252
## 
## $logL
## [1] -156.5
## 
## $convergence
## [1] 0
## 
## $message
## [1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"

The convergence is often sensitive to starting conditions. It may be more straightforward to repeat the above procedure several times (10-100) and select the results with the highest likelihood:

res = c()
for (k in 1:100) {
    mod = try(pgls.Ives(tree = tree, X = setNames(data.spec$lg.body, data.spec$species),
        y = setNames(data.spec$lg.brain, data.spec$species), Vx = setNames(rep(0,
            length(tree$tip)), data.spec$species), Vy = setNames(rep(0, length(tree$tip)),
            data.spec$species), Cxy = setNames(rep(0, length(tree$tip)), data.spec$species)))
    if (class(mod) != "try-error") {
        int = mod$beta[1]
        slope = mod$beta[2]
        lik = mod$logL
        anc_a = mod$a[1]
        anc_b = mod$a[2]
        conv = mod$convergence
    }
    res = rbind(res, data.frame(k, int, slope, anc_a, anc_b, lik, conv))
}
res[which(res[, 6] == max(subset(res, conv == 0)[, 6])), ]

##     k   int  slope anc_a anc_b    lik conv
## 56 56 1.215 0.2872 7.434  3.35 -138.7    0

When considering measurement error (on both the response and predictor, but without within-species covariance)

res = c()
for (k in 1:100) {
    mod = try(pgls.Ives(tree = tree, X = setNames(data.spec$lg.body, data.spec$species),
        y = setNames(data.spec$lg.brain, data.spec$species), Vx = setNames(data.spec$lg.body_var.adj,
            data.spec$species), Vy = setNames(data.spec$lg.brain_var.adj, data.spec$species),
        Cxy = setNames(rep(0, length(tree$tip)), data.spec$species)))
    if (class(mod) != "try-error") {
        int = mod$beta[1]
        slope = mod$beta[2]
        lik = mod$logL
        anc_a = mod$a[1]
        anc_b = mod$a[2]
        conv = mod$convergence
    }
    res = rbind(res, data.frame(k, int, slope, anc_a, anc_b, lik, conv))
}
res[which(res[, 6] == max(subset(res, conv == 0)[, 6])), ]

##     k     int  slope anc_a anc_b    lik conv
## 59 59 -0.2379 0.4826  7.46 3.362 -125.7    0

GLSME

When both measurement error and phylogeny are ignored (GLS)

# functions are stored in a source file (available from K. Bartoszek's
# website)
source("GLSME.R")
data.spec = read.table("primate_spec.txt", sep = "\t", header = TRUE)
row.names(data.spec) = data.spec$species
tree = read.tree("primate_tree.phy")
gls.glsme = GLSME(y = data.spec$lg.brain, D = cbind(rep(1, length(data.spec$lg.body)),
    data.spec$lg.body), Vt = 1, Ve = 0, Vd = list("F", "F"), Vu = 0)
gls.glsme

## $GLSestimate
##         [,1]
## [1,] -1.6429
## [2,]  0.7128
## 
## $errorGLSestim
##         [,1]
## [1,] 0.32506
## [2,] 0.04057
## 
## $BiasCorrectedGLSestimate
##         [,1]
## [1,] -1.6429
## [2,]  0.7128
## 
## $errorBiasCorrectedGLSestim
##         [,1]
## [1,] 0.32506
## [2,] 0.04057
## 
## $K
##      [,1] [,2]
## [1,]    1    0
## [2,]    0    1
## 
## $R2
##        [,1]
## [1,] 0.7821
## 
## $R2BiasCorrectedModel
##        [,1]
## [1,] 0.7821
## 
## $LogLikelihood
## [1] -63.45
## 
## $LogLikelihoodBiasCorrectedModel
## [1] -63.45
## 
## $PredictorVarianceConstantEstimate
## [1] 0 0
## 
## $ResponseVarianceConstantEstimate
## [1] 0.2561

When measurement error is ignored but phylogeny is accounted for (PGLS)

# the order of species in the data should be the same as the order of
# species in the VCV matrix, thus the dataset should be resorted as
# data.spec[tree$tip.label,]!
pgls.glsme = GLSME(y = data.spec[tree$tip.label, ]$lg.brain, D = cbind(rep(1,
    length(data.spec$lg.body)), data.spec[tree$tip.label, ]$lg.body), Vt = vcv(tree),
    Ve = 0, Vd = list("F", "F"), Vu = 0)
pgls.glsme

## $GLSestimate
##       [,1]
## [1,] 1.218
## [2,] 0.287
## 
## $errorGLSestim
##         [,1]
## [1,] 0.52707
## [2,] 0.03846
## 
## $BiasCorrectedGLSestimate
##       [,1]
## [1,] 1.218
## [2,] 0.287
## 
## $errorBiasCorrectedGLSestim
##         [,1]
## [1,] 0.52707
## [2,] 0.03846
## 
## $K
##      [,1] [,2]
## [1,]    1    0
## [2,]    0    1
## 
## $R2
##       [,1]
## [1,] 0.393
## 
## $R2BiasCorrectedModel
##       [,1]
## [1,] 0.393
## 
## $LogLikelihood
## [1] -25.02
## 
## $LogLikelihoodBiasCorrectedModel
## [1] -25.02
## 
## $PredictorVarianceConstantEstimate
## [1] 0 0
## 
## $ResponseVarianceConstantEstimate
## [1] 0.009235

When both measurement error and phylogeny are accounted for (considering measurement error (on both the response and predictor, but without within-species covariance)

# again, the order of species in the data should be the same as the order of
# species in the VCV matrix, thus the dataset should be resorted as
# data.spec[tree$tip.label,]! in addition, note that when there is a
# measurement error in the predictors, these are treated as 'random effects'
# (as specified by the non zero entry in the Vd matrix), and they will be
# centered on their means by default (it can be switched off by setting
# CenterPredictor=FALSE)
pglsME.glsme = GLSME(y = data.spec[tree$tip.label, ]$lg.brain, D = cbind(rep(1,
    length(data.spec$lg.body)), data.spec[tree$tip.label, ]$lg.body), Vt = vcv(tree),
    Ve = diag(data.spec[tree$tip.label, ]$lg.brain_var.adj), Vd = list("F",
        vcv(tree)), Vu = list("F", diag(data.spec[tree$tip.label, ]$lg.body_var.adj)))
pglsME.glsme

## $GLSestimate
##        [,1]
## [1,] 3.3610
## [2,] 0.3209
## 
## $errorGLSestim
##         [,1]
## [1,] 0.48730
## [2,] 0.04689
## 
## $BiasCorrectedGLSestimate
##        [,1]
## [1,] 3.3582
## [2,] 0.4173
## 
## $errorBiasCorrectedGLSestim
##         [,1]
## [1,] 0.48730
## [2,] 0.06111
## 
## $K
##      [,1]     [,2]
## [1,]    1 0.006861
## [2,]    0 0.769192
## 
## $R2
##        [,1]
## [1,] 0.4894
## 
## $R2BiasCorrectedModel
##        [,1]
## [1,] 0.5328
## 
## $LogLikelihood
## [1] -23.66
## 
## $LogLikelihoodBiasCorrectedModel
## [1] -21.25
## 
## $PredictorVarianceConstantEstimate
## [1] 0.00000 0.04932
## 
## $ResponseVarianceConstantEstimate
## [1] 0.0112

# as a consequence of centering, the estimated intercept corresponds to
# b0+b1*mean(data.spec$lg.body) from model data.spec$lg.brain =
# b0+b1*mean(data.spec$lg.body) + b1*(data.spec$lg.body-
# mean(data.spec$lg.body)). Therefore, the intercept of interests can be
# obtained as:
pglsME.glsme$BiasCorrectedGLSestimate[1] - pglsME.glsme$BiasCorrectedGLSestimate[2] *
    mean(data.spec$lg.body)

## [1] 0.06225

# from which, the slope of interest is:
pglsME.glsme$BiasCorrectedGLSestimate[2]

## [1] 0.4173

# taking into account estimation error after the bias correction, the slope
# falls within the error range of:
slope_range = rbind(pglsME.glsme$BiasCorrectedGLSestimate[2] - pglsME.glsme$errorBiasCorrectedGLSestim[2],
    pglsME.glsme$BiasCorrectedGLSestimate[2] + pglsME.glsme$errorBiasCorrectedGLSestim[2])
slope_range

##        [,1]
## [1,] 0.3561
## [2,] 0.4784

# the standard error of the intercept can be computed based on the variance
# of the noncentered intercept estimate (T. Hansen and K. Bartoszek pers.
# comm.):
int_var = pglsME.glsme$errorBiasCorrectedGLSestim[1]^2 + pglsME.glsme$errorBiasCorrectedGLSestim[2]^2 *
    mean(data.spec$lg.body)^2
sqrt(int_var)

## [1] 0.6859

# from this the error range around the non-centred intercept can be given:
intercept_range = rbind(pglsME.glsme$BiasCorrectedGLSestimate[1] - pglsME.glsme$BiasCorrectedGLSestimate[2] *
    mean(data.spec$lg.body) - sqrt(int_var), pglsME.glsme$BiasCorrectedGLSestimate[1] -
    pglsME.glsme$BiasCorrectedGLSestimate[2] * mean(data.spec$lg.body) + sqrt(int_var))
intercept_range

##         [,1]
## [1,] -0.6237
## [2,]  0.7481

MCMCglmm

To prepare the dataset, we need to generate a column named ‘animal’

library(MCMCglmm)
data.spec = read.table("primate_spec.txt", sep = "\t", header = TRUE)
data.ind = read.table("primate_ind.txt", sep = "\t", header = TRUE)
tree = read.tree("primate_tree.phy")
data.spec$animal = data.spec$species
data.ind$animal = data.ind$species

When measurement error is ignored, and species-specific data are used (equivalent with PGLS)

prior <- list(R = list(V = 1, nu = 0.002), G = list(G1 = list(V = 1, nu = 0.002)))
mcmc = MCMCglmm(lg.brain ~ lg.body, random = ~animal, data = data.spec, pedigree = tree,
    prior = prior, verbose = F, family = "gaussian", nodes = "TIPS")
summary(mcmc)

## 
##  Iterations = 3001:12991
##  Thinning interval  = 10
##  Sample size  = 1000 
## 
##  DIC: -238.3 
## 
##  G-structure:  ~animal
## 
##        post.mean l-95% CI u-95% CI eff.samp
## animal      0.69     0.47    0.904     1000
## 
##  R-structure:  ~units
## 
##       post.mean l-95% CI u-95% CI eff.samp
## units   0.00327 0.000199  0.00967      226
## 
##  Location effects: lg.brain ~ lg.body 
## 
##             post.mean l-95% CI u-95% CI eff.samp  pMCMC    
## (Intercept)    1.0990   0.0235   2.0715     1000  0.024 *  
## lg.body        0.2996   0.2282   0.3870     1000 <0.001 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

When measurement error is considered on the predictor variable, and individual-specific data are used (phylogenetic mixed model)

# we need within-subject centering to do this (van de Pol and Wright 2009),
# the approprate data transformation:
data.ind$lg.body = log(data.ind$body)
data.ind$lg.brain = log(data.ind$brain)
means = tapply(X = data.ind$lg.body, INDEX = data.ind$species, FUN = mean)
N = tapply(X = data.ind$lg.body, INDEX = data.ind$species, FUN = length)
mean.vector = vector()
for (k in 1:length(means)) {
    mean.set = rep(means[k], N[k])
    mean.vector = rbind(mean.vector, as.data.frame(mean.set))
}
data.ind$body_av = mean.vector[, 1]
data.ind$body_centr <- data.ind$lg.body - data.ind$body_av

# the model based on individual-specific data
prior2 = list(G = list(G1 = list(V = 1, nu = 0.02), G2 = list(V = 1, nu = 0.02)),
    R = list(V = 1, nu = 0.02))
mcmc.ind.centr = MCMCglmm(lg.brain ~ body_centr + body_av, random = ~animal +
    species, data = data.ind, pedigree = tree, prior = prior2, verbose = F,
    family = "gaussian", nodes = "TIPS")
summary(mcmc.ind.centr)

## 
##  Iterations = 3001:12991
##  Thinning interval  = 10
##  Sample size  = 1000 
## 
##  DIC: -1236 
## 
##  G-structure:  ~animal
## 
##        post.mean l-95% CI u-95% CI eff.samp
## animal     0.649     0.44    0.899     1000
## 
##                ~species
## 
##         post.mean l-95% CI u-95% CI eff.samp
## species   0.00642   0.0017   0.0134      469
## 
##  R-structure:  ~units
## 
##       post.mean l-95% CI u-95% CI eff.samp
## units    0.0134    0.012   0.0147      911
## 
##  Location effects: lg.brain ~ body_centr + body_av 
## 
##             post.mean l-95% CI u-95% CI eff.samp  pMCMC    
## (Intercept)     1.445    0.542    2.492     1000  0.002 ** 
## body_centr      0.129    0.114    0.144     1000 <0.001 ***
## body_av         0.258    0.184    0.318     1000 <0.001 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The slope for ‘body_av’ that gives the interspecific slope that is controlled for phylogeny and within-species variance in body mass (predictor).

See more exercises with "MCMCglmm" in the OPM for chapter 11

References

Arnold C, Matthews LJ, Nunn CL (2010) The 10kTrees website: a new online resource for primate hylogeny. Evol Anthropol 19: 114-118.
Hadfield JD (2010) MCMC methods for multi-response generalized linear mixed models: the MCMCglmm R package. Journal of Statistical Software 33: 1-22.
Hansen TF, Bartoszek K (2012) Interpreting the evolutionary regression: the interplay between observational and biological errors in phylogenetic comparative studies. Syst Biol 61: 413-425.
Ives AR, Midford PE, Garland T (2007) Within-species variation and measurement error in phylogenetic comparative methods. Syst Biol 56: 252-270.
Paradis E, Claude J, Strimmer K (2004) APE: analyses of phylogenetics and evolution in R language. Bioinformatics 20: 289-290.
Pinheiro J, Bates D, DebRoy S, Sarkar D, R Development Core Team (2012) nlme: Linear and Nonlinear Mixed Effects Models. R package version 3.1-104.
Revell LJ (2012) Phytools: an R package for phylogenetic comparative biology (and other things). Methods Ecol Evol 3: 217-223.
van de Pol MV, Wright J (2009) A simple method for distinguishing within- versus between-subject effects using mixed models. Anim Behav 77: 753-758.

07. Chapter 7

Chapter 7 – Uncertainties Due to Within-Species Variation in Comparative Studies: Measurement Errors and Statistical Weights

Data

Function

1) Repeatability

Sources

R packages

Data

Codes

Data input

rptR

lme4

MCMCglmm

References

2) Data Preparation

Sources

R packages

Data

Codes

References

3) Weighted regression

Sources

R packages

Data

Codes

Ordinary least-square models without accounting for phylogeny

Phylogenetic least-square models with weights

Plotting the results

References

4) Fitting univariate evolutionary models with intraspecific varaince to estimate species-specific traits (means and variances), ancestral states, evolutionary rate and phylogenetic signal

Sources

R packages

Data

Codes

Estimating species-specific means and variances, ancestral states and evolutionary rates by using Bayesian MCMC

Estimating phylogenetic signal

References

5) Studying correlated evolution in multivariate phylogenetic regression models by incorporating within-species variance

Sources

R packages

Matlab packages

Data

Codes

ape & nlme (PGLS)

MERegPHYSIG (outputs from the program)

phytools

GLSME

MCMCglmm

References

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Clone this wiki locally