A phytools user reported to me the other day that they were finding that a
PGLS model that was non-significant assuming a Brownian correlation structure
of the residual error (corBrownian
) became highly significant
with Pagel's λ just slightly greater than 1.0.
This seems kind of mysterious because λ=1.0 is equivalent to an assumption of Brownian motion. On the other hand, λ is only defined when the correlations implied by λ are <=1.0. Usually this is only for values of λ (no more than) slightly larger than 1. [λ is always defined between 0 & 1.]
My hunch is that this can be best understood using contrasts. With contrasts, when we have a contrast between terminal species of zero or near zero length, the inevitable result for λ>1, this contrast (being standardized by dividing by a quantity very close to zero) will tend to have disproportionate weight in the contrasts regression.
To examine this, I set up the following simple simulation. First, I simulated a tree with slightly longer terminal edges than expected under pure-birth. This is probably not necessary, but I did this just to help exagerrate the effect of λ>1. Next, I simulated uncorrelated data on that tree assuming BM for x & y. Then, I fit a PGLS either setting λ to 1 (i.e., BM) or setting λ to it's maximum possible value. Finally I obtained the P-values for each fitted model & compared them.
Here is what I found:
## load libraries
library(phytools)
library(nlme)
## function for simulation
foo<-function(nrep=500){
Pbm<-Pml<-lambda<-vector()
cat("\n|")
for(i in 1:nrep){
tree<-phytools:::lambdaTree(pbtree(n=26,tip.label=LETTERS),0.95)
x<-fastBM(tree)
y<-fastBM(tree)
corBM<-corBrownian(1,tree)
lambda[i]<-phytools:::maxLambda(tree)
corML<-corBrownian(1,phytools:::lambdaTree(tree,lambda[i]))
fitbm<-gls(y~x,data=data.frame(x,y),correlation=corBM)
fitml<-gls(y~x,data=data.frame(x,y),correlation=corML)
Pbm[i]<-anova(fitbm)$'p-value'[2]
Pml[i]<-anova(fitml)$'p-value'[2]
cat(".")
if(i%%50==0&&i!=nrep) cat("\n ")
flush.console()
}
cat("|\n\n")
cat("\nDone simulation.")
list(Pbm=Pbm,Pml=Pml,lambda=lambda)
}
## run
obj<-foo()
##
## |..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................|
##
##
## Done simulation.
## visualize our results
par(mfrow=c(2,1))
hist(obj$Pbm,col="grey",xlab="P-value assuming BM",main="")
hist(obj$Pml,col="grey",xlab="P-value assuming lambda=max(lambda)",
main="")
hist(obj$lambda,col="grey",main="Maximum value of lambda",xlab="lambda")
I'm not sure what to make of this - and whether this result suggests (or not) that λ should only be set or estimated between 0 & 1. This is already the strategy employed, I believe, by functions for PGLS in the caper package. Nonetheless, the result is here. Make of it what you will!
Hi Liam,
ReplyDeleteI am late to this post but I am wondering if you have any updated information regarding how to best approach cases where your estimated lambda is greater than 1 (when estimated using ML in nlme's gls function).
I am dealing with a case where phyl.resid (using phytools) estimates lambda as .95 but the gls estimates lambda as 1.18. The results from using a 1.18 lambda in the gls function seem pretty outrageous and are quite different from assuming a brownian motion model (lambda=1). Using the lambda above 1 completely alters the post-hoc estimates of some of my independent categorical variables in ways that seem to make no sense.
Given the odd post-hoc estimates, I am inclined to limit lambda to 1 but was wondering if you had any further suggestions.
Thanks!
one slight update - estimated gls lambda is 1.108 (not 1.18)
Delete