## Monday, January 18, 2016

### Phylogenetic regression when estimated Pagel's λ > 1

A phytools user reported to me the other day that they were finding that a PGLS model that was non-significant assuming a Brownian correlation structure of the residual error (`corBrownian`) became highly significant with Pagel's λ just slightly greater than 1.0.

This seems kind of mysterious because λ=1.0 is equivalent to an assumption of Brownian motion. On the other hand, λ is only defined when the correlations implied by λ are <=1.0. Usually this is only for values of λ (no more than) slightly larger than 1. [λ is always defined between 0 & 1.]

My hunch is that this can be best understood using contrasts. With contrasts, when we have a contrast between terminal species of zero or near zero length, the inevitable result for λ>1, this contrast (being standardized by dividing by a quantity very close to zero) will tend to have disproportionate weight in the contrasts regression.

To examine this, I set up the following simple simulation. First, I simulated a tree with slightly longer terminal edges than expected under pure-birth. This is probably not necessary, but I did this just to help exagerrate the effect of λ>1. Next, I simulated uncorrelated data on that tree assuming BM for x & y. Then, I fit a PGLS either setting λ to 1 (i.e., BM) or setting λ to it's maximum possible value. Finally I obtained the P-values for each fitted model & compared them.

Here is what I found:

``````## load libraries
library(phytools)
library(nlme)

## function for simulation
foo<-function(nrep=500){

Pbm<-Pml<-lambda<-vector()

cat("\n|")

for(i in 1:nrep){
tree<-phytools:::lambdaTree(pbtree(n=26,tip.label=LETTERS),0.95)

x<-fastBM(tree)
y<-fastBM(tree)

corBM<-corBrownian(1,tree)
lambda[i]<-phytools:::maxLambda(tree)
corML<-corBrownian(1,phytools:::lambdaTree(tree,lambda[i]))

fitbm<-gls(y~x,data=data.frame(x,y),correlation=corBM)
fitml<-gls(y~x,data=data.frame(x,y),correlation=corML)

Pbm[i]<-anova(fitbm)\$'p-value'[2]
Pml[i]<-anova(fitml)\$'p-value'[2]

cat(".")
if(i%%50==0&&i!=nrep) cat("\n ")
flush.console()

}

cat("|\n\n")

cat("\nDone simulation.")

list(Pbm=Pbm,Pml=Pml,lambda=lambda)

}
## run
obj<-foo()
``````
``````##
## |..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................|
##
##
## Done simulation.
``````
``````## visualize our results
par(mfrow=c(2,1))
hist(obj\$Pbm,col="grey",xlab="P-value assuming BM",main="")
hist(obj\$Pml,col="grey",xlab="P-value assuming lambda=max(lambda)",
main="")
``````

``````hist(obj\$lambda,col="grey",main="Maximum value of lambda",xlab="lambda")
``````

I'm not sure what to make of this - and whether this result suggests (or not) that λ should only be set or estimated between 0 & 1. This is already the strategy employed, I believe, by functions for PGLS in the caper package. Nonetheless, the result is here. Make of it what you will!