A phytools user reported to me the other day that they were finding that a
PGLS model that was non-significant assuming a Brownian correlation structure
of the residual error (`corBrownian`

) became highly significant
with Pagel's λ just slightly greater than 1.0.

This seems kind of mysterious because λ=1.0 is equivalent to an assumption of Brownian motion. On the other hand, λ is only defined when the correlations implied by λ are <=1.0. Usually this is only for values of λ (no more than) slightly larger than 1. [λ is always defined between 0 & 1.]

My hunch is that this can be best understood using contrasts. With contrasts, when we have a contrast between terminal species of zero or near zero length, the inevitable result for λ>1, this contrast (being standardized by dividing by a quantity very close to zero) will tend to have disproportionate weight in the contrasts regression.

To examine this, I set up the following simple simulation. First, I simulated
a tree with slightly longer terminal edges than expected under pure-birth.
This is probably not necessary, but I did this just to help exagerrate the
effect of λ>1. Next, I simulated uncorrelated data on that tree
assuming BM for *x* & *y*. Then, I fit a PGLS either setting
λ to 1 (i.e., BM) or setting λ to it's maximum possible value.
Finally I obtained the P-values for each fitted model & compared them.

Here is what I found:

```
## load libraries
library(phytools)
library(nlme)
## function for simulation
foo<-function(nrep=500){
Pbm<-Pml<-lambda<-vector()
cat("\n|")
for(i in 1:nrep){
tree<-phytools:::lambdaTree(pbtree(n=26,tip.label=LETTERS),0.95)
x<-fastBM(tree)
y<-fastBM(tree)
corBM<-corBrownian(1,tree)
lambda[i]<-phytools:::maxLambda(tree)
corML<-corBrownian(1,phytools:::lambdaTree(tree,lambda[i]))
fitbm<-gls(y~x,data=data.frame(x,y),correlation=corBM)
fitml<-gls(y~x,data=data.frame(x,y),correlation=corML)
Pbm[i]<-anova(fitbm)$'p-value'[2]
Pml[i]<-anova(fitml)$'p-value'[2]
cat(".")
if(i%%50==0&&i!=nrep) cat("\n ")
flush.console()
}
cat("|\n\n")
cat("\nDone simulation.")
list(Pbm=Pbm,Pml=Pml,lambda=lambda)
}
## run
obj<-foo()
```

```
##
## |..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................|
##
##
## Done simulation.
```

```
## visualize our results
par(mfrow=c(2,1))
hist(obj$Pbm,col="grey",xlab="P-value assuming BM",main="")
hist(obj$Pml,col="grey",xlab="P-value assuming lambda=max(lambda)",
main="")
```

```
hist(obj$lambda,col="grey",main="Maximum value of lambda",xlab="lambda")
```

I'm not sure what to make of this - and whether this result suggests (or not) that λ should only be set or estimated between 0 & 1. This is already the strategy employed, I believe, by functions for PGLS in the caper package. Nonetheless, the result is here. Make of it what you will!

## No comments:

## Post a Comment