Monday, January 18, 2016

Phylogenetic regression when estimated Pagel's λ > 1

A phytools user reported to me the other day that they were finding that a PGLS model that was non-significant assuming a Brownian correlation structure of the residual error (corBrownian) became highly significant with Pagel's λ just slightly greater than 1.0.

This seems kind of mysterious because λ=1.0 is equivalent to an assumption of Brownian motion. On the other hand, λ is only defined when the correlations implied by λ are <=1.0. Usually this is only for values of λ (no more than) slightly larger than 1. [λ is always defined between 0 & 1.]

My hunch is that this can be best understood using contrasts. With contrasts, when we have a contrast between terminal species of zero or near zero length, the inevitable result for λ>1, this contrast (being standardized by dividing by a quantity very close to zero) will tend to have disproportionate weight in the contrasts regression.

To examine this, I set up the following simple simulation. First, I simulated a tree with slightly longer terminal edges than expected under pure-birth. This is probably not necessary, but I did this just to help exagerrate the effect of λ>1. Next, I simulated uncorrelated data on that tree assuming BM for x & y. Then, I fit a PGLS either setting λ to 1 (i.e., BM) or setting λ to it's maximum possible value. Finally I obtained the P-values for each fitted model & compared them.

Here is what I found:

## load libraries
library(phytools)
library(nlme)

## function for simulation
foo<-function(nrep=500){

    Pbm<-Pml<-lambda<-vector()

    cat("\n|")

    for(i in 1:nrep){
        tree<-phytools:::lambdaTree(pbtree(n=26,tip.label=LETTERS),0.95)

        x<-fastBM(tree)
        y<-fastBM(tree)

        corBM<-corBrownian(1,tree)
        lambda[i]<-phytools:::maxLambda(tree)
        corML<-corBrownian(1,phytools:::lambdaTree(tree,lambda[i]))

        fitbm<-gls(y~x,data=data.frame(x,y),correlation=corBM)
        fitml<-gls(y~x,data=data.frame(x,y),correlation=corML)

        Pbm[i]<-anova(fitbm)$'p-value'[2]
        Pml[i]<-anova(fitml)$'p-value'[2]

        cat(".")
        if(i%%50==0&&i!=nrep) cat("\n ")
        flush.console()

    }

    cat("|\n\n")

    cat("\nDone simulation.")

    list(Pbm=Pbm,Pml=Pml,lambda=lambda)

}
## run
obj<-foo()
## 
## |..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................|
## 
## 
## Done simulation.
## visualize our results
par(mfrow=c(2,1))
hist(obj$Pbm,col="grey",xlab="P-value assuming BM",main="")
hist(obj$Pml,col="grey",xlab="P-value assuming lambda=max(lambda)",
    main="")

plot of chunk unnamed-chunk-1

hist(obj$lambda,col="grey",main="Maximum value of lambda",xlab="lambda")

plot of chunk unnamed-chunk-2

I'm not sure what to make of this - and whether this result suggests (or not) that λ should only be set or estimated between 0 & 1. This is already the strategy employed, I believe, by functions for PGLS in the caper package. Nonetheless, the result is here. Make of it what you will!

No comments:

Post a Comment