Monday, January 16, 2012

Fitting Pagel's λ for a single trait using multiple methods

Because it related to something else I was working on, I today did a quick comparison of the (at least 4) different methods available in R to fit Pagel's λ model for a single trait. I just did this for a single 200 taxon tree and compared the estimates (which are all, to reasonable numerical precision, the same) and the computation time. A more thorough comparison would require that we run this code over multiple simulations to try and determine the robustness of each function. This I have not done, but anyone wishing to loop the following analyses 100 or more times should be my guest. This was run on my VAIO AMD E-450 laptop @ 1.65 GHz (so, not very fast):

> require(phytools); require(geiger); require(caper); require(nlme)
> tree<-pbtree(n=200,scale=1) # simulate tree
> x<-fastBM(lambdaTree(tree,0.7)) # simulate data
> # fit using phytools:phylosig
> system.time(m1<-phylosig(tree,x,method="lambda"))
   user  system elapsed
   2.79    0.00    2.79
> # fit using geiger:fitContinuous
> system.time(m2<-fitContinuous(tree,x,model="lambda"))
Fitting  lambda model:
   user  system elapsed
137.11    0.42  138.90
> # fit using nlme:gls
> d<-data.frame(x)
> system.time(m3<-gls(x~1,data=d,correlation=corPagel(0.5,tree,fixed=F), method="ML"))
   user  system elapsed
  51.56    0.22   53.86
> # fit using caper:pgls
> d<-data.frame(x,names(x)); colnames(d)[2]<-"Species"
> system.time(m4<-pgls(x~1,,d,"Species"), lambda="ML"))
   user  system elapsed
  38.13    0.01   38.25

We can then get a summary as follows:

> result<-matrix(c(m1$lambda,m1$logL,
> result
                 lambda      logL
phylosig      0.7478915 -230.7348
fitContinuous 0.7478919 -230.7348
gls           0.7478922 -230.7348
pgls          0.7478922 -230.7348

So, at least for this example, phylosig is quite a bit faster, but it makes no discernible difference in the results. Why this would be the case is not at all clear to me (phylosig uses the built in optimizer, optimize, and performs full ML not REML), but I hope that it does not come at the cost of robust estimation. Certainly this could be assessed by repeating the above analysis for many simulations.


  1. One possible explanation for the faster run time of phylosig that I thought of after writing this post is that I use the analytically derived solutions for σ^2 & the root value, conditioned on λ, so that ML optimization is a univariate rather than a multivariate maximization problem. This produces the same parameter estimates & likelihood but should be (theoretically and, evidently, in practice) much quicker.

  2. Hi,

    Your code is impressively fast! But I think the calculation can be done even faster - using Felsenstein's (1973) algorithm one can avoid creating the variance covariance matrix or having to invert it. Using that algorithm (here called 'fast') I get the following:

    lambda logL Time
    phylosig 0.7114102 -220.1066 0.693
    fitContinuous 0.7114085 -220.1066 39.770
    gls 0.7114125 -220.1066 14.666
    pgls 0.7114119 -220.1066 23.730
    fast 0.7114101 -220.1066 0.141

    The differences are exacerbated using larger trees, e.g. here with 1000 tips:

    lambda logL Time
    phylosig 0.7463478 -1109.464 32.126
    fitContinuous 0.7463439 -1109.464 1146.695
    gls 0.7463487 -1109.464 380.447
    pgls 0.7463473 -1109.464 641.631
    fast 0.7463478 -1109.464 0.801

    I wonder if the other methods are performing un-necessary computation with respect to forming the variance matrix?


  3. This is fantastic Rob.

    Yes, as I alluded, I was initially perplexed as to why phylosig was faster (as I made no attempt to be computationally economic) but then I realized it is because it does only univariate optimization, using the conditional MLEs of sigma^2 and the root state.

    An obvious alternative is to do REML estimation of λ based on contrasts; however Rob's suggestion, to use the algorithm of Felsenstein (1973), is even better. This could certainly be used to speed up likelihood calculation for other models as well.

    For those not familiar with this article, I recently discovered that it is available online, evidently for free - here. The algorithm Rob is describing begins on p. 476, I believe.