Thursday, May 5, 2011

Discrepant results from fitContinuous(...,method="lambda") and phylosig()

I just received the following email query (edited for brevity):

"I encountered different results of lambda using your phylosig code and geiger's fitContinuous for some of my datasets.... I'm sending one of my datasets and the phylogenetic tree, and below are the codes that I've used to produce the results...."

I downloaded his code and dataset, and, indeed, the results appeared to be different. In particular:

> phylosig(tree,x,method="lambda")
$lambda
[1] 0.9527932
$logL
[1] -255.0452


while:

> fitContinuous(tree,x,model="lambda")
Fitting lambda model:
$Trait1
$Trait1$lnl
[1] -259.0189
$Trait1$beta
[1] 20
$Trait1$lambda
[1] 0.2444363
...


Uhoh. My first thought, naturally, was that the beta version of my phylosig() function has an error (and indeed it still might). But then looking a little more closely at the result of fitContinuous() I started to become a little suspicious. In particular, I noticed that $Trait1$beta (the fitted value of σ2 - the evolutionary rate for this model) was exactly 20. This led me to suspect that fitContinuous() probably does its multivariate optimization using bounds that did not include the MLE of σ2 for these data. If so, this should be resolved by rescaling the data as λ is scale insensitive. Indeed, it did:

> phylosig(tree,x/10,method="lambda")
$lambda
[1] 0.9527932
$logL
[1] -135.3108

> fitContinuous(tree,x/10,model="lambda")
Fitting lambda model:
$Trait1
$Trait1$lnl
[1] -135.3108
$Trait1$beta
[1] 0.661207
$Trait1$lambda
[1] 0.9527761
...


Although it is not clear what bounds are used by default in fitContinuous(), it is straightforward to adjust these. In this case, we would just do:

> fitContinuous(tree,x,model="lambda",bounds=list(beta=c(0,100)))
Fitting lambda model:
$Trait1
$Trait1$lnl
[1] -255.0452

$Trait1$beta
[1] 66.11927

$Trait1$lambda
[1] 0.9527657
...


Code for my phylosig() function is of course available on my website.

5 comments:

  1. Thanks, Liam.
    I would have taken a long time to figure it out.

    Diego

    ReplyDelete
  2. Finally getting around to trying this on my graptolite data. For a tree with 1752 tips:

    >system.time(test<-phylosig(tree,trait,method="lambda"))
    user system elapsed
    1064.56 16.36 1358.77

    So, about 21 min. Well, that's pretty good, given that it takes several hours for fitContinuous() on a tree of the same size. I'll probably take a look and make sure I can't strip it down to make it go even faster.

    ReplyDelete
  3. Hi Dave. There might be a reason that phylosig() is "faster" - are you sure that both functions are converging? One indication - by no means a guarantee of course - would be if both phylosig() and fitContinuous() both found the same estimates of lambda.
    Hopefully, I'm just being cynical - but I have not tried to use this function (or really any of my functions) on such a large tree.

    ReplyDelete
  4. Good point. I'll try it on a few very large trees and see what happens.

    ReplyDelete
  5. Hi Liam, thanks for the post on the discrepancy in results between your function and fitContinuous. After reading your post I ran some analyses using fitContinuous() under a Brownian Motion model and found similar behavior to what you mention above. I ran an initial analysis without defining any bounds for beta, in some of the traits I got beta=20, so I wondered if this was due to the bounds imposed on beta. Indeed including bounds=list(beta=c(0,100) resulted in different beta values, as well as different log likelihoods for the fit of the model! It seems as though the present bounds imposed in the fitContinuous model limit beta to 20, for traits where beta was actually higher or just 20, the likelihood differed between models using fitContinuous() and models fit including wider bounds for the beta value.
    Cheers, Alejandro

    ReplyDelete