Sunday, December 29, 2013

More updates to rateshift method: Testing for the presence of a rate shift

I have made some updates to the function rateshift (first described here) to facilitate comparison of alternative models for rate shifts; as well as for testing the null hypothesis of no shift.

First, I fixed the function so it could fit a no-rate-shift model. That was broken in the previous version, but should work now. When we fit the nrates=1 model, the fitted model parameter value (σ2) and log-likelihood should be the same as from (say) fitContinuous in geiger or the one-rate model in brownie.lite.

Second, I created an S3 generic logLik method for the object of class "rateshift" returned by the function. This allows us to easily extract the log-likelihood & model parameterization; but it also allows us to use the generic AIC to compute the Akaike Information Criterion value for the fitted model.

Finally, third, I fixed a minor bug which sometimes created an incompatibility in the tolerance (basically, the very small values we need to add or subtract from some quantities to make sure that the function does not attempt to evaluate the likelihood where it isn't defined) are inconsistent between rateshift and make.era.map, which is used internally. This required changes to both functions, so the wisest thing to do to get this update is to update phytools to the latest non-CRAN version.

OK. Here's a demo:

> require(phytools)
Loading required package: phytools
Loading required package: ape
Loading required package: maps
Loading required package: rgl
> packageVersion("phytools")
[1] ‘0.3.86’
> ## simulate tree & data
> tree<-pbtree(n=100,scale=1)
> tree<-make.era.map(tree,c(0,0.5,0.8))
> x<-sim.rates(tree,c(1,10,1),internal=TRUE)
names absent from sig2: assuming same order as $mapped.edge
> ## here's a visual of our simulation
> phenogram(tree,x,ftype="off")
> ## peel off ancestral states
> x<-x[tree$tip.label]
>
> ## fit 1 rate model
> fit1<-rateshift(tree,x,nrates=1)
> fit1
ML 1-rate model:
      s^2(1)  se(1)  k  logL   
value 1.7966  0.2542 2  -81.8257

This is a one-rate model.

R thinks it has found the ML solution.

> ## fit 2 rate model
> fit2<-rateshift(tree,x,nrates=2)
> fit2
ML 2-rate model:
      s^2(1)  se(1)  s^2(2)  se(2)  k  logL   
value 6.6364  2.1013  0.785  0.1345 4  -64.3827

Shift point(s) between regimes (height above root):
        1|2    se(1|2)
value  0.806  0.02

R thinks it has found the ML solution.

> ## test 2 rates vs 1 rate
> P2vs1<-as.numeric(pchisq(2*(logLik(fit2)-logLik(fit1)),   df=attr(logLik(fit2),"df")-attr(logLik(fit1),"df"),   lower.tail=FALSE)) > P2vs1
[1] 2.658128e-08
> ## fit 3 rate model
> fit3<-rateshift(tree,x,nrates=3)
> fit3
ML 3-rate model:
      s^2(1)  se(1) s^2(2) se(2)  s^2(3)  se(3)  k  logL
value 1.7181  NaN  6.2815 1.8016  0.7716  0.1345 6  -64.017

Shift point(s) between regimes (height above root):
        1|2    se(1|2) 2|3    se(2|3)
value  0.3058  0.0265  0.8193  0.0141

R thinks it has found the ML solution.

> ## test 3 rates vs 2 rates
> P3vs2<-as.numeric(pchisq(2*(logLik(fit3)-logLik(fit2)),
  df=attr(logLik(fit3),"df")-attr(logLik(fit2),"df"),
  lower.tail=FALSE))
> P3vs2
[1] 0.6934852

This shows us that although the fitted shift points in our third fitted model are fairly close to the gnerating shift points, the fit isn't significantly better than our two rate model. I suspect that, in general, it will probably be easier to find shift points that are closer to the tips of the tree, where there tends to be more edges.

Cool.

4 comments:

  1. Hi Liam, this is really cool, thanks for developing the rateshift method.
    I've been testing it on geometric morphometric data of an adaptive radiation, for which DTTs indicate an early burst, and for which I therefore expected a rate shift somewhere close to the root. Instead, when using nrates=2, a shift very close to the present is detected, at the age of the youngest node in the phylogeny. I also tested nrates=1,3, and 4, and the resulting logLikelihoods are more negative for nrates=3 than for nrates=2, which should never be the case, right? Could it be that there's a problem with convergence?

    > rateshift(tree,X,nrates=1)$logL
    [1] -129.3469
    > rateshift(tree,X,nrates=2)$logL
    [1] -124.8993
    > rateshift(tree,X,nrates=3)$logL
    [1] -127.0162
    > rateshift(tree,X,nrates=4)$logL
    [1] -124.3177

    If you'd like to replicate these results, I could send you the tree and trait data in an email.

    Cheers,
    Michael

    ReplyDelete
    Replies
    1. Hi Michael.

      Does R think that the optimization has converged? This should be at the bottom of:

      fit<-rateshift(...)
      fit

      Please do send me the data & tree. I will try to figure this out.

      Thanks for the report.

      - Liam

      Delete
  2. Yes, R thinks so. I just send you tree and trait data in an email. Thanks for having a look at it.
    Michael

    ReplyDelete
    Replies
    1. Michael - I am now working on this. I expect to have a more robust optimization in the next CRAN phytools version. - Liam

      Delete