Saturday, May 4, 2013

Huge speed-up for rerootingMethod & estDiversity

A couple of months ago I posted an extremely simple function - basically a wrapper for the 'ape' function ace - to do marginal ancestral state reconstruction using the re-rooting method of Yang.

Well - this function is quite slow. This is partly because the tree has to be re-rooted at every internal node; but mostly this is slow for a totally unnecessary reason, and that is that by wrapping around ace (or, rather, a very lightly modified version of ace used internally by phytools), at each re-rooting the function also re-estimates the transition matrix, Q. Obviously - since only symmetric transition matrices are permitted by this method - this is totally unnecessary.

This is now fixed in the latest minor phytools build (phytools 0.2-56) and the result is an enormous speed-up in computation time. So, for instance:

> require(phytools)
Loading required package: phytools
> packageVersion("phytools")
[1] ‘0.2.55’
>
> # simulate
> tree<-pbtree(n=100,scale=1)
> Q<-matrix(c(-2,1,1,1,-2,1,1,1,-2),3,3)
> rownames(Q)<-colnames(Q)<-letters[1:3]
> x<-sim.history(tree,Q)$states
>
> # ok, now estimate using the old version
> system.time(XX<-rerootingMethod(tree,x))
  user  system elapsed
  28.34    0.00  28.42
> # unload phytools and install the new version
> detach("package:phytools",unload=TRUE)
> install.packages("phytools_0.2-56.tar.gz",type="source", repos=NULL)
* installing *source* package 'phytools' ...
** R
...
* DONE (phytools)
> require(phytools)
Loading required package: phytools
> packageVersion("phytools")
[1] ‘0.2.56’
>
> # ok, now repeate the analysis using the new version
> system.time(YY<-rerootingMethod(tree,x))
  user  system elapsed
  3.36    0.01    3.38
> plot(XX$marginal.anc,YY$marginal.anc,xlab="marginal ASRs old version",ylab="marginal ASRs new version")

Cool.

The same speed-up can also be applied to estDiversity - which estimates historical lineage diversity at all the nodes of the tree based on the approach of Mahler et al. (2010) (e.g., 1, 2). So, for instance:

> require(phytools)
Loading required package: phytools
> packageVersion("phytools")
[1] ‘0.2.55’
>
> system.time(d.old<-estDiversity(tree,x))
Please wait. . . . Warning - this may take a while!
Completed 10 nodes
Completed 20 nodes
Completed 30 nodes
Completed 40 nodes
  user  system elapsed
 228.64    0.14  236.39
>
> detach("package:phytools",unload=TRUE)
> install.packages("phytools_0.2-56.tar.gz",type="source", repos=NULL)
...
* DONE (phytools)
> require(phytools)
Loading required package: phytools
> packageVersion("phytools")
[1] ‘0.2.56’
>
> system.time(d.new<-estDiversity(tree,x))
Please wait. . . . Warning - this may take a while!
Completed 10 nodes
Completed 20 nodes
Completed 30 nodes
Completed 40 nodes
  user  system elapsed
  26.02    0.04  26.41
> plot(d.old,d.new,xlab="estimated historical diversity (old)",ylab="estimated historical diversity (new)")

Wow. That's an enormous difference. Cool.

No comments:

Post a Comment

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.