Thursday, September 26, 2013

Slightly faster multiple pairwise RF distance

I just posted a slightly faster version of multiRF that takes advantage of the fact that the first element in the list of tree bipartitions from prop.part of the ape package is the "partition" with all the taxa in the tree - which can be ignored if our two input trees have the same taxa (they should). The result is about a 20% speed-up in the 100 × 99 ÷ 2 comparisons for 100, 100-taxon trees.

Here's a demo using my i3 Asus Vivobook (obviously - it would be much faster on a speedy desktop):

> library(phytools)
> library(phangorn)
> packageVersion("phytools")
[1] ‘0.3.53’
> ## circumvents rNNI from phangorn which is buggy
> ## in the current version
> randomNNI<-function(tree,moves=1){
  for(i in 1:moves) tree<-sample(nni(tree),1)[[1]]
  tree
 }
> ## random tree
> tree<-rtree(n=100,rooted=FALSE,br=NULL)
> ## 100 trees random NNIs
> trees<-lapply(round(runif(n=100,min=1,max=100)), function(x,y) randomNNI(y,x),y=tree)
> class(trees)<-"multiPhylo"
> ## this is a hack
> trees<-read.tree(text=write.tree(trees))
> ## compute all pairwise RF
> system.time(D1<-multiRF(trees))
   user  system elapsed
  16.41    0.02   16.58
> ## now compare to new version
> detach("package:phytools",unload=TRUE)
> install.packages("phytools_0.3-60.zip",repos=NULL)
package ‘phytools’ successfully unpacked and MD5 sums checked
> library(phytools)
> packageVersion("phytools")
[1] ‘0.3.60’
> system.time(D2<-multiRF(trees))
   user  system elapsed
  13.28    0.01   13.29
> plot(D1,D2)
(Just to verify that it still works!)

BTW, I just submitted phytools 0.3-60 to CRAN - so hopeful it is accepted & percolates through the mirror repositories this week.

1 comment:

  1. Hi Liam,

    I uploaded a new version of phangorn. I upgraded the function to compute the Robinson-Foulds distance RF.dist to also compute pairwise distances for multiPhylo objects much like the multiRF in phytools.
    I added few twists to it so it is quite fast now, similar as https://peerj.com/articles/187/.
    I also fixed a bug in rNNI and allowed rNNI that the moves argument can be a vector.

    > library(phytools)
    > library(phangorn)
    > ## random tree
    > set.seed(1)
    > tree<-rtree(n=100,rooted=FALSE,br=NULL)
    > ## 100 trees random NNIs
    > trees <- rNNI(tree, moves=sample(10,size=100, replace=TRUE))
    > trees2 <- .uncompressTipLabel(trees)
    >
    > system.time(D1<-multiRF(trees2))
    user system elapsed
    23.428 0.000 23.391
    > system.time(D2<-RF.dist(trees))
    user system elapsed
    0.288 0.000 0.291
    > sum(abs(D1 - as.matrix(D2)))
    [1] 0

    I realised there is a problem with multiRF. multiRF does not take a species as reference to compute the bipartitions, this can be a problem when for example you reroot the tree.

    > set.seed(1)
    > trees3 <- lapply(1:5, function(x,tree)root(tree,x), tree)
    > trees3 <- lapply(trees3, reorder, "postorder")
    > class(trees3) = "multiPhylo"
    > (D3 <- multiRF(trees3))
    [,1] [,2] [,3] [,4] [,5]
    [1,] 0 0 2 12 14
    [2,] 0 0 2 12 14
    [3,] 2 2 0 10 12
    [4,] 12 12 10 0 2
    [5,] 14 14 12 2 0
    > D4 <- RF.dist(trees3)
    > as.matrix(D4)
    1 2 3 4 5
    1 0 0 0 0 0
    2 0 0 0 0 0
    3 0 0 0 0 0
    4 0 0 0 0 0
    5 0 0 0 0 0

    For rooted trees such an effect may be desirable, as it shows that the root of two trees differs.

    Regards,
    Klaus

    ReplyDelete

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.