I just posted a slightly faster version of multiRF that takes advantage of the fact that the first element in the list of tree bipartitions from prop.part of the ape package is the "partition" with all the taxa in the tree - which can be ignored if our two input trees have the same taxa (they should). The result is about a 20% speed-up in the 100 × 99 ÷ 2 comparisons for 100, 100-taxon trees.
Here's a demo using my i3 Asus Vivobook (obviously - it would be much faster on a speedy desktop):
> library(phytools)
> library(phangorn)
> packageVersion("phytools")
[1] ‘0.3.53’
> ## circumvents rNNI from phangorn which is buggy
> ## in the current version
> randomNNI<-function(tree,moves=1){
for(i in 1:moves) tree<-sample(nni(tree),1)[[1]]
tree
}
> ## random tree
> tree<-rtree(n=100,rooted=FALSE,br=NULL)
> ## 100 trees random NNIs
> trees<-lapply(round(runif(n=100,min=1,max=100)), function(x,y) randomNNI(y,x),y=tree)
> class(trees)<-"multiPhylo"
> ## this is a hack
> trees<-read.tree(text=write.tree(trees))
> ## compute all pairwise RF
> system.time(D1<-multiRF(trees))
user system elapsed
16.41 0.02 16.58
> ## now compare to new version
> detach("package:phytools",unload=TRUE)
> install.packages("phytools_0.3-60.zip",repos=NULL)
package ‘phytools’ successfully unpacked and MD5 sums checked
> library(phytools)
> packageVersion("phytools")
[1] ‘0.3.60’
> system.time(D2<-multiRF(trees))
user system elapsed
13.28 0.01 13.29
> plot(D1,D2)
(Just to verify that it still works!)
> library(phangorn)
> packageVersion("phytools")
[1] ‘0.3.53’
> ## circumvents rNNI from phangorn which is buggy
> ## in the current version
> randomNNI<-function(tree,moves=1){
for(i in 1:moves) tree<-sample(nni(tree),1)[[1]]
tree
}
> ## random tree
> tree<-rtree(n=100,rooted=FALSE,br=NULL)
> ## 100 trees random NNIs
> trees<-lapply(round(runif(n=100,min=1,max=100)), function(x,y) randomNNI(y,x),y=tree)
> class(trees)<-"multiPhylo"
> ## this is a hack
> trees<-read.tree(text=write.tree(trees))
> ## compute all pairwise RF
> system.time(D1<-multiRF(trees))
user system elapsed
16.41 0.02 16.58
> ## now compare to new version
> detach("package:phytools",unload=TRUE)
> install.packages("phytools_0.3-60.zip",repos=NULL)
package ‘phytools’ successfully unpacked and MD5 sums checked
> library(phytools)
> packageVersion("phytools")
[1] ‘0.3.60’
> system.time(D2<-multiRF(trees))
user system elapsed
13.28 0.01 13.29
> plot(D1,D2)
BTW, I just submitted phytools 0.3-60 to CRAN - so hopeful it is accepted & percolates through the mirror repositories this week.
Hi Liam,
ReplyDeleteI uploaded a new version of phangorn. I upgraded the function to compute the Robinson-Foulds distance RF.dist to also compute pairwise distances for multiPhylo objects much like the multiRF in phytools.
I added few twists to it so it is quite fast now, similar as https://peerj.com/articles/187/.
I also fixed a bug in rNNI and allowed rNNI that the moves argument can be a vector.
> library(phytools)
> library(phangorn)
> ## random tree
> set.seed(1)
> tree<-rtree(n=100,rooted=FALSE,br=NULL)
> ## 100 trees random NNIs
> trees <- rNNI(tree, moves=sample(10,size=100, replace=TRUE))
> trees2 <- .uncompressTipLabel(trees)
>
> system.time(D1<-multiRF(trees2))
user system elapsed
23.428 0.000 23.391
> system.time(D2<-RF.dist(trees))
user system elapsed
0.288 0.000 0.291
> sum(abs(D1 - as.matrix(D2)))
[1] 0
I realised there is a problem with multiRF. multiRF does not take a species as reference to compute the bipartitions, this can be a problem when for example you reroot the tree.
> set.seed(1)
> trees3 <- lapply(1:5, function(x,tree)root(tree,x), tree)
> trees3 <- lapply(trees3, reorder, "postorder")
> class(trees3) = "multiPhylo"
> (D3 <- multiRF(trees3))
[,1] [,2] [,3] [,4] [,5]
[1,] 0 0 2 12 14
[2,] 0 0 2 12 14
[3,] 2 2 0 10 12
[4,] 12 12 10 0 2
[5,] 14 14 12 2 0
> D4 <- RF.dist(trees3)
> as.matrix(D4)
1 2 3 4 5
1 0 0 0 0 0
2 0 0 0 0 0
3 0 0 0 0 0
4 0 0 0 0 0
5 0 0 0 0 0
For rooted trees such an effect may be desirable, as it shows that the root of two trees differs.
Regards,
Klaus