Yesterday, Klaus Schliep pointed out that a trick to speed up lapply's handling of objects of class "multiPhylo" (i.e., just a simple list of objects of class "phylo") was just to first remove the class attribute "multiPhylo". Why this would have any effect at all is somewhat of a mystery to me. is.list(trees) evaluates TRUE regardless of whether or not the class attribute has been removed, so it doesn't seem that lapply would have to coerce our object to a list in either case. Nonetheless - not only does this work, it works tremendously! I have now included this simple trick in countSimmap, describe.simmap, and getStates, all of which use lapply or sapply if the argument tree is an object of class "multiPhylo".
Here's a demo of just how much of an improvement in speed results from this trick:
Loading required package: phytools
> packageVersion("phytools")
[1] ‘0.2.56’
>
> # simulate data
> tree<-pbtree(n=100,scale=1)
> Q<-matrix(c(-1,1,1,-1),2,2)
> colnames(Q)<-rownames(Q)<-letters[1:2]
> tree<-sim.history(tree,Q)
> x<-tree$states
>
> # stochastic mapping
> mtrees<-make.simmap(tree,x,nsim=1000)
make.simmap is sampling character histories conditioned on the transition matrix
Q =
a b
a -0.8686634 0.8686634
b 0.8686634 -0.8686634
(estimated using likelihood);
and (mean) root node prior probabilities
pi =
a b
0.5 0.5
Done.
>
> # ok, now let's time describe.simmap for
> # various subsets of our mapped trees
> system.time(X100<-describe.simmap(mtrees[1:100], message=FALSE))
user system elapsed
2.69 0.02 2.70
> system.time(X200<-describe.simmap(mtrees[1:200], message=FALSE))
user system elapsed
12.71 0.02 12.75
> system.time(X400<-describe.simmap(mtrees[1:400], message=FALSE))
user system elapsed
74.24 0.64 75.57
Woah. I'm not even going to try the full set of 1,000 trees.
OK, now let's compare to describe.simmap with nothing more than the trick suggested by Klaus (i.e., unclass-ing the "multiPhylo" object for every use of lapply or sapply):
Loading required package: phytools
> packageVersion("phytools")
[1] ‘0.2.58’
> system.time(Y100<-describe.simmap(mtrees[1:100], message=FALSE))
user system elapsed
0.45 0.00 0.45
> system.time(Y200<-describe.simmap(mtrees[1:200], message=FALSE))
user system elapsed
0.83 0.00 0.82
> system.time(Y400<-describe.simmap(mtrees[1:400], message=FALSE))
user system elapsed
1.78 0.00 1.78
> system.time(YY<-describe.simmap(mtrees,plot=TRUE))
1000 trees with a mapped discrete character with states:
a, b
trees have 29.633 changes between states on average
changes are of the following types:
a,b b,a
x->y 10.692 18.941
mean total time spent in each state is:
a b total
raw 12.9314025 18.3673136 31.29872
prop 0.4131608 0.5868392 1.00000
user system elapsed
4.65 0.06 4.71
Wow.
I should have noted that this is in a new minor release of phytools (phytools 0.2-58). - Liam
ReplyDelete