Wednesday, July 13, 2011

Update to make.simmap() to generate multiple maps

Obviously, the purpose of stochastic character mapping is to sample possible character histories with according to their posterior probability (Huelsenbeck et al. 2003). With the previous version of make.simmap() it was only possible to do this by running the function many times with the same data and tree. This involved the totally unnecessary inefficiency of repeatedly calculating the conditional likelihoods using the "ape" function ace(). This is actually the slowest part of make.simmap() by a wide margin, e.g.:

> system.time(mtree<-make.simmap(tree,x))
   user  system elapsed 
   0.49    0.00    0.50 
> system.time(XX<-ace(x,tree,type="discrete",model="SYM"))
   user  system elapsed 
   0.32    0.00    0.32

Since these likelihoods are the same each time we run make.simmap() (that is, for a given tree, data vector, and model), we should not recalculate them for every stochastic map that we want to simulate. In the latest version of make.simmap(), which I have just posted online (here), we just calculate the conditional likelihoods once - and then we use them for each simulation on the tree. This prevents computation time from rising linearly with the number of simulated maps that we want. For instance, let's use "geiger" to simulate a tree and data; and then the new version of make.simmap() to simulate on the tree:

> require(geiger)
> tree<-drop.tip(birthdeath.tree(b=1,d=0,taxa.stop=101),"101")
> x<-sim.char(tree,model.matrix=list(matrix(c(-1,1,1,-1),2,2)), model="discrete")[,,1]
> system.time(mtree.single<-make.simmap(tree,x))
   user  system elapsed 
   0.52    0.00    0.52 
> system.time(mtree.multiple<-make.simmap(tree,x,nsim=100))
   user  system elapsed 
  19.69    0.00   19.69

So, not super fast, but better (by 30s over 100 replicates) than had we just looped the old version of make.simmap() 100 times.

1 comment:

  1. P.S. The new version of make.simmap() returns a modified "multiPhylo" object (assuming nsim>1). Unfortunately, plotSimmap() will not plot these trees, unless called on the individual elements of the "multiPhylo" object.
    > mtrees<-make.simmap(tree,x,nsim=2)
    > plotSimmap(mtrees[[1]])
    > plotSimmap(mtrees[[2]])