Tuesday, April 2, 2013

Using make.simmap on a set of trees

A recent commenter asked:

"I wonder if it would be possible to apply make.simmap to an object multiphylo (to deal with phylogenetic uncertainty) and to summarize the outcome on a consensus tree."

Let's take this one bit at a time. First, the task of applying make.simmap, the phytools function for stochastic character mapping, to a set of trees - say a sample from the posterior distribution in a Bayesian analysis.

At present, make.simmap takes a single tree and data vector as input; and can return as many simulated stochastic maps as the user demands. It is possible to iterate over a list of trees and then combine the results into a single object of class "multiPhylo" - but this is a little annoying. This is because make.simmap(...,nsim>1) returns a list of trees; and thus lapply(trees,make.simmap,...,nsim>1) returns a list of lists. Various attempts to first unlist and then relist left me more & more annoyed - but the following hack seems to do the trick:


mtrees<-unlist(sapply(trees,ff,x,simplify=FALSE), recursive=FALSE)

I have now added this to the latest version of make.simmap, & also built a new version of phytools (phytools 0.2-36), which can be downloaded & installed from source.

Instead of only taking a single tree as input, this tree can take a list of trees (an object of class "multiPhylo") & will automatically generate nsim stochastic character maps per input tree.

OK, here's a demo of the new version using a set of 15,001 trees from the posterior distribution of a real Bayesian run (thanks Graham Reynolds), and a simulated binary character with states a and b.

> packageVersion("phytools")
[1] ‘0.2.36’
> trees<-read.nexus("posterior.sample.trees")
> trees
15001 phylogenetic trees
> # too many, let's randomly subsample
> trees
100 phylogenetic trees
> # ok, now generate 10 stochastic maps for each tree
> mtrees<-make.simmap(trees,x,nsim=10,message=FALSE)
> mtrees
1000 phylogenetic trees
> # now let's visualize the variability
> # again, 1000 is too many
> layout(matrix(1:100,10,10,byrow=TRUE))
> cols<-setNames(c("blue","red"),letters[1:2])
> plotSimmap(mtrees[0:99*10+1],cols,pts=F,ftype="off")
Waiting to confirm page change...

Cool. Now let's try describe.simmap:

> XX<-describe.simmap(mtrees)
1000 trees with a mapped discrete character with states:
 a, b

trees have 15.509 changes between states on average

changes are of the following types:
       a,b   b,a
x->y 9.078 6.431

mean total time spent in each state is:
               a           b    total
raw  176.5796569 145.4146415 321.9943
prop   0.5482752   0.4517248   1.0000

The times & state changes computed by describe.simmap will be correct - however the posterior probabilities for ancestral nodes (here, XX$ace) will not because different trees in the posterior sample have different nodes & node numbers.

Nonetheless, cool!


  1. Hello Liam,

    Great, you made it so fast!
    I wonder if it would be possible to use a consensus tree as a reference (eg, maximum credibility tree from BEAST), and to store the posterior probabilities for ancestral nodes, each time that a node from the consensus tree is found in the posterior sample. And to display the summary of xx$ace on the consensus tree.

    I don't know if I was clear...


  2. Hi Revell,
    Here's the R script that i have tried to execute,
    It contains a a multiphylo class (1000 trees) as input


    It demands a phylo class tree.
    Iam running on version 0.2-58. Has this option been removed?

    Thanks a ton for your posts.

    1. You might want to update to the
      latest non-CRAN release
      , just to be sure; however this should work with no problem.

      Failing that - check the class of your object:


      should be "multiPhylo".

      If you are not able to get it to run - please feel free to send it to me & I will troubleshoot. It is good for me to make sure that there are no problems with phytools.

      Thanks. Liam

    2. Thanks Liam, I have checked for the class, it was multiphylo. But, updating to 0.2-63 seems to have rectified the issue. It's running now. Assuming should be smooth. In the mean time managed to run the analyses successfully using your hack above. Will send you the files in case of further trouble. VJ

  3. Hey Liam,

    Thanks for make.simmap!

    Is it however also possible to implement a character and model it in a 'meristic' way, even if a certain state is missing among extant taxa. So say, I provide the trees and the tip states vector that contain values of 0,1,3,4 and 5. A state of 2 is however not included but assumingly a biologically relevant intermediate step to evolve 'meristically' in amidst 1 and 3. Is there a way to circumvent make.simmap to not only discretely move along the given tip values?