Stochastic character mapping, originally due to Nielsen (
2001,
2002) and Huelsenbeck et al. (
2003) is a procedure whereby mutations or character substitutions are mapped onto the nodes and branches of a phylogenetic tree. Maps are generated stochastically, but with probability equal to their posterior probability given our transition model. Sampling stochastic character maps, then, may be a good way to capture information about both the states at internal positions in the tree, as well as the timing of character substitution through the phylogeny. As noted in an earlier post, the background of this blog are visualized stochastic character maps for "ecomorph" mapped on the Caribbean
Anolis tree (from Mahler et al.
2010).
The program
SIMMAP for stochastic mutational mapping was developed by Jonathan Bollback. It performs stochastic mutation mapping, and outputs stochastically mapped trees to file using a modified Newick format. Since stochastic character maps can be very useful for certain kinds of downstream phylogenetic analysis, I was very interested in writing a function that would read these modified Newick trees into R.
The basic format of stochastic character mapped phylogenies in v1.0 of SIMMAP is as follows:
((A:{0,0.3:1,0.4:0,0.3},B:{0,1.0}):{0,1.0},(C:{0,0.25:1,0.75},D:{0,1.0}):{0,1.0});
This is comparable to the standard Newick style tree except that branch lengths have been replaced by { curly brackets } according to the following convention: { state
i, time spent in state
i : state
j, time spent in state
j :
etc. }.
I'm happy to announce that I have now developed usable code to read SIMMAP v1.0 trees in R. This function,
read.simmap(), accepts either or Phylip or Nexus style input files (the latter courtesy in large part to code borrowed from the "ape" function
read.tree()). My function creates a modified "phylo" object, with the additional element
$mapped.edge which is a matrix containing the time spent in each state (columns) on each edge (rows) of the tree. The source code for this function is available as a beta test version from my R phylogenetics page,
here, along with some very basic description of use.
Unfortunately, this announcement comes with a hitch. In the most recent release of SIMMAP (v1.5) Bollback has updated the SIMMAP tree format so that the tree above would be represented as follows:
((A:[&map={0,0.3,1,0.4,0,0.3}]1.0,B:[&map={0}]1.0):[&map={0}]1.0,(C:[&map={0,0.25,1,0.75}]1.0,D:[&map={0}]1.0):[&map={0}]1.0);
The reason for this, I surmise, is that now these trees conform to the Newick standard (in which, by convention, text inside [ hard brackets ] is ignored) and thus can be read by programs not specifically designed to read SIMMAP style trees. Unfortunately, Bollback does not provide any means of converting between old and new formatted trees.
I'm working on reading the new SIMMAP format into R as an option in my
read.simmap() function. I will keep readers of this blog up to date on my progress on this problem. I will also work on adding the capacity to retain the order of states on each edge, as the present version of my function destroys this information during read-in (retaining only the total time spent in each state on each edge).
Please feel free to test my function with SIMMAP v1.o formatted trees and let me know what you find.