Wednesday, June 22, 2011

Stochastic character mapping on the tree

I'm just now returning from the 'Evolution' meeting (joint meeting of SSE, ASN, and SSB) in Norman, Oklahoma. I saw many good and exciting talks at the meeting. Hopefully I will be able to channel some of this excitement into new functions and methods for my nascent {phytools} package in the coming weeks.

Rich Fitzjohn, a Ph.D student with Sally Otto at UBC, gave an excellent talk on fitting and using state-dependent diversification models; however, he also began his talk with a very clear and simple explanation of the algorithm for stochastic character mapping, originally developed by Nielsen (2002) and Huelsenbeck et al. (2003). A couple of my functions (in particular, brownie.lite(), which implements the method of O'Meara et al. (2006); and evol.vcv(), which implements Revell & Collar (2009)) take stochastic map trees created by my function read.simmap() as input.

As it turns out, Rich has already programmed stochastic character mapping in his impressive package {diversitree} that I am just beginning to explore; however after Rich's explanation the exercise seemed to be so straightforward that I thought I would try it myself. This way it will be easy for users to seamless go from discrete and continuous character vectors to the rate variation analyses of brownie.lite() and evol.vcv().

To conduct stochastic character mapping we first need the conditional likelihoods of each of our states for our mapped discrete character at each node in the tree. Making the task even easier, it is my understanding that these are returned as the object $lik.anc by the {ape} function ace(...,type="discrete"). [Note, this is not well documented - so I am still trying to confirm that these likelihoods are indeed the "right" ones.] Once we have these, we just traverse the tree once, from the root to the tips. At the root, we just assign the root state with probability determined by the relative conditional likelihoods at each root. Then, for each node, we compute the element-wise product of the vector of conditional likelihoods for that node and the vector Pi·, where P for a branch length t is given by: eQ*t; and then we stochastically assign internal nodes using the relative probabilities we have computed. (Here, eX denotes the matrix exponential of X.)

After we have obtained a set of probable states for all the nodes of the tree, we then generate plausible character histories along each edge. We do this by generating waiting times for character shifts from an exponential distribution with rate -Qii for starting state i. We do this along each edge of the tree, rejecting branches that don't start and finish with the states simulated in the previous step of our analysis.

I have just posted a beta version of this function to my R phylogenetics page (direct link here). Assuming that there doesn't prove to be any major bugs in the function, I will also incorporate into the next version of {phytools}.

To use the function, simply load the source, and then type:

> mtree<-make.simmap(tree,x)

for "phylo" object tree and discrete character vector x then cross your fingers.* [*This is not a required step, but it can never hurt.] Right now the function fits only a general time-reversible model (i.e., symmetric Q, all rates different); but in future I will add additional options.

3 comments:

  1. Hi Liam -

    This is a great new tool. Could you highlight how your stochastic mapping function compares to SIMMAP?

    ReplyDelete
  2. Hi. I guess I would say a few things in response to this:
    1) I do not have a Mac and I have not used SIMMAP v1.5 at all - I have used SIMMAP v1.0 a couple of times;
    2) My impression is that SIMMAP is a very sophisticated program with an impressive GUI that can do many different analyses in addition to stochastic mapping; finally,
    3) My intention in writing make.simmap() was to implement the basic stochastic mapping method of Huelsenbeck et al. (2003). This is the same basic method implemented in SIMMAP; with the limitation that I only used one substitution model - the symmetric model (SYM). This I will probably extend in future versions. I would also note that since SIMMAP is not open source it is difficult for me to examine whether or not we have implemented the same model - however, both programs are based on the same article (i.e., Huelsenbeck et al. 2003).

    Thanks for the question!

    ReplyDelete
  3. Hallo Liam!

    I'm new to the stochastic character mapping and I want to use it in combination with the mvOU function in mvMORPH. There it seems that only one character history is needed. As far as I understand (and that might be wrong), make.simmap gives me as many different character maps as nsim, all of them equally likely, which combined yield the probability of ancestral character state at each node??? But how can I implement those in the mvOU function?

    All the Best,
    Jan

    ReplyDelete