Sunday, April 24, 2011

Updates to evol.rate.mcmc() & new analysis of the posterior sample

I just posted a new version of evol.rate.mcmc(), our MCMC method for identifying the location of a shift in the evolutionary rate through time. I described this method in a previous post here and in an upcoming publication with several coauthors (1,2,3). In the original version of this function, I had some peculiar bookkeeping. For instance, I defined the derived rate as σ(1)2 if it was left of the root; and as σ(2)2 if right of the root. This was to try and avoid ending up with a posterior sample consisting of a mixture of the two rates. The biggest addition in the present version is a a post-hoc processing of the posterior sample which helps negate this problem entirely (and thus removes the necessity of "anchoring" σ(1)2 or σ(2)2 to the left or the right side of the tree, respectively). The method basically works as follows. First, I use my function min.split() to find the "median" shift point in the tree. Here, I define this as the posterior sample with the minimum summed patristic distance to all the other sampled points on the tree. Next, I use the function posterior.evolrate() which takes this average shift point and the raw posterior sample, and then returns for each posterior sample the mean rate upstream and downstream of the average shift. I did this as follows: 1) For sample i, I split the tree at shift point i; 2) I then scaled each side of the split tree by their respective rates; 3) Next, I reattached the two sides of the tree; 4) Finally, I split the tree at the mean shift point and computed the evolutionary rates upstream and downstream of this split. The figure below shows the tree from my simulated example dataset here, where the branches of the tree have been scaled by the estimated rates. Code for these analyses is on my R-phylogenetics page (direct link here). To run on the dataset from my previous post, please try the following code:
> require(ape)
> tree<-read.tree(text="((((30:0.014,29:0.014):0.171,(28:0.109,27:0.109):0.076):0.258,(26:0.275,25:0.275):0.168):0.557,(((((24:0.238,(23:0.139,22:0.139):0.099):0.199,((((21:0.009,20:0.009):0.045,19:0.054):0.08,(18:0.083,((17:0.043,16:0.043):0.008,15:0.051):0.032):0.051):0.007,14:0.141):0.297):0.095,((13:0.386,((12:0.089,11:0.089):0.216,10:0.306):0.08):0.035,(9:0.095,8:0.095):0.326):0.111):0.009,(7:0.159,(6:0.112,5:0.112):0.047):0.383):0.078,(4:0.535,((3:0.203,2:0.203):0.151,1:0.354):0.181):0.085):0.38);")
> x<-c(6.017,4.73,7.254,11.821,10.391,12.794,15.405,7.941,5.78, 12.71,11.199,10.748,11.927,9.615,9.248,8.973,9.055,9.095,9.892, 9.818,9.838,8.355,8.672,8.972,9.177,5.782,4.762,6.763,1.575,1)
> names(x)<-as.character(1:30)
> source("evol.rate.mcmc.R")
> result<-evol.rate.mcmc(tree,x,ngen=100000,control=list(sdlnr=2))
> minsplit<-min.split(tree,result$mcmc[201:1001,c("node","bp")])
> mcmc<-posterior.evolrate(tree,minsplit,result$mcmc[201:1001,], result$tips[201:1001])

To plot the tree with branch lengths rescaled to the estimated rate, try:

> ave.rates(tree,minsplit,extract.clade(tree, node=minsplit$node)$tip.label,colMeans(mcmc)["sig1"], colMeans(mcmc)["sig2"],minsplit)
which should return a plot much like the one below:


  1. Dan Scantlebury found a fatal error in v0.4 (actually, which prevented the function from loading into memory). I have fixed this, so hopefully it works now.

  2. min.split() has now been renamed minSplit() for generic method consistency (v0.0-5 of "phytools"). - Liam

  3. i've been going through this exercise and things are going great until i try to plot the ave.rates - which doesn't appear as a function i guess - have you updated this?

    1. This is an error in that ave.rates is (for some reason) not in the NAMESPACE of phytools. You can download the source code here, but I will also build a new version of phytools with this fixed & post shortly.

    2. Actually, revise that - it is not an error that ave.rates is not in the NAMESPACE; it is designed to be called internally by posterior.evolrate only. Here, I have just used it as a convenience to visualize the posterior mean rates as a tree stretching; however this is not its primary function.

  4. I am not sure if anyone else has encountered this error, but as I was trying to run posterior.evolrate I kept getting this message: "Error in if (where == 0 || where == "root") where <- ROOTx : missing value where TRUE/FALSE needed".

    I tracked the problem down to splitTree not always creating a branch labelled "NA" out of an internal branch (seems to be a issue with drop.tip in the drop.clade function). I think I have found a way around it, see below, posterior.evolrate now running properly for me.

    # first extract the clade given by shift$node
    #now remove tips in tr2 from tree
    ###edit starts here
    tr1$tip.label[(which(tr1$tip.label %in% drops))]<-"NA"
    ###edit ends here
    } else {
    # first extract the clade given by shift$node
    # now remove tip in tr2 from tree
    trees<-list(tr1,tr2); class(trees)<-"multiPhylo"

    1. Thanks, Jessica. I had the same problem and this solved it right away!

  5. I cannot make it work... I still get the this error message, even if I paste the function. But I'm puzzled as I can't see where "posterior.evolrate" calls this "splitTree" internally. Please, any help would be very appreciated!

    1. Hi. Not sure I understand the source of this error. Please feel free to email me your input data and run code ( & I will try and replicate the bug. Thanks! Liam