Friday, November 25, 2011

Major speed up of anc.Bayes()

I just realized how to score a major speed-up in computation time from my new Bayesian ancestral character estimation function for continuous traits (anc.Bayes). Last night I documented what I thought at the time was a significant speed improvement. By pulling inversion of the phylogenetic VCV matrix out of the likelihood function (meaning that we only have to do this once instead of every generation of the MCMC), I was able to cut computation time in half (from about 60s on a 100 taxon tree to about 30s). What I didn't realize at the time was that I could also pull out calculation of the determinant of the matrix. The determinant we want is log(|σ2C|); the way we avoid recalculating this every generation of the MCMC (in which σ2 can potentially differ) is by taking advantage of the equality:

log(|σ2C|) = nlog(σ2) + log(|C|)

I actually described this in an earlier paper with Luke Harmon (Revell & Harmon 2008, p. 318), but did not think to apply it to this problem until now. [Note that there is actually a small error in equation (5) of our article: n·r·log(k) should be n·r·log(k)/2.]

This version runs much faster: now about 4s for the same dataset as before. Unbelievable! Anyway, I have posted the new version on my website here.

This version is so much faster that I can easily run it for a million generations on the 100 species tree from before. If I plot the estimates from the Bayesian MCMC run with an uninformative prior against the values from ML estimation using ace() in the ape package, we can see that they are even more tightly correlated than before:

> source("anc.Bayes.R")
> source("vcvPhylo.R")
> X<-anc.Bayes(tree,x,ngen=1000000)
> plot(ace(x,tree)$ace,colMeans(X[21:10001,])[2+1:tree$Nnode])



(compare to here). The above code took about 6 minutes to run, whereas the first version of anc.Bayes would've needed about 100 minutes for the same calculations.

I haven't mentioned MCMC diagnostics, priors, or the utility of Bayesian inference at all yet. Hopefully, these will be addressed in future posts.

5 comments:

  1. Dear Liam,
    Works well and will definitely be of use to estimate ancestral states. Do you plan to implement the possibility to define variance (e.g. as standard error) for the extant taxa in a future version?
    Olivier

    ReplyDelete
  2. Hi Olivier. Yes, this is in the works. Thanks for asking & for your comment. - Liam

    ReplyDelete
  3. Dear Liam,

    First of all let me say thank you for all your hard work and efforts put into phytools, it's full with useful and fast algorithms, which are immensely valuable!

    I've used your package and anc.Bayes () to infer ancestral states for a tree with 820 tips, 819 internal nodes. The trait is discrete, basically it can take 14 states: 1-14.

    I set the ngen to a million and it provided reasonable ancestral states, but it ran for 23 hours.. Is this normal regarding the above mentioned scenario or a bit slow?

    My other question: is there a way to get transition parameters for these states as well? I'd need to infer the states of other species (which are not in this tree, since we don't know their states yet) using the branch lengths, nearest ancestral state and the transition probabilities however I could not find these probabilities in X after running the MCMC..

    Thank you very much for your answers in advance!

    Daniel

    ReplyDelete
  4. Unfortunately (!) anc.Bayes only does Bayesian ancestral character estimation for continuous characters. The closest thing to Bayesian ancestral state estimation in 'phytools' is make.simmap, which does stochastic character mapping - but this only samples character history from a conditional posterior distribution, conditioned on the MLE of Q. Sorry!

    I'm also not aware of a function that does Bayesian ancestral state estimation for discrete characters in R.

    All the best, Liam

    ReplyDelete
    Replies
    1. Thanks a lot for the prompt answer Liam! The trait I'm using / for which I'm trying to infer ancestral states is quite a tricky one and after careful consideration we (with one of my teachers) decided to try and treat it as continuous. Would the anc.Bayes() function somehow provide rate parameters for this trait? I would think of the probability of increase and decrease of the trait's value as two justifiable parameters.. Would it be possible to get estimates of these using anc.Bayes?

      Sorry if I'm asking something trivial!

      Daniel

      Delete