Thursday, May 30, 2013

Bug fix in make.simmap(...,Q="mcmc")

I just posted a new version of make.simmap and a new phytools build (phytools 0.2-77). This version fixes a bug affecting make.simmap(...,Q="mcmc"). In this method, the transition matrix Q is sampled from its Bayesian posterior probability distribution using MCMC given the model & data. This sample is then used by make.simmap to map characters on the tree.

The bug was not in the MCMC itself, which (so far as I can tell) is properly designed, but in how Q was stored at each sampling generation: the newly proposed value of Q for that generation was always stored, whether or not the chain had actually accepted it. The reason this is not a bug in the MCMC is that the proposed value was only retained by the chain with probability equal to the posterior odds ratio (or with probability 1.0 when that ratio exceeds 1.0); the problem thus concerns only the value recorded from the chain, not the behavior of the chain itself. This was somewhat difficult to detect because it will not be obvious unless the variance of the proposal distribution is high relative to the curvature of the likelihood surface or the prior density. (In case it's not obvious why, this is because if the proposal variance is low, most post burn-in proposals will have relatively high posterior odds and thus a good chance of being accepted, so the stored value usually matches the chain; whereas if the proposal variance is high, most proposals will have low posterior odds and will be rejected, and the stored values then misrepresent the posterior sample.)

I also changed the starting value of Q for the MCMC. Previously, I had arbitrarily set all the non-diagonal elements of Q to a fixed constant. Now I draw a set of values at random from the prior probability density on Q, as supplied by the user. The advantage of this is that if we set a very strong prior on Q and start the chain from an arbitrary value, the MCMC may have difficulty converging on the region of high posterior density when the variance of our proposal distribution is too low or (especially) too high; starting from a draw from the prior helps avoid this.

I'm not sure what a good proposal variance is, but one way of thinking about it is relative to the empirical Q. For instance, if the non-diagonal elements of our empirical Q are all around ~0.1, then it is probably not a good idea to set vQ = 10. Unless our data contain very little information about Q, almost all proposals will be rejected and the MCMC will be very inefficient at exploring the posterior distribution of Q. Conversely, if the non-diagonal elements of our empirical Q average > 100, then we should probably not choose vQ = 0.001. In that case, if we start anywhere near the ML value of Q, and unless we have very little information about Q in our data, almost all proposals will be accepted, which is also a poor way to sample from the posterior using MCMC.
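One rough way to put vQ on an appropriate scale is to look at the ML (empirical) transition rates first. Here is a minimal sketch using ace from the ape package (the all-rates-different model is just an illustrative choice here; tree and x are the same tree and character vector passed to make.simmap):

library(ape)
# fit the Mk model by ML to see the scale of the empirical rates
fit<-ace(x,tree,type="discrete",model="ARD")
# the estimated transition rates; a vQ several orders of magnitude
# larger or smaller than these is probably a poor choice
fit$rates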

Even though make.simmap is not set up for this, it is possible to run some diagnostics on our MCMC using the MCMC diagnostics package coda. For example, let's say we have obtained 100 samples of Q (and thus 100 stochastically mapped trees) from the posterior after burn-in:

mtrees<-make.simmap(tree,x,nsim=100,Q="mcmc",vQ=0.01,prior=list(use.empirical=TRUE,beta=2))
we can get the likelihoods using
logL<-sapply(unclass(mtrees),function(x) x$logL)
or (for instance) the posterior sample of Q[1,2] using
q12<-sapply(unclass(mtrees),function(x) x$Q[1,2])
and then perform diagnostics (effective size, rejection rate, etc.) using the appropriate coda functions. To increase effective size without increasing the number of sampled trees, we can increase the sample frequency (samplefreq) and increase or decrease the proposal variance (vQ).
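For instance, a minimal sketch of what those checks might look like using coda (logL and q12 are the vectors extracted above; combining them into a single mcmc object is just one convenient way to do this):

library(coda)
# treat the post burn-in, thinned samples as an mcmc object
post<-mcmc(cbind(logL=logL,q12=q12))
effectiveSize(post)  # effective sample size for each quantity
rejectionRate(post)  # proportion of consecutive samples that are identical
summary(post)        # posterior means, quantiles, etc.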

6 comments:

  1. BTW - thanks to Rich Glor for reporting suspicious results that led to the discovery of this bug. - Liam

  2. Hi Liam,

    Do you know of any work/literature on choosing a prior distribution for *Q*?

    Best,

    Luiz

    Replies
    1. No - but the prior really matters. For instance, if a strong prior with a mean much lower than the empirical Q is (inadvertently or intentionally) used, this will cause far too few changes in the stochastic reconstructions. Conversely, if a prior with a mean that is too high is used, this will increase the inferred number of changes on the tree.

      There may be good & justifiable reasons for using a strong prior - but this is seldom justified or discussed.

      One possibility is empirically parameterizing the prior density. This is done with make.simmap(...,Q="mcmc",prior=list(use.empirical=TRUE,...)). In this case alpha of the gamma prior is set to beta*empirical(Q). This should, I believe, guarantee that the prior on Q does not bias our result systematically upward or downward.

      This topic is something that a colleague (Dave Collar) and I are talking about describing, but a manuscript is some ways off.

      - Liam

    2. Users who publish results of stochastic mapping in SIMMAP v1.5 tend to use this program's branch length prior option, which eliminates the gamma distribution from consideration (when SIMMAP 1.5 uses the gamma prior, a random draw from this gamma distribution is used as a multiplier of the branch lengths on the un-scaled or re-scaled tree). In my experience, SIMMAP's branch length prior option tends to result in something approximating the parsimony solution, but I'm not sure whether that is because the prior is inappropriate (it's a bit hard to tell because SIMMAP 1.5 doesn't seem to provide likelihood scores for alternative models). Is it possible to run make.simmap using settings that match SIMMAP 1.5's branch length prior option? I think the two programs might use priors somewhat differently, but I haven't fully explored this issue.

    3. Rich.

      So - SIMMAP is just sampling rates from the gamma? I was wondering about that. That's what the documentation seems to suggest, but I find it somewhat hard to believe.

      We had an empirical dataset in which SIMMAP under the defaults sampled histories with far fewer changes than you'd expect given the empirical (i.e., ML) value of Q. I found that we could emulate the SIMMAP results by using a strong prior pulling Q toward ~1/4 of its empirical value. This was, of course, idiosyncratic to that particular tree & dataset. In simulation (i.e., using a known true stochastic history) I have seen results in both directions using SIMMAP (i.e., both too many & too few changes).

      - Liam

