Friday, April 5, 2013

Bug fix for make.simmap with asymmetric substitution model; new version of phytools

Yesterday I received a user report of some problems with make.simmap(...,model="ARD") when it resulted in some of the fitted transition rates being zero. This was a known (to me) issue with make.simmap, and it is because although we can compute the conditional likelihoods with this matrix no problem - when we are trying to draw waiting times from an exponential distribution to map character changes along internal branches, rexp(...,rate=0) won't evaluate. One solution to this would be to return Inf or some arbitrarily large number when the rate is 0. Instead, and for other reasons of computation, I decide to add a small number, tol=1e-08 to off-diagonal position of Q that are 0 in the MLE. (There are also other calculations that make this necessary.)

Fixing this issue turned up another more serious problem and that is that recent versions of make.simmap have been using the transpose of Q in simulating along edges for asymmetric transition models, instead of Q itself (or, alternatively, that it has been calling a row index instead of a column index during an important stage in calculation). I believe that this bug appeared during my recent major re-write of make.simmap. Obviously, this doesn't affect symmetrical models of character change in which Q==t(Q) (such as model="ER" or model="SYM" - the default), but it will affect model="ARD".

Here's a little more specific detail on the error. In the internally called function smap, I had:

Q<-t(Q)
where I should not have; or, alternatively:
p<-expm(Q*tree$edge.length[j])[NN[j,1],]* L[as.character(tree$edge[j,2]),]
instead of:
p<-expm(Q*tree$edge.length[j])[,NN[j,1]]* L[as.character(tree$edge[j,2]),]

Source code for the fixed version of make.simmap is here. In this version, users can also control the value of tol by way of the optional argument, well, tol. tol is only used if any of the off-diagonal elements of Q are less than tol, which has a default value of tol=1e-08, as noted above.

This update to make.simmap is also in a new phytools package build, phytools 0.2-37, which can be installed from source.

Finally, here is a demo in which I simulate with a very low backward rate & then show what the new version of make.simmap does (instead of failing) if that backward transition rate has a MLE of 0. Note that if make.simmap seems to hang - it may be possible to resolve this by increasing tol.

> require(phytools)
Loading required package: phytools
> packageVersion("phytools")
[1] ‘0.2.37’
> tree<-pbtree(n=200,scale=1)
> Q<-matrix(c(-1,1,0.01,-0.01),2,2)
> rownames(Q)<-colnames(Q)<-letters[1:2]
> tree<-sim.history(tree,Q,anc="a")
> cols<-setNames(c("black","red"),letters[1:2])
> # this is the true character history
> plotSimmap(tree,cols,pts=F,ftype="off")
> mtrees<-make.simmap(tree,tree$states,model="ARD", nsim=100)

Warning: some elements of Q not numerically distinct from 0; setting to 1e-08

make.simmap is sampling character histories conditioned on the transition matrix
Q =
            a           b
a -0.96697341  0.96697341
b  0.00000001 -0.00000001
(estimated using likelihood);
and root node prior probabilities
pi =
  a   b
0.5 0.5

Done.

And a little reality check:

> # true history
> describe.simmap(tree)
1 tree with a mapped discrete character with states:
 a, b

tree has 26 changes between states

changes are of the following types:
  a  b
a 0 26
b 0  0

mean total time spent in each state is:
             a         b    total
raw  26.703708 18.438526 45.14223
prop  0.591546  0.408454  1.00000

> # stochastic maps
> describe.simmap(mtrees,plot=T,show.tip.label=FALSE)
100 trees with a mapped discrete character with states:
 a, b

trees have 25.48 changes between states on average

changes are of the following types:
       a,b b,a
x->y 25.48   0

mean total time spent in each state is:
              a          b    total
raw  26.2457674 18.8964671 45.14223
prop  0.5814016  0.4185984  1.00000

In the new phytools build I've also added the function getStates (which can be used to pull the states at nodes or tips from a tree with a mapped discrete character and is called internally by describe.simmap) to the NAMESPACE so that it can be called by phytools users.

Please don't hesitate to report any bugs or issues with the present version of make.simmap or phytools.

Thanks!

5 comments:

  1. Link to the source code was wrong in the first version of this post. Now fixed. Liam

    ReplyDelete
  2. This comment has been removed by the author.

    ReplyDelete
    Replies
    1. Hi Heath. I'm guessing you removed your comment because you found this: http://blog.phytools.org/2013/04/small-bug-fix-in-makesimmap-for.html. Thanks for using phytools & let me know if you have any more questions. Liam

      Delete
    2. I installed the new version and tried it on a couple of trees and it worked great. It is now crunching through my full set of trees. Thanks!

      Delete