Tuesday, July 1, 2014

Bug fix in make.simmap for multi-state data and non-reversible model of evolution

In response to an R-sig-phylo query about stochastic mapping in phytools I wrote:

I can answer the three questions that pertain to phytools:

(1) The function make.simmap has a bug for non-reversible models when the input character has more than 2 states. This has to do with the algorithm for simulating changes along edges where the correct waiting time is simulated, but then the states at the end of the waiting time are chosen with the incorrect probabilities. This will sometimes cause make.simmap to hang for a long time. This does not affect the states simulated at nodes (which occurs first) and should not affect stochastic character mapping at all for binary characters or any reversible model of character evolution. I have posted a fixed version here - but you can also install a version of phytools containing this update here. Use phytools >= 0.4-22.

(2) You should look into the function describe.simmap to summarize the time spent in each state, etc., from stochastic character mapping using make.simmap..

(3) To simulate stochastic character histories you can use the phytools function sim.history. Note, though, that the matrix Q is the transpose of the fitted value of Q from make.simmap. Sorry about this. That means to simulate stochastic character histories on your tree you can do:

## stochastic mapping:
mtrees<-make.simmap(tree,x,model="ARD",nsim=100)
## summarize results
obj<-describe.simmap(mtrees)
obj
plot(obj) ## PP at nodes from stochastic mapping
fitted.Q<-mtrees[[1]]$Q
## simulate under fitted model
simhistories<-sim.history(tree,t(fitted.Q),nsim=100,anc=rstate(obj$ace[1,]))

The last part (anc=rstate(obj$ace[1,])) is necessary if you have non-reversibility, because this way you are picking from the posterior distribution at the root from your real data.

In addition, sim.history does not permit any columns (rows in the matrix from make.simmap) to be equal to 0.0 (which we will have if we have a truly non-reversible character). To resolve this, you can do something like:

ii<-which(diag(fitted.Q)==0)
fitted.Q[ii,-ii]<-max(nodeHeights(tree))*1e-12
fitted.Q[ii,ii]<--sum(fitted.Q[ii,])
simhistories<-sim.history(tree,t(fitted.Q),nsim=100,anc=rstate(obj$ace[1,]))

No comments:

Post a Comment

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.