## Tuesday, April 9, 2013

### Estimating ancestral states when species values are uncertain or unknown, part II

Last month I described a method whereby stochastic mapping can be used to estimate ancestral states when tip states are uncertain or unknown. An ancillary benefit of this approach (now implemented in the 'phytools' function make.simmap) is that it can also be used to get posterior probabilities on the states for those uncertain or unknown terminal taxa.

Well, the same general tactic can be used to get marginal ancestral state reconstructions using likelihood. (I.e., the empirical Bayes marginal reconstructions of Yang 2006. Yang calls these "empirical Bayes" reconstructions, because they are Bayesian posterior probabilities - but they treat the empirical tree & MLE model of evolution as if they are known without error.)

Marginal ancestral state reconstruction using the re-rooting method of Yang is implemented in the phytools function rerootingMethod. This method works by taking advantage of the fact that the normalized likelihoods at the root node from the pruning algorithm are the same as the (posterior) probabilities of each state at that node (Yang 2006) - that is, assuming our model of evolution is symmetric. That means that we should be able to re-root the tree at each internal node and compute the marginal ancestral states (posterior probabilities) for that node via one post-order tree traversal.

When tip nodes are unknown or uncertain this is treated as a prior probability distribution on the state for the tip. For example, we might specify a totally unknown tip as having an equal prior probability of being in any of the states; or we might know that a tip is in state a or b, but not c, in which case we might specify a flat prior probability distribution but only on a & b, with state c getting a prior probability of 0. We can then use these prior probabilities just as we would the conditional likelihoods of internal nodes during the pruning algorithm.

To get the posterior probability of tip nodes is simple - we just re-root the tree at the tip of interest and then compute the normalized conditional likelihoods for that node. These are are empirical Bayes posterior probabilities for the tip states, given our data & fitted model of evolution.

The code for this new function is here; however if you want to try it out, you might want to get the new non-CRAN build of phytools (phytools 0.2-41) because this function calls some internal functions of the phytools package (which it won't be able to do if you just load the function from source).

Here's a quick demo of how it works:

> require(phytools)
> # simulate a tree & data
> tree<-pbtree(n=20,scale=1)
> Q<-matrix(c(-1,1,1,-1),2,2)
> rownames(Q)<-colnames(Q)<-letters[1:2]
> x<-sim.history(tree,Q)\$states
> plot(tree,no.margin=T,edge.width=2,label.offset=0.02)
> tiplabels(pie=to.matrix(x,letters[1:2])[tree\$tip.label,], piecol=c("blue","red"),cex=0.6)
> # ok, get the marginal ASRs without uncertainty
> PP<-rerootingMethod(tree,x)
> nodelabels(pie=PP\$marginal.anc,piecol=c("blue","red"), cex=0.6)

OK, now let's pretend that some of our tip states are uncertain/unknown:

> # now let's pretend we have some uncertainty in
> # our tip states
> Pr<-to.matrix(x,letters[1:2])
> Pr["t3",]<-c(0.5,0.5)
> Pr["t12",]<-c(0.5,0.5)
> Pr["t19",]<-c(0.5,0.5)
> QQ<-rerootingMethod(tree,Pr)
> tiplabels(pie=QQ\$marginal.anc[tree\$tip.label,], piecol=c("blue","red"),cex=0.6)
> nodelabels(pie=QQ\$marginal[as.character(1:tree\$Nnode +length(tree\$tip)),],piecol=c("blue","red"),cex=0.6)

The posterior probabilities for the tip nodes here also seem totally sensible. For example, tip t3 is on the end of a very long branch - so the posterior probability is dominated by the prior. By contrast, tip t19 is on a short branch & nested within a clade in state b (i.e., red); and thus the empirical Bayes posterior probability that t19 is also in state b is very high - as shown in the figure.

Pretty cool!