A phytools user recently reported that the ancestral character estimation
method for discrete traits in phytools, `rerootingMethod`

, does
not work for trees with polytomies. This is in fact correct - although I
had long forgotten that this was the case.

The function `rerootingMethod`

uses the approach of Yang et al.
(1995) to compute the marginal ancestral states by re-rooting the tree
at all internal nodes. For instances in which some states at the tips are
uncertain, it can also re-root at the tip to compute empirical Bayesian
posterior probabilities for the leaves as well. The function is less
relevant now that `ace`

in the ape package also can compute the
marginal
ancestral states. (Although, as far as I know, `ace`

can
also not handle polytomies.) `rerootingMethod`

stays somewhat
relevant only by virtue of being able to compute posterior probabilities
at tips, and for handling uncertainty at tip states (two sides of the
same coin, I guess).

It was fairly straightforward to update `rerootingMethod`

to
handle polytomies, and the code is posted
here.
The way that this
current
version works is pretty simple. It just first takes the input tree,
checks for polytomies using `is.binary.tree`

in the ape package,
resolves polytomies randomly with branches of zero length using
`multi2di`

, estimates ancestral states for all internal nodes (&
tips, if `tips=TRUE`

), and the uses the phytools function
`matchNodes`

to back-translate the reconstructed nodes in the resolved tree to the
original, input tree. Note that although it is doing something here that
`ace`

does not do - the code internally uses (modified code from)
`ace`

.

Here - we can check it out by simulating a tree with some polytomies, generating trait data for tips & nodes, and then reconstructing ancestral states on the tree:

```
library(phytools)
dt<-rtree(n=26)
dt$tip.label<-LETTERS
## set some of the internal branches
ii<-which(dt$edge[,2]>Ntip(dt))
dt$edge.length[sample(ii,4)]<-0
mt<-di2multi(dt)
plotTree(mt)
```

Next let's simulate a character up the tree:

```
Q<-matrix(c(-1,0.5,0.5,
0.5,-1,0.5,0.5,0.5,-1),3,3,
dimnames=list(letters[1:3],letters[1:3]))
st<-sim.history(mt,Q)
```

```
## Done simulation(s).
```

```
x<-st$states
y<-getStates(st,"nodes")
```

Now, we can reconstruct & plot ancestral states using `ace`

on
the bifurcating tree; and `rerootingMethod`

on the bi- and
on the multifurcating trees. If the methods are working properly then the
reconstructions will be the same - although in the bifurcating node there
will be additional nodes with states identical to other nodes from which
they are separated by branches of zero length.

```
## first ace
fit1<-ace(x,mt,type="discrete",model="ER") ## doesn't work
```

```
## Error in ace(x, mt, type = "discrete", model = "ER"): "phy" is not rooted AND fully dichotomous.
```

```
fit2<-ace(x,dt,type="discrete",model="ER") ## doesn't work
```

```
## Error in ace(x, dt, type = "discrete", model = "ER"): some branches have length zero or negative
```

```
## set zero-length branches to be 1/1000000 total tree length
dst<-dt
dst$edge.length[dst$edge.length==0]<-max(nodeHeights(dt))*1e-6
fit3<-ace(x,dst,type="discrete",model="ER")
fit3
```

```
##
## Ancestral Character Estimation
##
## Call: ace(x = x, phy = dst, type = "discrete", model = "ER")
##
## Log-likelihood: -25.3342
##
## Rate index matrix:
## a b c
## a . 1 1
## b 1 . 1
## c 1 1 .
##
## Parameter estimates:
## rate index estimate std-err
## 1 0.3278 0.1042
##
## Scaled likelihoods at the root (type '...$lik.anc' to get them for all nodes):
## a b c
## 0.82837257 0.09781827 0.07380916
```

Now with the *new* version of `rerootingMethod`

:

```
packageVersion("phytools")
```

```
## [1] '0.4.57'
```

```
fit4<-rerootingMethod(mt,x,model="ER")
fit4
```

```
## $loglik
## [1] -25.33421
##
## $Q
## a b c
## a -0.6556595 0.3278298 0.3278298
## b 0.3278298 -0.6556595 0.3278298
## c 0.3278298 0.3278298 -0.6556595
##
## $marginal.anc
## a b c
## 27 0.8283833959 0.09781145 0.0738051507
## 28 0.2121485528 0.69489845 0.0929529991
## 29 0.0532939602 0.91028552 0.0364205221
## 30 0.0627113681 0.88952034 0.0477682947
## 31 0.0132238730 0.97569370 0.0110824277
## 32 0.3870898035 0.30541903 0.3074911711
## 33 0.4035347684 0.15548939 0.4409758369
## 34 0.5865138808 0.07266794 0.3408181834
## 35 0.3530969369 0.06523786 0.5816652004
## 36 0.7045263050 0.13922658 0.1562471186
## 37 0.1288634178 0.06087131 0.8102652730
## 38 0.3401006710 0.49952777 0.1603715570
## 39 0.4129576403 0.32707964 0.2599627226
## 40 0.1266883464 0.35550347 0.5178081798
## 41 0.0553700550 0.21371716 0.7309127832
## 42 0.8088376946 0.10680265 0.0843596595
## 43 0.8570510600 0.08945954 0.0534894017
## 44 0.6274100032 0.30199393 0.0705960682
## 45 0.0836944308 0.86315981 0.0531457619
## 46 0.0119373448 0.97893201 0.0091306462
## 47 0.0006664523 0.99874487 0.0005886784
```

If we plot both of these on our trees in turn, we should see that the marginal reconstructions (excepting the small deviation that was required to give our tree non-zero branch lengths throughout) should be equal:

```
## first ace
plotTree(dt,offset=0.5)
nodelabels(pie=fit3$lik.anc)
tiplabels(pie=to.matrix(x,seq=letters[1:3]),cex=0.5)
```

```
## now phytools
plotTree(mt,offset=0.5)
nodelabels(pie=fit4$marginal.anc)
tiplabels(pie=to.matrix(x,seq=letters[1:3]),cex=0.5)
```

Finally, for fun let's overlay the *true*, known states to see how
close (or off) we were:

```
plotSimmap(st,colors=setNames(c("red","green","blue"),letters[1:3]),
offset=0.5)
nodelabels(pie=fit4$marginal.anc)
tiplabels(pie=to.matrix(x,seq=letters[1:3]),cex=0.5)
```

That's it.

I am unable to run the rerootingMethod, despite it having the same format input of 'x':

ReplyDeleteError in `colnames<-`(`*tmp*`, value = c("a", "c")) :

attempt to set 'colnames' on an object with less than two dimensions

Any suggestions? Thanks

Hi Alex. What is your object x? It should be a vector of type character (or a factor) with names corresponding to the species names in the tree.

DeleteHi Liam,

ReplyDeleteI’m working on adding evolution of a discrete character to a phylogeny. First I am attempting to model ER, SYM, and ARD using rerootingMethod. ER and ARD seem to work properly, with ARD being a better fit. SYM, however, produces the error “In log(comp[1:M + N]) : NaNs produced”. I suspect from your post here (http://blog.phytools.org/2016/05/more-on-plot-method-for-object-class.html) it has to do with parameters versus tips, but in that case, why does ARD seem to work? Furthermore, when I move onto using make.simmap to see the (code: SYMtrees <- make.simmap(tree,character,model="SYM",nsim=1000,q="mcmc"), the resulting sectors in the node piecharts are all equal, and the same process for ARD produces NaNs but with a sensible output. Does this suggest that the character and phylogeny will only support the simpler ER model? Is there something else I am missing? Thank you!

-Bob

dst<-tree

ReplyDelete> dst$edge.length[dst$edge.length==0]<-max(nodeHeights(tree))

How can this be used on a class of multi.phylo (I have over 3000 posterior trees from a Bayes anaylsis).

I am trying to run SIMMAP but it keeps giving me this error " Error in xx[tree$tip.label, ] : subscript out of bounds"

I assume that means I need to use the code you used above to set non zero lengths to 1/1000 however when I try to use the code above I get this error " Error in nodeHeights(ntrees) : tree should be an object of class "phylo"."

However, When I use 'is.binary.multiphylo' I get "true" for all 3000 trees. Any help with this would be greatly appreciated.

This comment has been removed by the author.

ReplyDeleteHi, just wondering how nodes are named. I've used this method for >100 discrete characters and now want to investigate each node, for each character. How do I know which node "155" refers to?

ReplyDelete