Tuesday, June 5, 2018

Preserving node & edge labels after optimizing node rotation in cophylo

A phytools user contacted me the other day about an issue in which her bootstrap values, stored as node labels in a Newick string, did not align with the correct edges when plotted using edgelabels.cophylo on a visualized "cophylo" object.

As it turns out, this had nothing to do with cophylo & everything to do with the fact that she wanted to plot node labels using edgelabels.

First, here is a demo showing that cophylo preserves the correct order of node labels.

A simulated tree with node labels:

library(phytools)
tree<-pbtree(n=26,tip.label=LETTERS)
tree$node.label<-letters[1:25]

Now I'll plot that tree labeling the nodes with their labels & node indices:

plotTree(tree)
nodelabels(tree$node.label,1:tree$Nnode+Ntip(tree))
nodelabels(adj=c(1.8,-1.8),frame="none",cex=0.5)

plot of chunk unnamed-chunk-2

Next, I'm going to randomly rotate nodes on this tree to produce two new trees as follows:

t1<-tree
for(i in 1:100) t1<-untangle(rotate(t1,sample(1:t1$Nnode+Ntip(t1),1)),
    'read.tree')
t2<-tree
for(i in 1:100) t2<-untangle(rotate(t2,sample(1:t2$Nnode+Ntip(t2),1)),
    'read.tree')

Now I'm ready to make my "cophylo" object. First, without node rotation to maximize matching:

obj<-cophylo(t1,t2,rotate=FALSE)
plot(obj)
nodelabels.cophylo(obj$trees[[1]]$node.label,1:obj$trees[[1]]$Nnode+
    Ntip(obj$trees[[1]]))
nodelabels.cophylo(adj=c(1.8,-1.8),frame="none",cex=0.5)
nodelabels.cophylo(obj$trees[[2]]$node.label,1:obj$trees[[2]]$Nnode+
    Ntip(obj$trees[[2]]),which="right")
nodelabels.cophylo(adj=c(-0.8,-1.8),frame="none",cex=0.5,which="right")

plot of chunk unnamed-chunk-4

We can see that the node indices (the little numbers that come from tree$edge have changed, but the node labels are still all associated with the right clades.

Now we can do the same thing, but in which we permit node rotation to be optimized as follows:

obj<-cophylo(t1,t2,rotate=TRUE)
## Rotating nodes to optimize matching...
## Done.
plot(obj)
nodelabels.cophylo(obj$trees[[1]]$node.label,1:obj$trees[[1]]$Nnode+
    Ntip(obj$trees[[1]]))
nodelabels.cophylo(adj=c(1.8,-1.8),frame="none",cex=0.5)
nodelabels.cophylo(obj$trees[[2]]$node.label,1:obj$trees[[2]]$Nnode+
    Ntip(obj$trees[[2]]),which="right")
nodelabels.cophylo(adj=c(-0.8,-1.8),frame="none",cex=0.5,which="right")

plot of chunk unnamed-chunk-5

So what was the problem with the edge labels in the user's inquiry? Well, basically, edgelabels (and edgelabels.cophylo, which just uses edgelabels internally) takes the labels in the order of the rows of tree$edge in the plotted phylogeny

  • not in the number order of the node indices, as in nodelabels and tree$node.label. Thus, we have to match the two in order to visualize our node labels along each preceding edge. This might be done as follows:
obj<-cophylo(t1,t2,rotate=TRUE)
## Rotating nodes to optimize matching...
## Done.
plot(obj)
edgelabels.cophylo(obj$trees[[1]]$node.label[2:obj$trees[[1]]$Nnode],
    edge=sapply(2:obj$trees[[1]]$Nnode+Ntip(obj$trees[[1]]),
    function(n,e) which(e==n),e=obj$trees[[1]]$edge[,2]),
    frame="none",adj=c(0.5,1))
edgelabels.cophylo(obj$trees[[2]]$node.label[2:obj$trees[[2]]$Nnode],
    edge=sapply(2:obj$trees[[2]]$Nnode+Ntip(obj$trees[[2]]),
    function(n,e) which(e==n),e=obj$trees[[2]]$edge[,2]),
    frame="none",adj=c(0.5,1),which="right")

plot of chunk unnamed-chunk-6

Neat.

Sunday, May 20, 2018

When a phylogeny fails is.binary but is not fixed with multi2di

A phytools user recently posted about an odd case in which a tree read from file failed is.binary, but then after running multi2di still failed. How could this be? Though it seems paradoxical, it isn't. is.binary checks whether all nodes have two & only two descendants. (That is, it checks if all nodes except the root node are of order 3.) multi2di randomly (by default) resolves all multifurcating nodes. If the tree has edge lengths it will do so by adding new internal edges of zero length. What this doesn't consider is the possibility that some nodes (other than the root) may be of order 2: that is, with one & only one descendant. This tree would fail is.binary (because it is not binary - some nodes are unary), but has no polytomies to resolve!

Here is the tree (stripped of its original labels):

library(phytools)
text<-"(((((((((((((((((((((((((((((((((L:24.2285):24.2285,((((((K:5.385058):6.310609):7.033328):1.825603):8.223723):13.292609):6.386152):11.370258,(((((((J:8.16571):5.975676):1.97602):5.139791):7.294121):19.192381):11.351777):0.731864):7.809366):3.304034,(((((I:21.783):21.783):6.434449):3.58192):15.0801):2.27819):2.71654):0.17678):26.679566):0.908691):2.789943):6.809574):1.370987):0.275588):0.58204):3.627011):0.657279,((((((((((((H:49.694):49.694,(G:49.694,F:49.694):49.694):0.597179):0.59605):0.532812):2.194713):0.986243):1.553033):5.53298):0.109862):1.76773,(((((((((((((((((((((((E:1.349215):1.963638):0.691401):1.442252):4.065793):3.775248):1.85111):2.197996):6.083541):0.632039):3.842302):4.132199,(((((((((((((((D:0.059925):3.244179):0.375324):0.720338):0.669455):0.653144):0.66027):1.832826):2.726546):0.565331):2.93859):1.526313):3.117295):0.666207):8.180683):4.090307):8.852789):1.221225):1.351458):9.430401):4.699684,(((((((((((((((C:4.032853):1.231613):0.597802):1.21177):8.62263):3.377616):11.944507):2.720167):1.520349):6.47628):6.460648):1.757607):0.892871):2.226774):0.671859):3.836941):7.044528):4.515714):0.276658):2.378927):0.250678):34.9466):6.26322):0.420986,((((B:24.79937):33.360252):11.892937):13.553241):30.0738):3.85514):0.138911):1.491856,((((((((((((A:18.621535):16.259864):11.22219):37.778767):1.690382):6.87514):13.22793):1.93366):8.908438):0.179371):1.46094):0.986915):0.020375):17.73496):43.727816):7.65496):2.25732):3.495264):23.5968):16.908091):8.728959):108.965):38.467865):10.085114):4.856365):5.208675):11.765386):17.675263);"
tree<-read.tree(text=text)

It would seem to read & plot fine, as follows:

plotTree(tree)

plot of chunk unnamed-chunk-2

But, as promised, the tree fails is.binary & multi2di fails to resolve this:

is.binary(tree)
## [1] FALSE
tree<-multi2di(tree)
is.binary(tree)
## [1] FALSE

As mentioned above, this is due to the presence of singletons: nodes, other than the root, of order 2. We can see the singleton nodes using plotTree.singletons. There are a bunch of them!

plotTree.singletons(tree)

plot of chunk unnamed-chunk-4

Fortunately, this too can be resolved using the handy ape function collapse.singles:

tree<-collapse.singles(tree,root.edge=TRUE)
plot(tree,edge.width=2,root.edge=TRUE,no.margin=TRUE)

plot of chunk unnamed-chunk-5

(There's a good chance we don't really want the root edge, in which case we could just set root.edge=FALSE, the default, in the code above.)

Where do all these singleton nodes come from? Since this has come up before, my suspicion is that it is due to some software external to R prunes edges & clades from a tree but leaves intact the original nodes. In fact, this can be emulated using ape::drop.tip(...,collapse.singles=FALSE), e.g.:

tree<-pbtree(n=100,scale=1)
tree<-drop.tip(tree,sample(tree$tip.label,75),collapse.singles=FALSE)
plotTree.singletons(tree)

plot of chunk unnamed-chunk-6

That's all.