Wednesday, August 24, 2016

Optimizing node rotations with cophylo under different conditions

The function cophylo creates an object of class "cophylo" and by default will perform node rotations in which the “matching” of tip nodes (either exactly, by tip label, or through the association matrix, assoc) is optimized. Matching is defined as the sum of squared differences between the height of the corresponding tip labels in each of the plotted trees. (It can also be defined differently - such as by the sum of the absolute values of the differences.)

The algorithm I employed is to first traverse tree 1 from root-to-tips (i.e., pre-order) and rotate each node, accepting node rotations that improve matching to tree 2. Next, the same procedure is performed on tree 2. If desired, both pre- and post-order traversals can be performed on each tree.

I do not have a proof that this algorithm is guaranteed to optimize the matching by my criterion (or any) between trees; however it is nonetheless my empirical observation that if a precise match exists, the algorithm will find it.

Here is a demonstration using a simulated 80 taxon tree in which I will first randomly rotate nodes 200 times using rotateNodes, next show how tangled this makes an unrotated "cophylo" plot look, then, finally, optimize the node rotations of the two trees and show that the algorithm results in perfect matching:

library(phytools)
## simulate trees
t1<-t2<-rtree(n=80)
## rotate nodes of t2 randomly
for(i in 1:200) 
    t2<-rotateNodes(t2,sample(1:t2$Nnode+Ntip(t2),1))
## tangled "cophylo" plot
tangled<-cophylo(t1,t2,rotate=FALSE)
plot(tangled,fsize=0.5)

plot of chunk unnamed-chunk-1

## untangled "cophylo" plot
untangled<-cophylo(t1,t2,rotate=TRUE,print=TRUE)
## Rotating nodes to optimize matching...
## objective: 13202
## objective: 12012
## objective: 4362
...
...
## objective: 18
## objective: 2
## objective: 0
## Done.
plot(untangled,fsize=0.5)

plot of chunk unnamed-chunk-1

Both input trees in the previous example are perfectly bifurcating. The method also has an option that works for multifurcating trees, rotate.multi, in which essentially we try all possible ordering of the daughter edges of each multifurcation. Note that this is will result in a prohibitively large number of trees to evaluate for the case of high order multifurcations. For instance, a multifurcation with 7 descendant daughter edges will have 7! = 5040 possible orderings.

In addition, we may need to perform both pre- and post-order tree traversal to optimize the matching. Here I will set a bunch of edges to zero in two identical trees - then perform random node rotations on one of them - then collapse all the zero-length edges to multifurcations in each tree. Finally, I will try to match both trees with rotate.multi=TRUE and methods=c("pre","post"):

ee<-which(t1$edge[,2]>Ntip(t1))
t3<-t1
t3$edge.length[sample(ee,20)]<-0
t4<-t3
for(i in 1:200) 
    t4<-rotateNodes(t4,sample(1:t4$Nnode+Ntip(t4),1))
t3<-di2multi(t3)
t4<-di2multi(t4)
## with rotate.multi=FALSE
wo.multi<-cophylo(t3,t4,print=TRUE)
## Rotating nodes to optimize matching...
## objective: 20854
...
...
## objective: 3958
## Done.
plot(wo.multi,fsize=0.5)

plot of chunk unnamed-chunk-2

## with rotate.multi=TRUE
w.multi<-cophylo(t3,t4,rotate.multi=TRUE,
    methods=c("pre","post"),print=TRUE)
## Rotating nodes to optimize matching...
## objective: 89494
## objective: 82494
## objective: 82494
## objective: 82482
...
...
## objective: 1232
## objective: 882
## objective: 0
## objective: 0
## Done.
plot(w.multi,fsize=0.5)

plot of chunk unnamed-chunk-2

In addition, I thought it might be worthwhile to demonstrate that the methods work even for quite large trees. For instance:

t1<-t2<-rtree(n=1000)
for(i in 1:1000) t2<-rotateNodes(t2,sample(1:t2$Nnode+Ntip(t2),1))
tangled<-cophylo(t1,t2,rotate=FALSE)
plot(tangled,pts=FALSE,ftype="off")

plot of chunk unnamed-chunk-3

## untangled "cophylo" plot
untangled<-cophylo(t1,t2,rotate=TRUE,print=TRUE)
## Rotating nodes to optimize matching...
## objective: 10141238
## objective: 10141238
## objective: 10141238
## objective: 10141238
## objective: 7131734
## objective: 7131734
## objective: 7131734
...
...
...
## objective: 0
## objective: 0
## Done.
plot(untangled,pts=FALSE,ftype="off")

plot of chunk unnamed-chunk-3

(It may be hard to tell, but the value of our objective function, 0, tells us they match exactly!)

Finally, there is another case in which tips can be matched precisely - when they have the same “cladewise” order, but differ completely in topology. Interestingly, this is actually a problem that cophylo cannot solve:

t1<-rtree(n=80)
t2<-rtree(n=80)
t1$tip.label<-t2$tip.label<-paste("t",1:80,sep="")
obj<-cophylo(t1,t2,rotate=FALSE)
for(i in 1:200) t2<-rotateNodes(t2,sample(1:t2$Nnode+Ntip(t2),1))
plot(obj,fsize=0.4)

plot of chunk unnamed-chunk-4

obj<-cophylo(t1,t2,print=TRUE,methods=c("pre","post"))
## Rotating nodes to optimize matching...
## objective: 41422
## objective: 41422
...
...
...
## objective: 12256
## Done.
plot(obj,fsize=0.4)

plot of chunk unnamed-chunk-4

That's all.

No comments:

Post a Comment

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.