My idea for obtaining the correct matchings was to use the set of distances to the tip species in the tree. Matching nodes with the same set of patristic distances (or, for trees with arbitrary scale, proportional distances) must be equivalent. Here is the core code of the function:
D1<-dist.nodes(tr1)[1:length(tr1$tip), 1:tr1$Nnode+length(tr1$tip)]
D2<-dist.nodes(tr2)[1:length(tr2$tip), 1:tr2$Nnode+length(tr2$tip)]
rownames(D1)<-tr1$tip.label
rownames(D2)<-tr2$tip.label; D2<-D2[rownames(D1),]
for(i in 1:tr1$Nnode){
z<-apply(D2,2,function(X,y) cor(X,y),y=D1[,i])
Nodes[i,1]<-as.numeric(colnames(D1)[i])
Nodes[i,2]<-as.numeric(names(which(z>=(1-tol))))
}
The first part of the code uses the 'ape' function dist.nodes to compute the set of distances from each node to all tips; and then sorts them to have the same row-wise ordering in the two matrices. Next, we go through the columns of D1 and identify the column of D2 with proportional or equal distances.D2<-dist.nodes(tr2)[1:length(tr2$tip), 1:tr2$Nnode+length(tr2$tip)]
rownames(D1)<-tr1$tip.label
rownames(D2)<-tr2$tip.label; D2<-D2[rownames(D1),]
for(i in 1:tr1$Nnode){
z<-apply(D2,2,function(X,y) cor(X,y),y=D1[,i])
Nodes[i,1]<-as.numeric(colnames(D1)[i])
Nodes[i,2]<-as.numeric(names(which(z>=(1-tol))))
}
Of course - some important bookkeeping has been omitted here. For the full code, see utilities.R on the phytools page.
Let's try it out:
> tr1<-pbtree(n=20)
> par(mfcol=c(1,2))
> plotTree(tr1,node.numbers=T)
> # re-root tr1 at node=33
> tr2<-root(tr1,node=33)
> plotTree(tr2,node.numbers=T)
> par(mfcol=c(1,2))
> plotTree(tr1,node.numbers=T)
> # re-root tr1 at node=33
> tr2<-root(tr1,node=33)
> plotTree(tr2,node.numbers=T)
> matchNodes(tr1,tr2,method="distances")
tr1 tr2
[1,] 21 NA
[2,] 22 30
[3,] 23 31
[4,] 24 32
[5,] 25 29
[6,] 26 25
[7,] 27 26
[8,] 28 27
[9,] 29 28
[10,] 30 24
[11,] 31 23
[12,] 32 22
[13,] 33 21
[14,] 34 38
[15,] 35 33
[16,] 36 34
[17,] 37 35
[18,] 38 36
[19,] 39 37
> # we can also reverse the order of the trees
> matchNodes(tr2,tr1,method="distances")
tr1 tr2
[1,] 21 33
[2,] 22 32
[3,] 23 31
[4,] 24 30
[5,] 25 26
[6,] 26 27
[7,] 27 28
[8,] 28 29
[9,] 29 25
[10,] 30 22
[11,] 31 23
[12,] 32 24
[13,] 33 35
[14,] 34 36
[15,] 35 37
[16,] 36 38
[17,] 37 39
[18,] 38 34
tr1 tr2
[1,] 21 NA
[2,] 22 30
[3,] 23 31
[4,] 24 32
[5,] 25 29
[6,] 26 25
[7,] 27 26
[8,] 28 27
[9,] 29 28
[10,] 30 24
[11,] 31 23
[12,] 32 22
[13,] 33 21
[14,] 34 38
[15,] 35 33
[16,] 36 34
[17,] 37 35
[18,] 38 36
[19,] 39 37
> # we can also reverse the order of the trees
> matchNodes(tr2,tr1,method="distances")
tr1 tr2
[1,] 21 33
[2,] 22 32
[3,] 23 31
[4,] 24 30
[5,] 25 26
[6,] 26 27
[7,] 27 28
[8,] 28 29
[9,] 29 25
[10,] 30 22
[11,] 31 23
[12,] 32 24
[13,] 33 35
[14,] 34 36
[15,] 35 37
[16,] 36 38
[17,] 37 39
[18,] 38 34
Careful inspection of the matchings above should (I hope) reveal that the nodes from tr1 have been correctly matched to the corresponding nodes of tr2 (in the first example - and the converse in the second case). The column headings in the second example reflect the input order of the trees - not their names in memory.
Note that case 1 leaves one unmatched node - the root. In case two there are no unmatched nodes because all of the n - 2 nodes in tr2 have a compatriot in tr1.
Assuming no major errors are identified, this function will be in the next version of phytools.
That's it.
I built a new version of phytools with this function added - download the source here.
ReplyDelete