Saturday, November 24, 2012

Update to function matchNodes

I'm working on a manuscript review this Saturday and (I don't think I'm giving away too much about the article to say that) I needed a function to match the node numbers of a tree with the same nodes in a re-rooted version of the same phylogeny. This differs from the purpose of my previous phytools utility function, matchNodes, which matches nodes based on the commonality of their descendant tips. This method will not, in general, produce the correct set of matchings for a tree that has merely been rerooted at an internal node.

My idea for obtaining the correct matchings was to use the set of distances to the tip species in the tree. Matching nodes with the same set of patristic distances (or, for trees with arbitrary scale, proportional distances) must be equivalent. Here is the core code of the function:
D1<-dist.nodes(tr1)[1:length(tr1$tip), 1:tr1$Nnode+length(tr1$tip)]
D2<-dist.nodes(tr2)[1:length(tr2$tip), 1:tr2$Nnode+length(tr2$tip)]
rownames(D1)<-tr1$tip.label
rownames(D2)<-tr2$tip.label; D2<-D2[rownames(D1),]
for(i in 1:tr1$Nnode){
  z<-apply(D2,2,function(X,y) cor(X,y),y=D1[,i])
  Nodes[i,1]<-as.numeric(colnames(D1)[i])
  Nodes[i,2]<-as.numeric(names(which(z>=(1-tol))))
}
The first part of the code uses the 'ape' function dist.nodes to compute the set of distances from each node to all tips; and then sorts them to have the same row-wise ordering in the two matrices. Next, we go through the columns of D1 and identify the column of D2 with proportional or equal distances.

Of course - some important bookkeeping has been omitted here. For the full code, see utilities.R on the phytools page.

Let's try it out:
> tr1<-pbtree(n=20)
> par(mfcol=c(1,2))
> plotTree(tr1,node.numbers=T)
> # re-root tr1 at node=33
> tr2<-root(tr1,node=33)
> plotTree(tr2,node.numbers=T)
> matchNodes(tr1,tr2,method="distances")
      tr1 tr2
[1,]  21  NA
[2,]  22  30
[3,]  23  31
[4,]  24  32
[5,]  25  29
[6,]  26  25
[7,]  27  26
[8,]  28  27
[9,]  29  28
[10,]  30  24
[11,]  31  23
[12,]  32  22
[13,]  33  21
[14,]  34  38
[15,]  35  33
[16,]  36  34
[17,]  37  35
[18,]  38  36
[19,]  39  37

> # we can also reverse the order of the trees
> matchNodes(tr2,tr1,method="distances")
      tr1 tr2
[1,]  21  33
[2,]  22  32
[3,]  23  31
[4,]  24  30
[5,]  25  26
[6,]  26  27
[7,]  27  28
[8,]  28  29
[9,]  29  25
[10,]  30  22
[11,]  31  23
[12,]  32  24
[13,]  33  35
[14,]  34  36
[15,]  35  37
[16,]  36  38
[17,]  37  39
[18,]  38  34


Careful inspection of the matchings above should (I hope) reveal that the nodes from tr1 have been correctly matched to the corresponding nodes of tr2 (in the first example - and the converse in the second case). The column headings in the second example reflect the input order of the trees - not their names in memory.

Note that case 1 leaves one unmatched node - the root. In case two there are no unmatched nodes because all of the n - 2 nodes in tr2 have a compatriot in tr1.

Assuming no major errors are identified, this function will be in the next version of phytools.

That's it.

1 comment:

  1. I built a new version of phytools with this function added - download the source here.

    ReplyDelete