Saturday, February 13, 2021

On matching nodes between trees using `matchNodes`

Today I received an inquiry about the simple phytools utility function `matchNodes`.

`matchNodes` takes two trees as input and matches the node indices of the two trees using one of two criteria: the labels of the descendant tips of each node (`method="descendants"`); or the distances between matching tips (`method="distances"`).

The inquiry was sent via a series of questions, but to paraphrase it here – the user essentially wanted to know if `matchNodes` could be used to identify corresponding nodes in a pair of trees in which the taxa present in each tree might be different – for instance, a pair of gene trees with slightly different taxa represented in each tree.

The answer is “yes & no.”

Effectively, `matchNodes` is not designed for this, but it's possible to think of an algorithm by which we could use `matchNodes` to accomplish this task.

First, let's make some data that have this property:

``````library(phytools)
## two identical trees
t1<-t2<-pbtree(n=26,tip.label=LETTERS)
## drop random tips from each tree
t1<-drop.tip(t1,sample(LETTERS,8))
t2<-drop.tip(t2,sample(LETTERS,8))
## give them random edge lengths
t1\$edge.length<-runif(nrow(t1\$edge))
t2\$edge.length<-runif(nrow(t2\$edge))
## compare them
plot(cophylo(t1,t2,rotate=FALSE))
nodelabels.cophylo()
nodelabels.cophylo(which="right")
``````

From this, we can see right away which nodes match. For instance, node `31` in tree 1 matches up with node `32` in tree 2; and node `23` in tree 1 matches with node `25` in tree 2. Neither of these pairing would be identified with `matchNodes` on either criterion.

``````matchNodes(t1,t2)
``````
``````##       tr1 tr2
##  [1,]  19  NA
##  [2,]  20  NA
##  [3,]  21  NA
##  [4,]  22  22
##  [5,]  23  NA
##  [6,]  24  NA
##  [7,]  25  NA
##  [8,]  26  NA
##  [9,]  27  NA
## [10,]  28  NA
## [11,]  29  NA
## [12,]  30  30
## [13,]  31  NA
## [14,]  32  NA
## [15,]  33  NA
## [16,]  34  NA
## [17,]  35  NA
``````

So, what's our solution. Well, what I propose is the following. First, we create two trees (let's say, tree 1' and tree 2') containing only taxa present in both of our original phylogenies.

Second, we use `matchNodes` `method="descendants"` to compare these two trees, one to the other.

Third, we compare these two trees (tree 1' and tree 2') back to the original trees using `method="distances"`.

Finally, fourth we reconcile the node comparisons of steps two & three.

Let's try:

``````## step one drop tips
t1p<-drop.tip(t1,setdiff(t1\$tip.label,t2\$tip.label))
t2p<-drop.tip(t2,setdiff(t2\$tip.label,t1\$tip.label))

## step two match nodes "descendants"
M<-matchNodes(t1p,t2p)

## step two match nodes "distances"
M1<-matchNodes(t1,t1p,"distances")
M2<-matchNodes(t2,t2p,"distances")

## final step, reconcile
MM<-matrix(NA,t1\$Nnode,2,
dimnames=list(NULL,c("left","right")))
for(i in 1:nrow(MM)){
MM[i,1]<-M1[i,1]
nn<-M[which(M[,1]==M1[i,2]),2]
if(length(nn)>0) MM[i,2]<-M2[which(M2[,2]==nn),1]
}
MM
``````
``````##       left right
##  [1,]   19    19
##  [2,]   20    20
##  [3,]   21    21
##  [4,]   22    22
##  [5,]   23    25
##  [6,]   24    26
##  [7,]   25    NA
##  [8,]   26    27
##  [9,]   27    28
## [10,]   28    NA
## [11,]   29    NA
## [12,]   30    30
## [13,]   31    32
## [14,]   32    34
## [15,]   33    NA
## [16,]   34    NA
## [17,]   35    NA
``````

Did we get it right?

``````plot(cophylo(t1,t2,rotate=FALSE))
nodelabels.cophylo()
nodelabels.cophylo(which="right")
``````

Yes! This just gives us the matches for every node in tree 1 to tree 2, if one exists. To get the inverse we just need to flip our last step:

``````## final step, reconcile
MM<-matrix(NA,t2\$Nnode,2,
dimnames=list(NULL,c("left","right")))
for(i in 1:nrow(MM)){
MM[i,2]<-M2[i,1]
nn<-M[which(M[,1]==M2[i,2]),1]
if(length(nn)>0) MM[i,1]<-M1[which(M1[,2]==nn),1]
}
MM
``````
``````##       left right
##  [1,]   19    19
##  [2,]   20    20
##  [3,]   21    21
##  [4,]   22    22
##  [5,]   NA    23
##  [6,]   NA    24
##  [7,]   23    25
##  [8,]   24    26
##  [9,]   26    27
## [10,]   27    28
## [11,]   NA    29
## [12,]   30    30
## [13,]   NA    31
## [14,]   31    32
## [15,]   NA    33
## [16,]   32    34
## [17,]   NA    35
``````

Lastly, there is surely a more elegant solution to this problem available using `ape::prop.part`, or some other ape or phangorn functions – but the question was about `matchNodes`, so here's my answer!!

That's it.