Saturday, February 13, 2021

On matching nodes between trees using matchNodes

Today I received an inquiry about the simple phytools utility function matchNodes.

matchNodes takes two trees as input and matches the node indices of the two trees using one of two criteria: the labels of the descendant tips of each node (method="descendants"); or the distances between matching tips (method="distances").

The inquiry was sent via a series of questions, but to paraphrase it here – the user essentially wanted to know if matchNodes could be used to identify corresponding nodes in a pair of trees in which the taxa present in each tree might be different – for instance, a pair of gene trees with slightly different taxa represented in each tree.

The answer is “yes & no.”

Effectively, matchNodes is not designed for this, but it's possible to think of an algorithm by which we could use matchNodes to accomplish this task.

First, let's make some data that have this property:

library(phytools)
## two identical trees
t1<-t2<-pbtree(n=26,tip.label=LETTERS)
## drop random tips from each tree
t1<-drop.tip(t1,sample(LETTERS,8))
t2<-drop.tip(t2,sample(LETTERS,8))
## give them random edge lengths
t1$edge.length<-runif(nrow(t1$edge))
t2$edge.length<-runif(nrow(t2$edge))
## compare them
plot(cophylo(t1,t2,rotate=FALSE))
nodelabels.cophylo()
nodelabels.cophylo(which="right")

plot of chunk unnamed-chunk-2

From this, we can see right away which nodes match. For instance, node 31 in tree 1 matches up with node 32 in tree 2; and node 23 in tree 1 matches with node 25 in tree 2. Neither of these pairing would be identified with matchNodes on either criterion.

matchNodes(t1,t2)
##       tr1 tr2
##  [1,]  19  NA
##  [2,]  20  NA
##  [3,]  21  NA
##  [4,]  22  22
##  [5,]  23  NA
##  [6,]  24  NA
##  [7,]  25  NA
##  [8,]  26  NA
##  [9,]  27  NA
## [10,]  28  NA
## [11,]  29  NA
## [12,]  30  30
## [13,]  31  NA
## [14,]  32  NA
## [15,]  33  NA
## [16,]  34  NA
## [17,]  35  NA

So, what's our solution. Well, what I propose is the following. First, we create two trees (let's say, tree 1' and tree 2') containing only taxa present in both of our original phylogenies.

Second, we use matchNodes method="descendants" to compare these two trees, one to the other.

Third, we compare these two trees (tree 1' and tree 2') back to the original trees using method="distances".

Finally, fourth we reconcile the node comparisons of steps two & three.

Let's try:

## step one drop tips
t1p<-drop.tip(t1,setdiff(t1$tip.label,t2$tip.label))
t2p<-drop.tip(t2,setdiff(t2$tip.label,t1$tip.label))

## step two match nodes "descendants"
M<-matchNodes(t1p,t2p)

## step two match nodes "distances"
M1<-matchNodes(t1,t1p,"distances")
M2<-matchNodes(t2,t2p,"distances")

## final step, reconcile
MM<-matrix(NA,t1$Nnode,2,
    dimnames=list(NULL,c("left","right")))
for(i in 1:nrow(MM)){
    MM[i,1]<-M1[i,1]
    nn<-M[which(M[,1]==M1[i,2]),2]
    if(length(nn)>0) MM[i,2]<-M2[which(M2[,2]==nn),1]
}
MM
##       left right
##  [1,]   19    19
##  [2,]   20    20
##  [3,]   21    21
##  [4,]   22    22
##  [5,]   23    25
##  [6,]   24    26
##  [7,]   25    NA
##  [8,]   26    27
##  [9,]   27    28
## [10,]   28    NA
## [11,]   29    NA
## [12,]   30    30
## [13,]   31    32
## [14,]   32    34
## [15,]   33    NA
## [16,]   34    NA
## [17,]   35    NA

Did we get it right?

plot(cophylo(t1,t2,rotate=FALSE))
nodelabels.cophylo()
nodelabels.cophylo(which="right")

plot of chunk unnamed-chunk-5

Yes! This just gives us the matches for every node in tree 1 to tree 2, if one exists. To get the inverse we just need to flip our last step:

## final step, reconcile
MM<-matrix(NA,t2$Nnode,2,
    dimnames=list(NULL,c("left","right")))
for(i in 1:nrow(MM)){
    MM[i,2]<-M2[i,1]
    nn<-M[which(M[,1]==M2[i,2]),1]
    if(length(nn)>0) MM[i,1]<-M1[which(M1[,2]==nn),1]
}
MM
##       left right
##  [1,]   19    19
##  [2,]   20    20
##  [3,]   21    21
##  [4,]   22    22
##  [5,]   NA    23
##  [6,]   NA    24
##  [7,]   23    25
##  [8,]   24    26
##  [9,]   26    27
## [10,]   27    28
## [11,]   NA    29
## [12,]   30    30
## [13,]   NA    31
## [14,]   31    32
## [15,]   NA    33
## [16,]   32    34
## [17,]   NA    35

Lastly, there is surely a more elegant solution to this problem available using ape::prop.part, or some other ape or phangorn functions – but the question was about matchNodes, so here's my answer!!

That's it.

No comments:

Post a Comment

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.