Today I received an inquiry about the simple phytools utility function matchNodes
.
matchNodes
takes two trees as input and matches the node indices of the two trees using
one of two criteria: the labels of the descendant tips of each node (method="descendants"
);
or the distances between matching tips (method="distances"
).
The inquiry was sent via a series of questions, but to paraphrase it here – the user essentially
wanted to know if matchNodes
could be used to identify corresponding nodes in a pair of trees
in which the taxa present in each tree might be different – for instance, a pair of gene trees
with slightly different taxa represented in each tree.
The answer is “yes & no.”
Effectively, matchNodes
is not designed for this, but it's possible to think of an algorithm
by which we could use matchNodes
to accomplish this task.
First, let's make some data that have this property:
library(phytools)
## two identical trees
t1<-t2<-pbtree(n=26,tip.label=LETTERS)
## drop random tips from each tree
t1<-drop.tip(t1,sample(LETTERS,8))
t2<-drop.tip(t2,sample(LETTERS,8))
## give them random edge lengths
t1$edge.length<-runif(nrow(t1$edge))
t2$edge.length<-runif(nrow(t2$edge))
## compare them
plot(cophylo(t1,t2,rotate=FALSE))
nodelabels.cophylo()
nodelabels.cophylo(which="right")
From this, we can see right away which nodes match. For instance, node 31
in tree 1 matches
up with node 32
in tree 2; and node 23
in tree 1 matches with node 25
in tree 2. Neither
of these pairing would be identified with matchNodes
on either criterion.
matchNodes(t1,t2)
## tr1 tr2
## [1,] 19 NA
## [2,] 20 NA
## [3,] 21 NA
## [4,] 22 22
## [5,] 23 NA
## [6,] 24 NA
## [7,] 25 NA
## [8,] 26 NA
## [9,] 27 NA
## [10,] 28 NA
## [11,] 29 NA
## [12,] 30 30
## [13,] 31 NA
## [14,] 32 NA
## [15,] 33 NA
## [16,] 34 NA
## [17,] 35 NA
So, what's our solution. Well, what I propose is the following. First, we create two trees (let's say, tree 1' and tree 2') containing only taxa present in both of our original phylogenies.
Second, we use matchNodes
method="descendants"
to compare these two trees, one to the other.
Third, we compare these two trees (tree 1' and tree 2') back to the original trees using
method="distances"
.
Finally, fourth we reconcile the node comparisons of steps two & three.
Let's try:
## step one drop tips
t1p<-drop.tip(t1,setdiff(t1$tip.label,t2$tip.label))
t2p<-drop.tip(t2,setdiff(t2$tip.label,t1$tip.label))
## step two match nodes "descendants"
M<-matchNodes(t1p,t2p)
## step two match nodes "distances"
M1<-matchNodes(t1,t1p,"distances")
M2<-matchNodes(t2,t2p,"distances")
## final step, reconcile
MM<-matrix(NA,t1$Nnode,2,
dimnames=list(NULL,c("left","right")))
for(i in 1:nrow(MM)){
MM[i,1]<-M1[i,1]
nn<-M[which(M[,1]==M1[i,2]),2]
if(length(nn)>0) MM[i,2]<-M2[which(M2[,2]==nn),1]
}
MM
## left right
## [1,] 19 19
## [2,] 20 20
## [3,] 21 21
## [4,] 22 22
## [5,] 23 25
## [6,] 24 26
## [7,] 25 NA
## [8,] 26 27
## [9,] 27 28
## [10,] 28 NA
## [11,] 29 NA
## [12,] 30 30
## [13,] 31 32
## [14,] 32 34
## [15,] 33 NA
## [16,] 34 NA
## [17,] 35 NA
Did we get it right?
plot(cophylo(t1,t2,rotate=FALSE))
nodelabels.cophylo()
nodelabels.cophylo(which="right")
Yes! This just gives us the matches for every node in tree 1 to tree 2, if one exists. To get the inverse we just need to flip our last step:
## final step, reconcile
MM<-matrix(NA,t2$Nnode,2,
dimnames=list(NULL,c("left","right")))
for(i in 1:nrow(MM)){
MM[i,2]<-M2[i,1]
nn<-M[which(M[,1]==M2[i,2]),1]
if(length(nn)>0) MM[i,1]<-M1[which(M1[,2]==nn),1]
}
MM
## left right
## [1,] 19 19
## [2,] 20 20
## [3,] 21 21
## [4,] 22 22
## [5,] NA 23
## [6,] NA 24
## [7,] 23 25
## [8,] 24 26
## [9,] 26 27
## [10,] 27 28
## [11,] NA 29
## [12,] 30 30
## [13,] NA 31
## [14,] 31 32
## [15,] NA 33
## [16,] 32 34
## [17,] NA 35
Lastly, there is surely a more elegant solution to this problem available using ape::prop.part
, or some
other ape or phangorn functions – but the question was about matchNodes
, so here's my answer!!
That's it.
No comments:
Post a Comment
Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.