Friday, April 24, 2015

Finding the youngest node(s) with N or more descendant tips

Today I was asked the following question (slightly paraphrased):

“How do I indentify the node defining the youngest subtree with N (for arbitrary N, in his case it was 5) descendant taxa?”

This is pretty easy. For demonstrative purposes, let's first simulate a tree:

library(phytools)
## simulate tree
tree<-pbtree(n=26,tip.label=LETTERS)

Now let's count the number of tips descended from each internal node of the tree:

## number of descendant taxa by node
N<-setNames(sapply(1:tree$Nnode+Ntip(tree),
    function(x,tree) sum(getDescendants(tree,x)<=Ntip(tree)),tree=tree),
    1:tree$Nnode+Ntip(tree))
## we can plot them to make sure we got it right
plotTree(tree,mar=c(2.1,0.1,0.1,0.1))
nodelabels(N)

plot of chunk unnamed-chunk-2

Next, we can compute the depth of each node. We can do this in more than one way. Here, I have an sapply call around nodeheight, but for a larger tree I would use one call to nodeHeights.

## node depths
d<-setNames(max(nodeHeights(tree))-
    sapply(1:tree$Nnode+Ntip(tree),nodeheight,tree=tree),
    1:tree$Nnode+Ntip(tree))
## again, let's check them by plotting
plotTree(tree,mar=c(2.1,0.1,0.1,0.1))
nodelabels(round(d,2))
axis(1,at=round(0:5/5*max(nodeHeights(tree)),2))

plot of chunk unnamed-chunk-3

Finally, let's find the most recent node (or nodes) with n=5 or more descendant tips:

## which node is the youngest node to have n=5 or more 
## descendant taxa
n<-5
node<-as.numeric(names(which(d[N>=n]==min(d[N>=n]))))

Let's check it visually:

plotTree(tree,mar=c(2.1,0.1,0.1,0.1))
axis(1,at=round(0:5/5*max(nodeHeights(tree)),2))
ii<-which(names(d)==node)
lines(max(nodeHeights(tree))-c(d[ii],d[ii]),
    y=par()$usr[3:4],lty="dashed",col="red")
nodelabels(node=node,"X")

plot of chunk unnamed-chunk-5

That's it!

No comments:

Post a Comment

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.