Sunday, March 6, 2011

Counting descendant leaves

David Bapst was today interested in computing the number of tips (leaves) descended from each internal node in the tree. These can be obtained (for each node above the root) for the {ape} "phylo" object phy using the following command:

> require(geiger) # load the geiger package
> desc<-sapply(tree$edge[,2],function(x) node.leaves(tree,x))
> Ndesc<-sapply(desc,function(x) length(x))

I thought it might be faster to count the number of descendants as we read our tree into memory. This is possible because by the time our Newick tree parser reaches a ")" in the string, all of the descendant edges and tips already exist in memory. If we are diligent about assigning counts to all internal nodes, to get the count for the current node all we have to do is add the counts for its two (or more) daughters.

To allow David to see if this is indeed faster, I have added this to my existing (and otherwise not particularly useful) function read.newick(). Now this function creates a "phylo" object with the additional vector $Ndesc containing the number of tips descended from each node above the root. The order of $Ndesc is the row order of $edge[,2]. I have posted this to my R phylogenetics page (direct link to code here).

1 comment:

  1. Turns out Dave actually wanted a fast method to get the tip numbers for all the tips descended from each node (i.e., their node numbers), not the number of tips descended from each node as I'd mistakenly believed. Oops. Well, you can do this according to the same principle. Maybe I will address this in a future post.