Tuesday, March 11, 2014

Function to quickly compute the height above the root of a pair of taxa

Today I received the following phytools user request:

I've been using your phytools package (and the findMRCA function with type="height" to find the height of a specific node in a phylogenetic tree. However, the computation time is rather long for my purpose and I was thinking about using fastMRCA. However, this outputs only the node number of the parental node. Is there a chance to get the coalescent time of this node (as with the type argument of findMRCA). Or do you have a general other suggestion to increase speed (I would really only need to calculate the time of coalescence of two tips of the tree).

Unfortunately, it doesn't matter which (findMRCA or fastMRCA) is used in this case because the bottleneck is nodeHeights, which is used to compute the height of the common ancestor above the root. Here's a quick demo to illustrate that:

> ## simulate a large tree
> tree<-pbtree(n=10000)
> ## using findMRCA
> system.time(h1<-findMRCA(tree,c("t1","t100"),
 type="height"))
   user  system elapsed
  27.67    0.00   27.90
> h1
[1] 2.148063
> ## using fastMRCA & nodeHeights
> system.time(h2<-nodeHeights(tree)[which(tree$edge[,1]==
 fastMRCA(tree,"t1","t100"))[1],1])
   user  system elapsed
  27.13    0.00   27.17
> h2
[1] 2.148063
> ## this is almost completely driven by nodeHeights
> system.time(H<-nodeHeights(tree))
   user  system elapsed
  26.74    0.00   26.77

However, we can take a different approach. What if we instead (1) found all the ancestors (back to the root) of each tip, (2) identified the intersection of these two sets, and (3) summed the parent branch length for each ancestral node above the root in the intersection? Here's a function that does this and it runs pretty fast!

fastHeight<-function(tree,sp1,sp2){
  ancs<-phytools:::getAncestors
  a1<-ancs(tree,which(tree$tip.label==sp1))
  a2<-ancs(tree,which(tree$tip.label==sp2))
  a12<-intersect(a1,a2)
  if(length(a12)>1){
    a12<-a12[2:length(a12)-1]
    h<-sapply(a12,function(i,tree)
      tree$edge.length[which(tree$edge[,2]==i)],tree=tree)
    return(sum(h))
  } else return(0)
}

In use with the same example as before:

> system.time(h3<-fastHeight(tree,"t1","t100"))
   user  system elapsed
   0.01    0.00    0.01
> h3
[1] 2.148063

I will probably put this in phytools.

1 comment:

  1. This code doesn't work if sp1==sp2 (in which case it should return the total tree height - but it does not). This is fixed in a new phytools build (phytools 0.3-96). -- Liam

    ReplyDelete