Wednesday, November 23, 2011

Faster vcvPhylo() with ancestral nodes

In my last post I put up a simple version vcv.phylo() (which returns a matrix containing the height above the root of each pair of species in the tree, sometimes called the phylogenetic variance-covariance matrix), but that included ancestral nodes. I pulled this out of an old function of mine, anc.trend(), so I figured I could do better - and after some tinkering I came up with a much improved version, as follows:

vcvPhylo2<-function(tree,anc.nodes=T){
  n<-length(tree$tip.label)
  h<-nodeHeights(tree)[order(tree$edge[,2]),2]
  h<-c(h[1:n],0,h[(n+1):length(h)])
  M<-mrca(tree,full=anc.nodes)[c(1:n,anc.nodes*(n+2:tree$Nnode)),c(1:n,anc.nodes*(n+2:tree$Nnode))]
  C<-matrix(h[M],nrow(M),ncol(M))
  if(anc.nodes) rownames(C)<-colnames(C)<-c(tree$tip.label,n+2:tree$Nnode)
  else rownames(C)<-colnames(C)<-tree$tip.label
  return(C)
}


A couple of tricks in here:

1) Rather than calling dist.nodes(), this function uses the phytools function nodeHeights, which returns a matrix of the dimensions of tree$edge, containing the height above the root of every node, and is quite fast.

2) The function then uses mrca(...,full=T) in the ape package, which returns the MRCA of every pair of species and nodes. This essentially serves as an index to construct our VCV matrix from the node heights!

All of this is done with the following lines of code:

# this gets a vector of height by node number
h<-nodeHeights(tree)[order(tree$edge[,2]),2]
# this inserts a height of zero for the root node (node n+1 for n species)
h<-c(h[1:n],0,h[(n+1):length(h)])
# this is our index matrix
M<-mrca(tree,full=T)
# this is our VCV matrix
C<-matrix(h[M],nrow(M),ncol(M))


Let's compare computation time to vcv.phylo and our earlier version (vcvPhylo1):

> tree<-rtree(500)
> system.time(X<-vcv.phylo(tree)) # only tips
   user  system elapsed 
   2.20    0.00    2.22 
> system.time(Y<-vcvPhylo1(tree)) # tips & nodes, old version
   user  system elapsed 
   7.68    0.03    7.73 
> system.time(Z<-vcvPhylo2(tree)) # tips & nodes, new version
   user  system elapsed 
   3.49    0.05    3.54 
> all(Y==Z)
[1] TRUE
> all(X==Z[c(1:500),c(1:500)])
[1] TRUE


Cool! BTW - Happy Thanksgiving to all the blog readers.

No comments:

Post a Comment

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.