Monday, October 1, 2012

Fix for fastAnc and new function to match nodes

Ok, firstly, the function that I introduced last night for fast estimation the ML ancestral states for internal nodes on the tree (fastAnc) had a bug that would sometimes cause it to return one or multiple NAs if the tree had multifurcations. This was obviously due to some problem in my algorithm for matching nodes between trees. I have resolved this problem by creating a new, standalone utility function to match nodes between trees (called matchNodes. Here a matching pair of nodes is defined by sharing exactly all (i.e., no more & no less) of the same descendant leaves (regardless of intervening topology and branch lengths). The function looks as follows:

matchNodes<-function(tr1,tr2){
  desc.tr1<-lapply(1:tr1$Nnode+length(tr1$tip),
    function(x) extract.clade(tr1,x)$tip.label)
  names(desc.tr1)<-1:tr1$Nnode+length(tr1$tip)
  desc.tr2<-lapply(1:tr2$Nnode+length(tr2$tip),
    function(x) extract.clade(tr2,x)$tip.label)
  names(desc.tr2)<-1:tr2$Nnode+length(tr2$tip)
  Nodes<-matrix(NA,length(desc.tr1),2,dimnames= list(NULL,c("tr1","tr2")))
  for(i in 1:length(desc.tr1)){
    Nodes[i,1]<-as.numeric(names(desc.tr1)[i])
    for(j in 1:length(desc.tr2))
      if(all(desc.tr1[[i]]%in%desc.tr2[[j]]) && all(desc.tr2[[j]]%in%desc.tr1[[i]]))
        Nodes[i,2]<-as.numeric(names(desc.tr2)[j])
  }
  return(Nodes)
}

The top part of this code pulls out two lists of vectors containing the set of leaves descending from each node in each of the two input trees. The second part asks if - for each pair of nodes - all (and exactly all) of the descendant leaves are shared in common. The function returns a matrix containing the node numbers of tr1 in column 1, and their corresponding matches in column 2. Note, then, that the dimensions of the matrix are defined by the number of nodes in the first input tree. Mismatches in the other direction (i.e., nodes found in tree 2, but not in tree 1) won't be identified (but could be by another call to the function in which the argument order was flipped). Code for this function is here.

fastAnc is now highly simplified, as follows:

fastAnc<-function(tree,x){
  if(!is.binary.tree(tree)) btree<-multi2di(tree)
  else btree<-tree
  M<-btree$Nnode
  N<-length(btree$tip)
  anc<-vector()
  for(i in 1:M+N){
    anc[i-N]<-ace(x,multi2di(root(btree,node=i)),
      method="pic")$ace[1]
    names(anc)[i-N]<-i
  }
  if(!is.binary.tree(tree)){
    ancNames<-matchNodes(tree,btree)
    anc<-anc[as.character(ancNames[,2])]
    names(anc)<-ancNames[,1]
  }
  return(anc)
}

I have also made some additional changes to phenogram, phylomorphospace, and phylomorphospace3d, so that they now call fastAnc instead of ace or (worse) anc.ML. I have built this updates into a new non-CRAN version of phytools (version 0.2-01; I know, I just released a new CRAN update of phytools - sorry!) which can be downloaded here and installed from source:

> install.packages("phytools_0.2-01.tar.gz",type="source",
+ repos=NULL)
* installing *source* package 'phytools' ...
** R
...
* DONE (phytools)

No comments:

Post a Comment