Tuesday, May 21, 2013

Extracting all subtrees of a given size & larger

An R-sig-phylo query asked the following:

I'd like to sample independent subclades of greater than N tips from a phylogenetic tree much greater than N, say (number of tips >= 10*N). I can extract all subclades of greater than N taxa, but sampling independent subclades seems to be a harder problem.

Well - I'm not completely sure if this what he's asking for, but here is a function that will get all the subtrees in a binary tree that cannot be further subdivided into two subtrees of size >= clade.size.

getCladesofSize<-function(tree,clade.size=2){
  nn<-1:tree$Nnode+length(tree$tip.label)
  ndesc<-function(tree,node){
    x<-getDescendants(tree,node)
    sum(x<=length(tree$tip.label))
  }
  dd<-setNames(sapply(nn,ndesc,tree=tree),nn)
  aa<-nn[1]
  nodes<-vector()
  while(length(aa)){
    bb<-sapply(aa,function(x,tree)
      tree$edge[which(tree$edge[,1]==x),2],tree=tree)
    cc<-matrix(dd[as.character(bb)],nrow(bb),ncol(bb))
    cc[is.na(cc)]<-1
    gg<-apply(cc,2,function(x,cs)
      any(x < cs),cs=clade.size)
    nodes<-c(nodes,aa[gg])
    aa<-as.vector(bb[,!gg])
  }
  trees<-lapply(nodes,extract.clade,phy=tree)
  class(trees)<-"multiPhylo"
  return(trees)
}

The way this works is as follows. First, we get the number of tips descended from each node in the tree:

nn<-1:tree$Nnode+length(tree$tip.label)
ndesc<-function(tree,node){
  x<-getDescendants(tree,node)
  sum(x<=length(tree$tip.label))
}
dd<-setNames(sapply(nn,ndesc,tree=tree),nn)
Next, we conduct a traversal of the tree from the root. If a node has two daughter clades both of size >= clade.size, then we ascend up the tree to those nodes. If not, then we store the value of the current node. At the end, we extract all subtrees whose node numbers we have stored:
trees<-lapply(nodes,extract.clade,phy=tree)
class(trees)<-"multiPhylo"
That's it.

Right now, the function will only work on binary trees - but there is no check to assure that your tree is binary. I will fix that and add to phytools.

No comments:

Post a Comment

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.