An R-sig-phylo query asked the following:
I'd like to sample independent subclades of greater than N tips from a phylogenetic tree much greater than N, say (number of tips >= 10*N). I can extract all subclades of greater than N taxa, but sampling independent subclades seems to be a harder problem.
Well - I'm not completely sure if this what he's asking for, but here is a function that will get all the subtrees in a binary tree that cannot be further subdivided into two subtrees of size >= clade.size.
nn<-1:tree$Nnode+length(tree$tip.label)
ndesc<-function(tree,node){
x<-getDescendants(tree,node)
sum(x<=length(tree$tip.label))
}
dd<-setNames(sapply(nn,ndesc,tree=tree),nn)
aa<-nn[1]
nodes<-vector()
while(length(aa)){
bb<-sapply(aa,function(x,tree)
tree$edge[which(tree$edge[,1]==x),2],tree=tree)
cc<-matrix(dd[as.character(bb)],nrow(bb),ncol(bb))
cc[is.na(cc)]<-1
gg<-apply(cc,2,function(x,cs)
any(x < cs),cs=clade.size)
nodes<-c(nodes,aa[gg])
aa<-as.vector(bb[,!gg])
}
trees<-lapply(nodes,extract.clade,phy=tree)
class(trees)<-"multiPhylo"
return(trees)
}
The way this works is as follows. First, we get the number of tips descended from each node in the tree:
ndesc<-function(tree,node){
x<-getDescendants(tree,node)
sum(x<=length(tree$tip.label))
}
dd<-setNames(sapply(nn,ndesc,tree=tree),nn)
class(trees)<-"multiPhylo"
Right now, the function will only work on binary trees - but there is no check to assure that your tree is binary. I will fix that and add to phytools.
No comments:
Post a Comment
Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.