Thursday, May 10, 2012

"Mini" version of drop.tip

The phytools package can read, simulate, plot, reorder, and write stochastic character map style trees. (That is, trees in which a discrete character history is stored on the edges of the tree.) I even recently added the capacity to read SIMMAP v1.5 trees to the function read.simmap. Unfortunately, phytools only has a very limited capacity to manipulate SIMMAP style trees. In particular, the one existing function (drop.tip.simmap only alters the $mapped.edge element of the modified "phylo" object, without touching $maps. $mapped.edge contains only the time spent in each state on each edge, while $maps contains the times and order of each state on each edge. The latter element is required for both plotting (using plotSimmap) and for other types of analyses using (for instance) the new OUwie phylogenetics package.

Well, I'm at long last trying to build a function that will drop tips from the tree, but preserve both $maps and $mapped.edge. At first, I was hoping to do this using basically the same trick (but in reverse) that I used to read SIMMAP trees from file in the first version of read.simmap (this trick has long been replaced by a much more sophisticated function). Alas, this did not work. So I have gone back to the basics and I have started out by writing a basic tip dropping function of my own. This is incredibly simple, and much less versatile than the ape function drop.tip, but it gives me a structure to work from to build up a full drop.tip.simmap function.

The function is as follows:

pruneTree<-function(tree,tip){
  tip<-which(tree$tip.label%in%tip)
  edges<-match(tip,tree$edge[,2])
  tree$edge<-tree$edge[setdiff(1:nrow(tree$edge),edges),]
  z<-setdiff(tree$edge[,2],tree$edge[,1])
  z<-z[z>length(tree$tip)]
  while(length(z)>0){
    edges<-match(z,tree$edge[,2])
    tree$edge<-tree$edge[setdiff(1:nrow(tree$edge),edges),]
    z<-setdiff(tree$edge[,2],tree$edge[,1])
    z<-z[z>length(tree$tip)]
  }
  z<-setdiff(tree$edge[,2],tree$edge[,1])
  tree$tip.label<-tree$tip.label[z]
  tree$edge[which(tree$edge[,2]%in%z),2]<-1:length(tree$tip)
  i<-1
  while(i < nrow(tree$edge)){
    single<-sum(tree$edge[i,2]==tree$edge[,1])==1
    while(single){
      if(sum(tree$edge[i,2]==tree$edge[,1])==1){
        z<-match(tree$edge[i,2],tree$edge[,1])
        tree$edge[i,2]<-tree$edge[z,2]
        tree$edge<-tree$edge[setdiff(1:nrow(tree$edge),z),]
      }
      single<-sum(tree$edge[i,2]==tree$edge[,1])==1
    }
    i<-i+1
  }
  z<-unique(as.vector(tree$edge))
  z<-z[z>length(tree$tip)]
  y<-order(z)+length(tree$tip)
  for(i in 1:length(z)) tree$edge[tree$edge==z[i]]<-y[i]
  tree$Nnode<-max(tree$edge)-length(tree$tip)
  tree$edge.length<-NULL
  class(tree)<-"phylo"
  return(tree)
}


OK, let's try it:

> require(phytools); source("pruneTree.R")
> set.seed(1)
> tree<-pbtree(n=20); tree$edge.length<-NULL
> plot(tree)
> ptree<-pruneTree(tree,c("t6","t8","t7","t3","t2","t15"))
> plot(ptree)


Now I just need to figure out how to preserve the mappings and I'll be all set!

4 comments:

  1. OK, the next step - preserving edge lengths when available - turns out to have been easier than expected. I have added that and next I will figure out how to preserve maps.

    ReplyDelete
  2. In addition, the function has a bug under some circumstances that I'm trying to figure out. A work in progress!

    ReplyDelete
  3. Hi Liam! Thank you so much for finding the time to try and figure this out. It really seems a pretty good amount of work!

    ReplyDelete
  4. Thanks for the comments, Rafael. I have now uploaded a new version of drop.tip.simmap (described here. Let me know how it works!

    ReplyDelete