Tuesday, September 20, 2011

Dropping the same set of taxa from a set of trees

In response to an earlier post, a reader just inquired as to whether there was a simple way to drop the same set of terminal species from a set of trees - say a sample of trees from the posterior distribution. In fact, this can be done a couple of different ways.

A set of trees read by the "ape" functions read.tree() or read.nexus() are stored in memory as a object of class "multiPhylo". This is merely a list of objects of class "phylo" with the class "multiPhylo" assigned. First, let's generate a simple "multiPhylo" object composed of three ten-taxon trees:

> require(ape)
> trees<-list(rtree(10),rtree(10),rtree(10))
> class(trees)<-"multiPhylo"


Normally this object would be created (with class assigned) by read.tree() or read.nexus(), but for the purposes of illustration, this will do.

Next, let's say we want to prune species "t6" and "t8" from all of the trees in the list. We can do this for any individual tree using drop.tip(), e.g.:

> pruned.tree<-drop.tip(trees[[1]],c("t6","t8"))

We can prune these taxa all at once from all the trees in our set using the R "base" function lapply(). Here we would just do:

> pruned.trees<-lapply(trees,drop.tip,tip=c("t6","t8"))
> class(pruned.trees)<-"multiPhylo"


Assigning the class is necessary because lapply() returns a generic list with no class assigned. Now if we type:

> plot(pruned.trees)

we see that we have indeed pruned the desired taxa from each tree in our set. For a longer list, we can easily read the list in from a file, and then send the result object to lapply(...,drop.tip,...).

Note that this can also be done with a simple for() loop, e.g.:

> pruned.trees<-list(); class(pruned.trees)<-"multiPhylo"
> for(i in 1:length(trees))
pruned.trees[[i]]<-drop.tip(trees[[i]],c("t6","t8"))


This is more intuitive, particularly for people experienced in other programming languages like C, but will usually run slower in R.

5 comments:

  1. This worked really well for me. I needed to remove around 250 out of 1000 tips from 200 trees. In my case all of the taxa that I wanted to remove were in a single group so i just used the taxon numbers from the nexus file and it worked great.

    pruned.trees<-lapply(trees,drop.tip,4:254)

    ReplyDelete
  2. It also worked for me well.

    Thanks!

    Kai

    ReplyDelete
  3. Hi Liam,
    I have a question. I have a very large sets of trees, however, the excluded species for each tree are different. Is there anyway that I can run the program automatically on directory to read through each excluded species file and make a new tree without that species based on the default tree?

    ReplyDelete
  4. Hi Liam,

    wondering how would this change if I want to drop let's say 20 tips from the multiphylo object on random?


    pruned.trees<- lapply(trees,drop.tip,tip=c("t6","t8")
    class(pruned.trees)<-"multiPhylo"

    cheers

    ReplyDelete