Friday, March 4, 2011

Prune tree to a list of taxa

Using the {ape} function drop.tip() we can easily excise a single taxon or a list of taxa from our "phylo" tree object in R. However, it is not immediately obvious how to prune the tree to include, rather than exclude, a specific list of tips. Trina Roberts (now at NESCent) shared a trick to do this with me some time ago, and I thought I'd pass it along to the readers of this blog.

First, let's start with a tree of 10 species:

> tree<-rtree(10)
> write.tree(tree)
[1] "(t8:0.22,((((t3:0.9,(t7:0.48,t2:0.5):0.12):0.47,t6:0.55):0.08,(t5:0.49,(t9:0.71,t10:0.13):0.15):0.7):0.87,(t1:0.72,t4:0.62):0.55):0.47);"

Now, say we want to keep the species t2, t4, t6, t8, and t10 in our pruned tree. We just put these tip names into a vector:

> species<-c("t2","t4","t6","t8","t10")

[More commonly, this vector will probably come from the row names in our data matrix, or we might read it from a text file.]
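
As a minimal sketch of those two sources, assuming a hypothetical data matrix `X` whose row names are species and a text file with one tip label per line (a temporary file stands in for a real one here):

```r
library(ape)

set.seed(1)                      # only so the sketch is reproducible
tree <- rtree(10)                # tips are labelled t1, ..., t10

# Hypothetical data matrix: one row per species we have data for
X <- matrix(rnorm(10), nrow = 5,
            dimnames = list(c("t2", "t4", "t6", "t8", "t10"),
                            c("trait1", "trait2")))
species <- rownames(X)

# Alternatively, read one tip label per line from a text file
f <- tempfile(fileext = ".txt")
writeLines(species, f)
species_from_file <- readLines(f)
```

Either way, the result is a plain character vector of tip labels, which is all the pruning step needs.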

We create the pruned tree with one command:

> pruned.tree<-drop.tip(tree,tree$tip.label[-match(species, tree$tip.label)])

Now we have our pruned tree, as desired:

> write.tree(pruned.tree)
[1] "(t8:0.22,(((t2:1.09,t6:0.55):0.08,t10:0.98):0.87,t4:1.17):0.47);"
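
To check the result, and to avoid retyping the one-liner, the same step can be wrapped in a small helper. This is only a sketch: `prune_to` is a hypothetical name, not an ape function, and it uses setdiff() instead of match() so that names absent from the tree produce a warning rather than an error. Recent versions of ape also provide keep.tip(), which does the same job directly.

```r
library(ape)

# Hypothetical helper: keep only the tips named in `keep`
prune_to <- function(tree, keep) {
  missing <- setdiff(keep, tree$tip.label)
  if (length(missing) > 0)
    warning("not in tree: ", paste(missing, collapse = ", "))
  # Drop every tip that is not in the list of tips to keep
  drop.tip(tree, setdiff(tree$tip.label, keep))
}

# The 10-species tree from this post
tree <- read.tree(text = "(t8:0.22,((((t3:0.9,(t7:0.48,t2:0.5):0.12):0.47,t6:0.55):0.08,(t5:0.49,(t9:0.71,t10:0.13):0.15):0.7):0.87,(t1:0.72,t4:0.62):0.55):0.47);")
species <- c("t2", "t4", "t6", "t8", "t10")
pruned.tree <- prune_to(tree, species)

# Check that the pruned tree contains exactly the requested tips
setequal(pruned.tree$tip.label, species)
```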


  1. If there are tips in the "species" vector that are not in the tree, match(species,tree$tip.label) will return one or more NAs, and the procedure will fail. To avoid this problem, one can just do:
    > pruned.tree<-drop.tip(tree, tree$tip.label[-na.omit(match(species, tree$tip.label))])

  2. Hi Liam-

    Even less code than the -match trick:

    pruned.tree<-drop.tip(tree, setdiff(tree$tip.label, species));

    setdiff is very handy... (as are intersect and %in%)

  3. Cool! Very handy indeed.

    Dan's method will also work even if some of the labels in "species" are not in "tree."

  4. Hey Liam, congratulations on this great blog. Just one question related to this: suppose you want to prune a bunch of taxa from a large number of trees (for example, you want to use the trees from the posterior of BEAST but you want to get rid of some taxa in all the trees). I am a beginner in R and I have no clue how to do it.

    Any idea?

    thanks a lot!


  5. Hi. Yes, this is pretty easy. I will write a quick post about this now.

  6. Hi Liam,
    I have a bit of a tricky question. When I use the drop.tip function, the (dropped-tip) tree object will keep all elements from the (original) lists, and therefore these elements (e.g. 95% HPD or ancestral state reconstructions on nodes) do not match the new object anymore (i.e. they do not change along). Do you know a way to make these lists change according to the dropped tips and the new node numbers? I haven't been able to figure it out, unfortunately. Cheers and best wishes, Renske

    1. Yup. This is an issue. I have an idea, but it depends how the node information is stored in the 'phylo' object. If you email me your saved workspace (.Rdata) then I will investigate it. Or you could post your question to R-sig-phylo since it doesn't have anything in particular to do with phytools.

    2. Hi Renske and Liam, I have the same question and came across this post. Has there been any clarification on this topic? Thanks! All the best, Sara

  7. The package phyloch, with its functions read.beast and drop.tips2, used to do the trick… I'm trying to do it again and it doesn't work anymore, apparently due to a failure (hopefully easy to fix) of the function readBeasttable (embedded in read.beast) on the latest versions of .trees files…

    1. Or perhaps it is just because I used the BSSVS procedure that eliminates zero-rates in the rate matrix… making the length of the "characters state" string inconsistent and thus preventing readBeasttable from properly parsing the file…

  8. Is it possible to remove a set of labels matching some regular expression - say, a set of labels with a given prefix?

    1. Hi Steven.

      Yes - this is possible. I have posted about this here. Let me know if this is what you want to do.

      - Liam

  9. This comment has been removed by the author.

  10. A quick question: some of my species' names contain dots, underscores, and numbers, e.g., Species.n.sp_2. After pruning, the species whose names contain dots, underscores, and numbers are not included in the final result, and that was not my intention. I want to keep these taxa. Thanks for your time and help!

  11. This comment has been removed by the author.
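
Two of the questions in the comments above, pruning every tree in a set and dropping tips by a label pattern, can be sketched as follows. This is an illustration under stated assumptions, not necessarily the approach taken in the follow-up posts: the trees are simulated stand-ins for a posterior sample, and the tip labels and the "sp_" prefix are hypothetical.

```r
library(ape)

# Simulated stand-in for a set of trees sharing the same tip labels
labels <- c("sp_a", "sp_b", "sp_c", "out_1", "out_2", "out_3")
trees <- lapply(1:5, function(i) rtree(6, tip.label = labels))
class(trees) <- "multiPhylo"

# Prune every tree in the set to the same list of tips
species <- c("sp_a", "sp_b", "out_1", "out_2")
pruned <- lapply(trees, function(tr)
  drop.tip(tr, setdiff(tr$tip.label, species)))
class(pruned) <- "multiPhylo"

# Drop all tips whose label starts with a given (hypothetical) prefix
tr <- trees[[1]]
no_sp <- drop.tip(tr, grep("^sp_", tr$tip.label, value = TRUE))
```

Note that grep() treats its pattern as a regular expression, so a literal dot in a label has to be escaped (or matched with fixed = TRUE) if it is part of the pattern.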