Friday, March 4, 2011

Prune tree to a list of taxa

Using the {ape} function drop.tip() we can easily excise a single taxon or a list of taxa from our "phylo" tree object in R. However, it is not immediately obvious how to prune the tree to include, rather than exclude, a specific list of tips. Trina Roberts (now at NESCent) shared a trick to do this with me some time ago, and I thought I'd pass it along to the readers of this blog.

First, let's start with a tree of 10 species:

> tree<-rtree(10)
> write.tree(tree)
[1] "(t8:0.22,((((t3:0.9,(t7:0.48,t2:0.5):0.12):0.47,t6:0.55):0.08,(t5:0.49,(t9:0.71,t10:0.13):0.15):0.7):0.87,(t1:0.72,t4:0.62):0.55):0.47);"

Now, say we want to keep the species t2, t4, t6, t8, and t10 in our pruned tree, we just put these tip names into a vector:

> species<-c("t2","t4","t6","t8","t10")

[More commonly, this vector will probably come from the row names in our data matrix, or we might read it from a text file.]

We create the pruned tree with one command:

> pruned.tree<-drop.tip(tree,tree$tip.label[-match(species, tree$tip.label)])

Now we have our pruned tree, as desired:

> write.tree(pruned.tree)
[1] "(t8:0.22,(((t2:1.09,t6:0.55):0.08,t10:0.98):0.87,t4:1.17):0.47);"


  1. If there are tips in the "species" vector that are not in the tree, match(species,tree$tip.label) will one or mulitple NAs, and the procedure will fail. To avoid this problem, one can just do:
    > pruned.tree<-drop.tip(tree, tree$tip.label[-na.omit(match(species, tree$tip.label))])

  2. Hi Liam-

    Even less code than the -match trick:

    pruned.tree<-drop.tip(tree, setdiff(tree$tip.label, species));

    setdiff is very handy....(as is intersect and %in%)

  3. Cool! Very handy indeed.

    Dan's method will also work even if some of the labels in "species" are not in "tree."

  4. Hey Liam, congratulations for this great blog, just one question related to this, suppose you want to prune a bunch of taxa from a large amount of trees (for example, you want to use the trees from the posterior of BEAST but you want to get rid of some taxa for all the trees). I am a beginner in R and I have no clue of how to do it.

    Any idea?

    thanks a lot!


  5. Hi. Yes, this is pretty easy. I will write a quick post about this now.

  6. Hi Liam,
    I have a bit of a tricky question. When I use the drop.tip function, the (dropped-tip) tree object will keep all elements from the (original) lists, and therefore these elements (e.g. 95% HPD or ancestral state reconstructions on nodes) do not match the new object anymore (i.e. they do not change along). Do you know a way to make these lists change according to the dropped tips and the new node numbers? I haven't been able to figure it out, unfortunately. Cheers and best wishes, Renske

    1. Yup. This is an issue. I have an idea, but it depends how the node information is stored in the 'phylo' object. If you email me your saved workspace (.Rdata) then I will investigate it. Or you could post your question to R-sig-phylo since it doesn't have anything in particular to do with phytools.

    2. Hi Renske and Liam, I have the same question and came across this post. Has there been any clarification on this topic? Thanks! All the best, Sara

    3. Hi Liam,

      I am still having the same issue in that when i trim my tree the list of node labels is not trimmed along with it. Do you know of a way to trim the node labels as well? I want to plot posterior probabilities in my phylo object onto my tree. Thanks!

    4. Hi Kayleigh,
      I solved it in this way. I used the drop.tip in the file with all the trees (if BEAST2 is used it'll be the one from LogCombiner after removing the burn-in). Then you just follow this other post:
      Then, you have a new file ready for TreeAnnotator which will calculate all you need.

  7. The package phyloch with its function read.beast and drop.tips2 used to do the trick… I'm trying to do it again and it doesnt work anymore, apparently due to a failure (hopefully easy to fix) of the function readBeasttable (embedded in read.beast) on last versions of .trees files…

    1. Or perhaps it is just because I used the BSSVS procedure that eliminates zero-rates in the rate matrix… Making the length of the "characters state" string inconsistent and thus preventing readBeasttable to properly parse the file…

  8. Is it possible to remove a set of labels matching some regular expression - say, a set of labels with a given prefix?

    1. Hi Steven.

      Yes - this is possible. I have posted about this here. Let me know if this is what you want to do.

      - Liam

  9. This comment has been removed by the author.

  10. A quick question: some of my species' names contain dot, underscore, and number. E.g; Species.n.sp_2. After performing prune; those species' names with dot, underscore and number are not included in the final result and that was my attention. I want to keep these taxa. Thanks for your time and help!

  11. This comment has been removed by the author.

  12. Thanks for this Liam! Is there a way to do this on a multiPhylo object? I tried:
    pruned.tree <- lapply(tree, drop.tip(tree,tree$tip.label[-match(species, tree$tip.label)]))

    and also Dan's approach:
    pruned.tree<-lapply(tree, drop.tip(tree, setdiff(tree$tip.label, species)))

    But keep getting this error: Error in drop.tip(tree, tree$tip.label[-match(species, tree$tip.label)]) :
    object "phy" is not of class "phylo"

    Any ideas?



    1. Sure, but you have to make the second argument in your lapply call a function. For instance:

      tips ## my tips to keep

      That should work.

      - Liam

    2. Perfect, Liam! It works great except when `tips` has just a few species. Then it complains with:

      Error in phy$edge[, 2] : incorrect number of dimensions
      In addition: Warning message:
      In drop.tip(t, setdiff(t$tip.label, tips)) :
      drop all tips of the tree: returning NULL

      It worked for what I needed it, just wanted to warn others who might find this later on.


  13. Hi Liam,
    I am trying to visualize species tree of 1200+ species with characters mapped on them and am unable to see it clearly due to large number of taxas. Could you please suggest some way to deal with it?
    I have already tried exporting it PNG and PDF formats but those didn't help.

  14. Thanks, Liam. This was really HANDY!

  15. Hi Liam,
    Thank you for your great blog. While I was dropping the selected taxa, faced such an error:
    In cbind(e1, e2, deparse.level = 0) :
    number of rows of result is not a multiple of vector length (arg 1)
    Would you please guide me? there might be a problem with the tree, but not sure and also do now know how to fix it. Thank you in advance.

    1. Hi again, Thank to your post in
      I could fix the error. Thanks Liam!

  16. Hi Liam, I have been trying to create an object using my pruned tree, and data. But I get error in ace(x, a, method= "pic": length of phenotype and of phylogenetic data do not match. Any idea of what to do?

    1. You could try the name.check( ) function in geiger to see if that tells you where the discrepancy is.

      x should be a named vector & a is your tree. You can try:


      This might also help show you were the problem lies.

  17. Hello,

    I am somewhat new to R, the code works great! I am wondering, are branch lengths are preserved this way?


Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.