Friday, March 4, 2011

Prune tree to a list of taxa

Using the {ape} function drop.tip() we can easily excise a single taxon or a list of taxa from our "phylo" tree object in R. However, it is not immediately obvious how to prune the tree to include, rather than exclude, a specific list of tips. Trina Roberts (now at NESCent) shared a trick to do this with me some time ago, and I thought I'd pass it along to the readers of this blog.

First, let's start with a tree of 10 species:

> tree<-rtree(10)
> write.tree(tree)
[1] "(t8:0.22,((((t3:0.9,(t7:0.48,t2:0.5):0.12):0.47,t6:0.55):0.08,(t5:0.49,(t9:0.71,t10:0.13):0.15):0.7):0.87,(t1:0.72,t4:0.62):0.55):0.47);"


Now, say we want to keep the species t2, t4, t6, t8, and t10 in our pruned tree, we just put these tip names into a vector:

> species<-c("t2","t4","t6","t8","t10")

[More commonly, this vector will probably come from the row names in our data matrix, or we might read it from a text file.]

We create the pruned tree with one command:

> pruned.tree<-drop.tip(tree,tree$tip.label[-match(species, tree$tip.label)])

Now we have our pruned tree, as desired:

> write.tree(pruned.tree)
[1] "(t8:0.22,(((t2:1.09,t6:0.55):0.08,t10:0.98):0.87,t4:1.17):0.47);"

20 comments:

  1. If there are tips in the "species" vector that are not in the tree, match(species,tree$tip.label) will one or mulitple NAs, and the procedure will fail. To avoid this problem, one can just do:
    > pruned.tree<-drop.tip(tree, tree$tip.label[-na.omit(match(species, tree$tip.label))])

    ReplyDelete
  2. Hi Liam-

    Even less code than the -match trick:

    pruned.tree<-drop.tip(tree, setdiff(tree$tip.label, species));

    setdiff is very handy....(as is intersect and %in%)

    ReplyDelete
  3. Cool! Very handy indeed.

    Dan's method will also work even if some of the labels in "species" are not in "tree."

    ReplyDelete
  4. Hey Liam, congratulations for this great blog, just one question related to this, suppose you want to prune a bunch of taxa from a large amount of trees (for example, you want to use the trees from the posterior of BEAST but you want to get rid of some taxa for all the trees). I am a beginner in R and I have no clue of how to do it.

    Any idea?

    thanks a lot!

    J.

    ReplyDelete
  5. Hi. Yes, this is pretty easy. I will write a quick post about this now.

    ReplyDelete
  6. Hi Liam,
    I have a bit of a tricky question. When I use the drop.tip function, the (dropped-tip) tree object will keep all elements from the (original) lists, and therefore these elements (e.g. 95% HPD or ancestral state reconstructions on nodes) do not match the new object anymore (i.e. they do not change along). Do you know a way to make these lists change according to the dropped tips and the new node numbers? I haven't been able to figure it out, unfortunately. Cheers and best wishes, Renske

    ReplyDelete
    Replies
    1. Yup. This is an issue. I have an idea, but it depends how the node information is stored in the 'phylo' object. If you email me your saved workspace (.Rdata) then I will investigate it. Or you could post your question to R-sig-phylo since it doesn't have anything in particular to do with phytools.

      Delete
    2. Hi Renske and Liam, I have the same question and came across this post. Has there been any clarification on this topic? Thanks! All the best, Sara

      Delete
  7. The package phyloch with its function read.beast and drop.tips2 used to do the trick… I'm trying to do it again and it doesnt work anymore, apparently due to a failure (hopefully easy to fix) of the function readBeasttable (embedded in read.beast) on last versions of .trees files…

    ReplyDelete
    Replies
    1. Or perhaps it is just because I used the BSSVS procedure that eliminates zero-rates in the rate matrix… Making the length of the "characters state" string inconsistent and thus preventing readBeasttable to properly parse the file…

      Delete
  8. Is it possible to remove a set of labels matching some regular expression - say, a set of labels with a given prefix?

    ReplyDelete
    Replies
    1. Hi Steven.

      Yes - this is possible. I have posted about this here. Let me know if this is what you want to do.

      - Liam

      Delete
  9. This comment has been removed by the author.

    ReplyDelete
  10. A quick question: some of my species' names contain dot, underscore, and number. E.g; Species.n.sp_2. After performing prune; those species' names with dot, underscore and number are not included in the final result and that was my attention. I want to keep these taxa. Thanks for your time and help!

    ReplyDelete
  11. This comment has been removed by the author.

    ReplyDelete
  12. Thanks for this Liam! Is there a way to do this on a multiPhylo object? I tried:
    pruned.tree <- lapply(tree, drop.tip(tree,tree$tip.label[-match(species, tree$tip.label)]))

    and also Dan's approach:
    pruned.tree<-lapply(tree, drop.tip(tree, setdiff(tree$tip.label, species)))

    But keep getting this error: Error in drop.tip(tree, tree$tip.label[-match(species, tree$tip.label)]) :
    object "phy" is not of class "phylo"

    Any ideas?

    Thanks!

    Simon

    ReplyDelete
    Replies
    1. Sure, but you have to make the second argument in your lapply call a function. For instance:

      tips ## my tips to keep
      pruned<-lapply(trees,function(t,tips)
      drop.tip(t,setdiff(t$tip.label,tips)),tips=tips)
      class(pruned)<-"multiPhylo"

      That should work.

      - Liam

      Delete
    2. Perfect, Liam! It works great except when `tips` has just a few species. Then it complains with:

      Error in phy$edge[, 2] : incorrect number of dimensions
      In addition: Warning message:
      In drop.tip(t, setdiff(t$tip.label, tips)) :
      drop all tips of the tree: returning NULL

      It worked for what I needed it, just wanted to warn others who might find this later on.

      Simon

      Delete
  13. Hi Liam,
    I am trying to visualize species tree of 1200+ species with characters mapped on them and am unable to see it clearly due to large number of taxas. Could you please suggest some way to deal with it?
    I have already tried exporting it PNG and PDF formats but those didn't help.

    ReplyDelete