Sunday, August 10, 2014

Remove a set of tips matching a regular expression

A phytools reader asks the following question:

“Is it possible to remove a set of labels matching some regular expression - say, a set of labels with a given prefix?”

The answer is that this is a piece of cake in R using two handy functions: grep, from the R base package, and drop.tip, from ape. grep finds the labels that match a pattern, and drop.tip removes those tips from the tree. Here's how we do it.
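To make the first step concrete, here is a small base-R sketch of how grep behaves. The tip labels are made up purely for illustration. grep returns the positions of the elements that match a pattern, and a "^" at the start of the pattern restricts the match to a prefix, which is the case our reader asked about. With value=TRUE it returns the matching labels themselves. Either result is the kind of vector we would hand to drop.tip.

```r
## hypothetical tip labels, made up for this illustration
tips<-c("Anolis_carolinensis","Anolis_sagrei","Sceloporus_undulatus",
    "Anolis_cristatellus")
## grep() returns the positions of the elements that match the pattern;
## the "^" anchors the match to the start of each label (i.e., a prefix)
ii<-grep("^Anolis",tips)
ii
## [1] 1 2 4
## with value=TRUE, grep() returns the matching labels instead of positions
to.drop<-grep("^Anolis",tips,value=TRUE)
to.drop
## [1] "Anolis_carolinensis" "Anolis_sagrei"       "Anolis_cristatellus"
```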

First, let's make up a tree with tip labels that look realistic:

library(phytools)
## Loading required package: ape
## Loading required package: maps
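## labels like "Z._abcdef": a capital letter, "._", & six random lowercase letters
labels<-paste(LETTERS[26:1],"._",sapply(1:26,function(i) 
    paste(sample(letters,6),collapse="")),sep="")
## simulate a pure-birth tree with these 26 labels
tree<-pbtree(n=26,tip.label=labels)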

Now, let's randomly append the tag "(NA)" to our fictional “species” that occur in North America and the tag "(SA)" to those that occur in South America. I'm going to do this by assigning each species to North America with probability 0.25 and to South America with probability 0.75.

continent<-sapply(runif(n=26),function(x) if(x<0.25) 
    "(NA)" else "(SA)")
continent
##  [1] "(SA)" "(SA)" "(NA)" "(SA)" "(SA)" "(SA)" "(NA)" "(SA)" "(SA)" "(SA)"
## [11] "(SA)" "(SA)" "(NA)" "(SA)" "(SA)" "(SA)" "(SA)" "(SA)" "(NA)" "(SA)"
## [21] "(SA)" "(NA)" "(SA)" "(NA)" "(NA)" "(SA)"
tree$tip.label<-paste(tree$tip.label,"_",continent,sep="")
plotTree(tree)


Finally, let's remove all taxa from our tree that are from North America:

## escape the parentheses so the pattern matches the literal tag "(NA)"
tree<-drop.tip(tree,tree$tip.label[grep("\\(NA\\)",tree$tip.label)])
plotTree(tree)
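One caveat about the pattern: parentheses are special characters in a regular expression, where they define a group. Written unescaped, "(NA)" therefore matches the two letters "NA" anywhere in a label. With the labels simulated above that happens to be harmless, because the only place "NA" can occur is inside the tag, but in real data it could catch the wrong tips. The following base-R sketch, using made-up labels, compares the unescaped pattern, an escaped pattern, and fixed=TRUE:

```r
## hypothetical labels; the third contains "NA" but is South American
tips<-c("sp1_(SA)","sp2_(NA)","sp3_NAvoucher_(SA)")
## as a regular expression, "(NA)" is a group matching "NA" anywhere
grep("(NA)",tips)
## [1] 2 3
## escaping the parentheses matches the literal text "(NA)" only
grep("\\(NA\\)",tips)
## [1] 2
## fixed=TRUE switches off regular-expression matching altogether
grep("(NA)",tips,fixed=TRUE)
## [1] 2
```

Escaping keeps the full power of regular expressions (e.g. combining the tag with a "^" prefix or a "$" anchor), while fixed=TRUE is simpler when all we need is a literal substring.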


That's it.

1 comment:

  1. Excellent. Just what I was looking for. Thanks for the explanation.

