Sunday, August 10, 2014

Remove a set of tips matching a regular expression

A phytools reader asks the following question:

“Is it possible to remove a set of labels matching some regular expression - say, a set of labels with a given prefix?”

The answer is that this is a piece of cake in R using two handy functions: grep in base R, and drop.tip in ape. Here's how we do it.

First, let's make up a tree with tip labels that look realistic:

## Loading required package: ape
## Loading required package: maps
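
A minimal sketch of this step follows. It assumes ape's rtree to simulate the tree and invents labels of the form "A.xxxxxx". The seed, the label scheme, and the object name tree are illustrative assumptions, not necessarily what was used here.

```r
library(ape)

set.seed(1)  # assumed seed, only so the sketch is repeatable

# simulate a random tree with 26 tips
tree <- rtree(n = 26)

# hypothetical "species" names: a capital letter, a period,
# and six random lowercase letters
epithet <- replicate(26, paste(sample(letters, 6, replace = TRUE), collapse = ""))
tree$tip.label <- paste(LETTERS, epithet, sep = ".")
tree$tip.label
```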

Now, let's randomly add the tag "(NA)" to our fictional “species” that occur in North America, and "(SA)" to those that occur in South America. I'm going to do this with probability 0.25 that a species is from North America and probability 0.75 that it is from South America.

continent<-sapply(runif(n=26),function(x) if(x<0.25) 
    "(NA)" else "(SA)")
##  [1] "(SA)" "(SA)" "(NA)" "(SA)" "(SA)" "(SA)" "(NA)" "(SA)" "(SA)" "(SA)"
## [11] "(SA)" "(SA)" "(NA)" "(SA)" "(SA)" "(SA)" "(SA)" "(SA)" "(NA)" "(SA)"
## [21] "(SA)" "(NA)" "(SA)" "(NA)" "(NA)" "(SA)"
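
The tags still have to be attached to the tip labels. One plausible way to do that is with paste, sketched here on a few hypothetical labels standing in for tree$tip.label (the names labels and tagged are assumptions for illustration only):

```r
# hypothetical labels standing in for tree$tip.label
labels <- c("A.alba", "B.bona", "C.cana", "D.dura")
continent <- c("(SA)", "(NA)", "(SA)", "(NA)")

# paste is vectorized, so each label gets the tag in the same position;
# the default separator is a single space
tagged <- paste(labels, continent)
tagged
```

With a real tree, the same call would be assigned back to tree$tip.label.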

Finally, let's remove all taxa from our tree that are from North America:
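
This step can be sketched as follows, on a small made-up tree so that the example runs on its own. The tree, its labels, and the names na.tips and pruned are assumptions for illustration. Note that "(NA)" contains parentheses, which are grouping characters in a regular expression, so the sketch uses fixed=TRUE to match the tag literally.

```r
library(ape)

set.seed(1)  # assumed seed, only so the sketch is repeatable

# small hypothetical tree with continent tags in the tip labels
tree <- rtree(n = 5)
tree$tip.label <- paste(c("A.alba", "B.bona", "C.cana", "D.dura", "E.erra"),
                        c("(SA)", "(NA)", "(SA)", "(NA)", "(SA)"))

# find the labels carrying the North America tag;
# value = TRUE returns the matching labels rather than their indices
na.tips <- grep("(NA)", tree$tip.label, fixed = TRUE, value = TRUE)

# drop.tip accepts a character vector of tip labels to remove
pruned <- drop.tip(tree, na.tips)
pruned$tip.label
```

To drop tips by prefix instead, as in the reader's question, a pattern anchored with ^ (for instance "^A\\." for labels beginning with "A.") can be passed to grep without fixed=TRUE.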


That's it.

  1. Excellent. Just what I was looking for. Thanks for the explanation.


