Sunday, August 10, 2014

Remove a set of tips matching a regular expression

“Is it possible to remove a set of labels matching some regular expression - say, a set of labels with a given prefix?”

The answer is that this is a piece of cake in R using two handy functions: grep from base R, and drop.tip from ape. Here's how we do it.
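A quick reminder of what grep gives back may help. By default it returns the *positions* of the matching elements, not the elements themselves. The toy labels below are hypothetical, written in the same style as the tree built later in this post. fixed=TRUE makes grep treat the pattern as a literal string rather than a regular expression:

```r
# Hypothetical tip labels in the same "X._xxxxxx_(NA)" style used below
tips <- c("Z._qwerty_(SA)", "Y._asdfgh_(NA)", "X._zxcvbn_(SA)")

# grep() returns the positions of the matching elements...
idx <- grep("(NA)", tips, fixed = TRUE)
idx
## [1] 2

# ...while value = TRUE returns the matching elements themselves
grep("(NA)", tips, fixed = TRUE, value = TRUE)
## [1] "Y._asdfgh_(NA)"

# Indexing the label vector by those positions gives the same labels;
# this is the form handed to drop.tip() further down
tips[idx]
## [1] "Y._asdfgh_(NA)"
```

drop.tip accepts either tip numbers or tip labels, so passing the labels (as this post does) or the numeric indices would both identify the same tips.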

First, let's make up a tree with tip labels that look realistic:

library(phytools)

## Loading required package: ape

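## labels look like "Z._qwerty": a capital letter, "._", then
## six distinct random lowercase letters
labels<-paste(LETTERS[26:1],"._",sapply(1:26,function(i)
paste(sample(letters,6),collapse="")),sep="")
tree<-pbtree(n=26,tip.label=labels)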
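Because each label starts with a capital letter followed by "._", the prefix case from the original question is easy to sketch too. The seed and the name demo_labels are hypothetical choices, used only so this snippet is reproducible and doesn't overwrite labels. In the pattern, "^" anchors the match at the start of the string and "\\." makes the dot literal:

```r
# Rebuild labels with the same recipe as above (seed is a hypothetical choice)
set.seed(42)
demo_labels <- paste(LETTERS[26:1], "._",
  sapply(1:26, function(i) paste(sample(letters, 6), collapse = "")),
  sep = "")

# Regex prefix match: "^" = start of string, "\\." = a literal dot
grep("^A\\._", demo_labels)
## [1] 26

# Base R's startsWith() does the same test with no regex at all
which(startsWith(demo_labels, "A._"))
## [1] 26
```

Passing demo_labels[grep("^A\\._", demo_labels)] as the tip argument of drop.tip would then remove every tip with that prefix.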


Now, let's randomly tag each of our fictional “species” as either "(NA)", for species that occur in North America, or "(SA)", for species in South America. I'm going to do this by assigning each species to North America with probability 0.25 and to South America with probability 0.75.

continent<-sapply(runif(n=26),function(x) if(x<0.25)
"(NA)" else "(SA)")
continent

##  [1] "(SA)" "(SA)" "(NA)" "(SA)" "(SA)" "(SA)" "(NA)" "(SA)" "(SA)" "(SA)"
## [11] "(SA)" "(SA)" "(NA)" "(SA)" "(SA)" "(SA)" "(SA)" "(SA)" "(NA)" "(SA)"
## [21] "(SA)" "(NA)" "(SA)" "(NA)" "(NA)" "(SA)"
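The same random assignment can also be written without sapply, using the vectorized ifelse. The sketch below draws one set of uniform numbers (with a hypothetical seed, for reproducibility) and checks that both versions give an identical result from the same draws:

```r
set.seed(1)  # hypothetical seed, only so the draw is reproducible
u <- runif(n = 26)

# Element-by-element version, as in the post
continent1 <- sapply(u, function(x) if (x < 0.25) "(NA)" else "(SA)")

# Vectorized version: ifelse() tests every element of u at once
continent2 <- ifelse(u < 0.25, "(NA)", "(SA)")

identical(continent1, continent2)
## [1] TRUE
```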

tree$tip.label<-paste(tree$tip.label,"_",continent,sep="")
plotTree(tree)


Finally, let's remove all taxa from our tree that are from North America:

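## fixed=TRUE matches the literal string "(NA)"; as a regular expression,
## "(NA)" is a group that would match any label containing "NA"
tree<-drop.tip(tree,tree$tip.label[grep("(NA)",tree$tip.label,fixed=TRUE)])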
plotTree(tree)
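Why fixed=TRUE (or escaped parentheses) matters: in a regular expression, parentheses are a grouping construct, so the pattern "(NA)" just matches the two characters "NA" anywhere in a label. With the made-up labels above that happens to be harmless, since "NA" only occurs inside the "(NA)" tag. A hypothetical label with "NA" elsewhere, such as the specimen-style code below, shows the difference:

```r
# Hypothetical South American label that happens to contain "NA"
lab <- "CNA_0042_(SA)"

grepl("(NA)", lab)                # regex: matches the "NA" in "CNA"
## [1] TRUE
grepl("(NA)", lab, fixed = TRUE)  # literal "(NA)": no match
## [1] FALSE
grepl("\\(NA\\)", lab)            # escaped parentheses: no match
## [1] FALSE
grepl("\\(NA\\)", "Y._asdfgh_(NA)")
## [1] TRUE
```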


That's it.
