Tuesday, February 21, 2012

Adding spaces between genus & specific names in tree objects

Newick format typically excludes spaces within species & node labels on the tree; however, we often want to plot binomial species names with a space between the genus and specific name. ape's plot.phylo does this by default if the "_" character is used in lieu of a space. For instance:

> tree<-read.tree(text="((((Anolis_evermanni:0.21,Anolis_stratulus:0.21):0.35,(((Anolis_krugi:0.33,Anolis_pulchellus:0.33):0.13,(Anolis_gundlachi:0.39,Anolis_poncensis:0.39):0.07):0.03,(Anolis_cooki:0.4,Anolis_cristatellus:0.4):0.09):0.08):0.39,Anolis_cuvieri:0.96):0.04,Anolis_occultus:1);")
> plot.phylo(tree)

This does not work with the phytools functions plotTree and plotSimmap, which write the whole label string exactly as stored, underscores included. However, nothing about "phylo" objects in R requires that tip labels lack spaces, so we can add them before plotting if we want.

> tree$tip.label<-gsub("_"," ",tree$tip.label)
> plotTree(tree,node.numbers=T,pts=F,ftype="i")
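To see what the substitution does on its own, here is a minimal base R sketch on a plain character vector. The vector `labels` is a hypothetical stand-in for tree$tip.label, and fixed=TRUE is an optional choice (not used in the post) that treats the pattern as a literal string rather than a regular expression.

```r
# Hypothetical stand-in for tree$tip.label; no ape or phytools needed here
labels <- c("Anolis_evermanni", "Anolis_stratulus", "Anolis_cuvieri")

# Replace every underscore with a space (fixed = TRUE: literal match, no regex)
spaced <- gsub("_", " ", labels, fixed = TRUE)
print(spaced)

# The substitution is reversible as long as no label contained a space to begin with
restored <- gsub(" ", "_", spaced, fixed = TRUE)
stopifnot(identical(restored, labels))
```

Because gsub is vectorized, the same call works unchanged on the full tip.label vector of a "phylo" object.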

For fun, let's paint the Puerto Rican Anolis ecomorphs on the tree - arbitrarily assigning the stem branches to the state of the descendant node, and the root state to "TG" (i.e., the trunk-ground ecomorph):

> tree<-paintSubTree(tree,11,"TG")
> tree<-paintSubTree(tree,which(tree$tip.label=="Anolis occultus"), "Tw",stem=T)
> tree<-paintSubTree(tree,which(tree$tip.label=="Anolis cuvieri"), "CG",stem=T)
> tree<-paintSubTree(tree,which(tree$tip.label=="Anolis poncensis"), "GB",stem=T)
> tree<-paintSubTree(tree,17,"GB",stem=T)
> tree<-paintSubTree(tree,14,"TC",stem=T)
> eco<-c("TG","Tw","CG","GB","TC")
> cols<-c("red","grey","green","yellow","blue")
> names(cols)<-eco
> plotSimmap(tree,cols,pts=F,lwd=4,ftype="i")



  1. Of course, we can also go back if we want to write the tree to file:

    tree$tip.label<-gsub(" ","_",tree$tip.label)
    write.tree(tree,file="treefile.tre") # or

  2. Couldn't you just make the gsub() line internal to plotTree, right before the labels are actually plotted? From a programming point of view, that seems like the safest course of action: people would then not be at risk of changing their tree object just to plot it and then forgetting about the change later in an analytical procedure.

    (This of course assumes that no one, ever, would want to plot underscores in tip labels. I would bet cash against the possibility, though.)

    Either way, thanks for the tip! I had never played around with gsub() or any of the string manipulation functions much before.
    -Dave B.

    PS: I have a library up on CRAN too now! :)

  3. Yes, that sounds like a good idea. Maybe I will put that into the next version of plotSimmap (and phytools).

    Why don't you post a link to your package so we can check it out?

    Thanks! Liam
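
Until something like this is built in, the idea discussed in the two comments above could be sketched as a small wrapper. This is only an illustration under stated assumptions: `plot_with_spaces` and its `plot_fun` argument are hypothetical names, not part of phytools or ape, and the wrapper assumes only that the tree is a list-like object with a tip.label element.

```r
# Hypothetical helper (not part of phytools or ape): plot a copy of the tree
# with underscores in the tip labels replaced by spaces.
plot_with_spaces <- function(tree, plot_fun, ...) {
  # R copies on modification, so the caller's object is never altered
  display <- tree
  display$tip.label <- gsub("_", " ", display$tip.label, fixed = TRUE)
  # Hand the relabelled copy to whatever plotting function was supplied
  plot_fun(display, ...)
  # Return the original tree, labels untouched, without printing it
  invisible(tree)
}
```

With phytools loaded, a call such as plot_with_spaces(tree, plotTree, ftype="i") would draw the spaced labels while leaving tree$tip.label as it was, so later analyses still see the underscored names.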

  4. Sure thing, Liam!


    My package is paleotree and it is focused on offering useful functions for paleontological data, particularly when phylogenetic approaches are used in the fossil record. Although there are a number of functions with miscellaneous uses, it focuses on four things: (1) estimating sampling rates from taxonomic ranges in the fossil record, (2) simulating taxon ranges and converting ranges into a phylogeny, (3) time-scaling cladograms of fossil taxa and (4) plotting diversity curves from fossil data in a way that is comparable to lineage diversity curves from phylogenies.