Saturday, March 9, 2013

Significant update to phenogram

The phytools function phenogram does a projection of the phylogeny into a space defined by time since the root (on the x axis) and phenotype (on y). It has some nice features, for instance it can map the state of a discrete character on the tree, but it also had a couple of small bugs associated with labeling the leaves - specifically, the alignment of tip labels is messed up, and it sometimes did not leave enough whitespace right of the tips for labels to be printed.

I decided to do a significant overhaul of phenogram to both try and fix these issues as well as to enable a lot more user control of plotting within the function.

Fixing the text alignment was a piece of cake; however allocating enough whitespace for plotting tip labels turned out to be a much more complicated issue. This is not something that I've really dealt with in tree plotting before because in my other major tree plotting function, plotSimmap, the function circumvents the issue by fixing the plotting area to a unit in length, and then fractioning that area into a part for the tree and a second part for tip labels. (This works really well, but introduces complications when you want to include a legend - e.g., here.)

Let me try to explain why this (probably) seemingly trivial issue can be such a pain in the butt. Now, we have a function strwidth which will give us the width (in various units) of a plotted string. The difficulty arises if we want to tie the limits of a plotting area to a call of strwidth. This is because strwidth(...,units="user") (the default) will only work properly after our plotting device has been opened. This means that it can't be used to specify the dimensions of the plotting area - paradoxical if the point of specifying a specific set of dimensions for plotting is specifically to leave space for plotted strings! The solution* turns out to be first pull the horizontal dimension in inches of our plotting device out using par("pin"); then finding the maximum width of our tip labels (again, in inches) on the plotting devices; and then, finally, using numerical optimization to find the ratio of our units (time since the root, in this case) and inches that allows us to use the whole plotting devices when the tip labels are also plotted. The way this looks in practice is as follows:
# node heights
H<-nodeHeights(tree)
# width of the plotting device, in inches
pp<-par("pin")[1]
# string width on the plotting device, in inches
# (includes label offset)
sw<-fsize*(max(strwidth(tree$tip.label,units="inches")))+   offset*fsize*max(strwidth(tree$tip.label,
  units="inches"))/max(nchar(tree$tip.label))
# find the ratio of inches:units that fills the
# plotting window
alp<-optimize(function(a,H,sw,pp)
  (a*1.04*max(H)+sw-pp)^2,
  H=H,sw=sw,pp=pp,interval=c(0,1e6))$minimum
# set x-limits
xlim<-c(min(H),max(H)+sw/alp)

(*This solution is derived from something that Emmanuel Paradis did in the ape function plot.phylo. Thanks Emmanuel!)

While I was at it, I decided to migrate a lot of other controls over plotting to the user. This will be in the function documentation for the next phytools version, but here's a quick demo:
> source("phenogram.R")
> tree<-pbtree(n=20)
> x<-exp(fastBM(tree))
> phenogram(tree,x,log="y",colors="blue",type="b", offset=0.5,xlab="millions of years",ylab="body size", main="Body Size Evolution")

The source code for the new version of phenogram is here. It will also be updated in the next version of phytools.

2 comments:

  1. Cool stuff Liam. Have you considered adding an option to plot 95% confidence intervals for the ancestral trait reconstructions? For what it's worth, I did this once using "segments" and it looked a bit messy (lots of lines, and hard to interpret when there are many nodes in a short time interval), but it was nonetheless pretty useful.

    ReplyDelete