Friday, January 29, 2016

phylo.heatmap standardizing by column

In tinkering with the visualization function I just added to phytools to plot a phylogenetic heat map, I realized that an important limitation of the function as written was that all columns of X are plotted on the same scale. That means that for traits with low among-species variability, it might be difficult to detect otherwise strong phylogenetic patterns in the data.

For example, here I will simulate a tree & then data for 10 traits with σ ranging from 1 to 10 (i.e., σ² = 1, 4, ..., 100).

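library(phytools)
## simulate a 30-taxon coalescent tree
tree<-rcoal(n=30)
## simulate 10 traits by Brownian motion with sig2 = 1, 4, ..., 100
X<-sapply(setNames((1:10)^2,paste("sig=",1:10,sep="")),
    function(sig2,tree) fastBM(tree,sig2=sig2),tree=tree)
## plot all columns on a single shared color scale
phylo.heatmap(tree,X)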

[Figure: output of phylo.heatmap(tree,X)]

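The problem with this is fairly obvious. Because a single color ramp spans the range of the entire matrix, traits with low σ occupy only a narrow band of colors, so any phylogenetic structure they have is not captured by the color ramp and they show little visible evidence of it.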
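To make the problem concrete, here is a minimal base-R sketch. It assumes, purely for illustration, that values are mapped to a ramp of equal-width color bins; `color_bins` is a hypothetical helper, not part of phytools, and phylo.heatmap may bin colors differently. It counts how many distinct colors a low-variance trait receives when it shares a ramp with a high-variance trait, versus when it gets the ramp to itself.

```r
## hypothetical helper (not part of phytools): assign each value in x
## to one of n equal-width color bins spanning lims = c(min, max)
color_bins<-function(x,lims,n=100){
    b<-floor((x-lims[1])/diff(lims)*(n-1))+1
    pmin(pmax(b,1),n)
}
set.seed(1)
## two toy traits, one with a small and one with a large spread
low<-rnorm(30,sd=1)
high<-rnorm(30,sd=10)
## shared scale: a single ramp spans the range of both traits
shared<-range(c(low,high))
## number of distinct colors each trait ends up using
n_low_shared<-length(unique(color_bins(low,shared)))
n_high_shared<-length(unique(color_bins(high,shared)))
## per-column scale: the low-variance trait gets the whole ramp
n_low_own<-length(unique(color_bins(low,range(low))))
c(low_shared=n_low_shared,high_shared=n_high_shared,low_own=n_low_own)
```

On the shared scale the low-variance trait is squeezed into a handful of bins, which is why its pattern is hard to see.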

An easy solution to this is to standardize the columns to have the same variance before plotting. Here is what that looks like:

phylo.heatmap(tree,X,standardize=TRUE)

[Figure: output of phylo.heatmap(tree,X,standardize=TRUE)]
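For readers who want to see what standardizing does to the data, here is a minimal base-R sketch using `scale`. It assumes that standardizing means rescaling each column to unit variance; the post does not say whether standardize=TRUE also centers the columns, so treat this as an approximation of the option rather than its actual implementation. The toy matrix below is drawn without a tree, only to keep the example self-contained.

```r
set.seed(42)
## toy matrix: 30 rows, 10 columns with standard deviations 1, 2, ..., 10
X<-sapply(setNames(1:10,paste("sig=",1:10,sep="")),
    function(sig) rnorm(30,sd=sig))
## before: column SDs differ by roughly an order of magnitude
round(apply(X,2,sd),2)
## center each column and divide by its standard deviation
Xs<-scale(X)
## after: every column has mean 0 and SD 1, so a shared color
## ramp now spans a comparable range of values for each trait
round(apply(Xs,2,sd),6)
round(colMeans(Xs),6)
```

With the tree and X simulated earlier in the post, phylo.heatmap(tree,scale(X)) should give a plot comparable to the standardize=TRUE version, provided the option does rescale to unit variance.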

I also made some other updates to the legend placement & the labels. The update can be seen here or installed directly from GitHub using devtools.

6 comments:

  1. Hi Liam,
    Thanks for putting this together. I've started playing with this to visualize a lot of variables (10-40) with a tree consisting of 100+ tips. This tool seems like an easy way to do that, but I have had a little difficulty getting the final output formatted properly. Is there a specific way to set the amount of space allocated to 1) the tree, 2) the tip labels, and 3) the heatmap? fsize= alters the tip label size, but I can't find a way to set min and max limits for the tree or heatmap.
    Thanks
    Jeff

    1. Hi Jeff.

      I just added some more user control of the layout, etc. Here is what I mean: http://blog.phytools.org/2016/02/more-options-user-control-for.html. Let me know if this is what you had in mind.

      All the best, Liam

    2. Hi Liam,

      Thanks - that's exactly what I had in mind, and it's helping a lot already. One other thing I've noticed is that as input trees get larger, the dots connecting tips to tip labels occupy a larger proportion of the space. Not really important though, just a small point.

      Best,
      Jeff

  2. Hi Liam,

    I have one other question/request. Some of my data have missing values, but I would still like to visually represent the patterns in the remaining data. With standardize=FALSE, the plotting function works properly, except that the variable names and legend text are omitted. With standardize=TRUE, the heatmap isn't generated at all. This can be bypassed by scaling beforehand, but that still leaves the issue of the column titles and legend text.

    Thanks again
    Jeff
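The "scaling beforehand" workaround mentioned in this comment can be done in base R with `scale`, which computes each column's mean and standard deviation from the non-missing values only and leaves missing cells as NA. A minimal sketch on a hypothetical toy matrix (column names a and b are made up for illustration):

```r
set.seed(7)
## toy matrix with two traits on very different scales
X<-cbind(a=rnorm(20,sd=1),b=rnorm(20,sd=50))
## knock out a few values to mimic missing data
X[c(3,11),"a"]<-NA
X[5,"b"]<-NA
## scale() ignores the NAs when computing each column's mean and SD
Xs<-scale(X)
## SDs of the observed values are now 1 in both columns
apply(Xs,2,sd,na.rm=TRUE)
## the missing cells stay missing, and column names are kept
which(is.na(Xs),arr.ind=TRUE)
colnames(Xs)
```

This only handles the scaling step; how the plot then draws NA cells, names, and legend depends on phylo.heatmap itself.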

    1. Hi Jeff.

      This should now be addressed. Check it out here, and let me know if this is what you had in mind.

      All the best, Liam
