Monday, February 27, 2017

Function to add error bars to a contMap plot

I just wrote a function to add error bars to a "contMap" plot, as discussed today on R-sig-phylo and my blog.

The function looks as follows:

errorbar.contMap<-function(obj,...){
    if(hasArg(x)) x<-list(...)$x
    else x<-setNames(sapply(1:Ntip(obj$tree),function(x,obj){
        ii<-which(obj$tree$edge[,2]==x)
        ss<-names(obj$tree$maps[[ii]][length(obj$tree$maps[[ii]])])
        obj$lims[1]+as.numeric(ss)/(length(obj$cols)-1)*diff(obj$lims)
        },obj=obj),obj$tree$tip.label)
    if(hasArg(scale.by.ci)) scale.by.ci<-list(...)$scale.by.ci
    else scale.by.ci<-FALSE
    tree<-obj$tree
    aa<-fastAnc(tree,x,CI=TRUE)
    xlim<-range(aa$CI95)
    if(xlim[2]>obj$lims[2]||xlim[1]<obj$lims[1]){
        cat(paste("  -----\n  The range of the contMap object, presently (",
            round(obj$lims[1],4),",",
            round(obj$lims[2],4),
            "), should be equal to\n  or greater than the range of the CIs on ancestral states: (",
            round(xlim[1],4),",",round(xlim[2],4),").\n",sep=""))
        cat(paste("  To ensure that your error bars are correctly plotted, please recompute your\n", 
            "  contMap object and increase lims.\n  -----\n",sep=""))
    }
    d<-diff(obj$lims)
    if(scale.by.ci){
        v<-aa$CI95[,2]-aa$CI95[,1]
        v<-v/max(v)
    } else v<-rep(0.5,tree$Nnode)
    n<-length(obj$cols)-1
    lastPP<-get("last_plot.phylo",envir=.PlotPhyloEnv)
    h<-max(nodeHeights(tree))
    for(i in 1:tree$Nnode){
        ii<-round((aa$CI95[i,1]-obj$lims[1])/d*n)
        jj<-round((aa$CI95[i,2]-obj$lims[1])/d*(n+1))
        cols<-obj$cols[ii:jj]   
        add.color.bar(leg=0.1*h*v[i],cols=cols,prompt=FALSE,
            x=lastPP$xx[i+Ntip(tree)]-0.05*h*v[i],
            y=lastPP$yy[i+Ntip(tree)],title="",subtitle="",lims=NULL,
            lwd=14)
    }
}

One element of the function

x<-setNames(sapply(1:Ntip(obj$tree),function(x,obj){
    ii<-which(obj$tree$edge[,2]==x)
    ss<-names(obj$tree$maps[[ii]][length(obj$tree$maps[[ii]])])
    obj$lims[1]+as.numeric(ss)/(length(obj$cols)-1)*diff(obj$lims)
    },obj=obj),obj$tree$tip.label)

is a complicated piece of code that is only used if the optional argument x (the same as x in contMap is not supplied. What this line does is pull the values of x instead from the mapped object.

OK, let's try it:

library(phytools)
tree
## 
## Phylogenetic tree with 26 tips and 25 internal nodes.
## 
## Tip labels:
##  A, B, C, D, E, F, ...
## 
## Rooted; includes branch lengths.
x
##           A           B           C           D           E           F 
##  2.40577324 -2.09590799 -0.03881523 -0.32857686 -1.71533031 -1.07523748 
##           G           H           I           J           K           L 
## -0.99890853  3.07436198  4.00842163 -0.27299915  2.41135389  2.61494639 
##           M           N           O           P           Q           R 
##  2.97710788  3.56872630  2.96244691  1.92234400  3.85672218  5.70984852 
##           S           T           U           V           W           X 
##  0.15119803  0.50825014  3.26386662  2.21301397  1.52734360  1.93244087 
##           Y           Z 
##  1.51311417  4.41672993
y
##           A           B           C           D           E           F 
## -0.52075890 -3.22896754 -1.99922219 -2.99064026 -3.85137945  0.92981725 
##           G           H           I           J           K           L 
## -1.77583525 -5.53973459 -1.39568877 -6.03400572 -5.50545828 -2.04112670 
##           M           N           O           P           Q           R 
## -3.45201728 -2.87475079 -4.97170923 -1.41198149 -1.58924448  0.07361102 
##           S           T           U           V           W           X 
## -0.37121255  1.66386064  0.38763598 -3.65979548 -2.99468975 -1.57039289 
##           Y           Z 
## -0.27934165  0.97554201

First with x:

obj<-contMap(tree,x,plot=FALSE)
plot(obj,xlim=c(-0.5,10.5))
errorbar.contMap(obj,scale.by.ci=TRUE)

plot of chunk unnamed-chunk-4

Now with y:

obj<-contMap(tree,y,plot=FALSE)
plot(obj,xlim=c(-0.5,10.5))
errorbar.contMap(obj,scale.by.ci=TRUE)

plot of chunk unnamed-chunk-5

##   -----
##   The range of the contMap object, presently (-6.034,1.6639), should be equal to
##   or greater than the range of the CIs on ancestral states: (-5.9555,2.7099).
##   To ensure that your error bars are correctly plotted, please recompute your
##   contMap object and increase lims.
##   -----

This tells us that the range of the CIs on the ancestral nodes for y exceeds the limits of the "contMap" object. This is why some of the plotted CIs (most noticeably, the root) have black regions. This should be easy to fix as follows:

obj<-contMap(tree,y,lims=c(-6,2.8),plot=FALSE)
plot(obj,xlim=c(-0.5,10.5))
errorbar.contMap(obj,scale.by.ci=TRUE)

plot of chunk unnamed-chunk-6

Some of the kinks are still being worked out. Please pardon any bugs!

More on adding error bars to a "contMap" plot

Earlier today I responded to a question about plotting color error bars at the internal nodes of a plotted "contMap" object. The original poster then also added that he would like to scale the length of the error bars by the degree of uncertainty in each estimated node state. This is fairly straightforward. Using the same tree & data we employed earlier, I would go about doing this as follows:

library(phytools)
aa<-fastAnc(tree,x,CI=TRUE)
xlim<-range(aa$CI95)
obj<-contMap(tree,x,lims=xlim,plot=FALSE)
## for fun, let's change the color scheme:
obj<-setMap(obj,colors=c("blue","purple","red"))
plot(obj,xlim=c(-0.12,2.2))
d<-diff(obj$lims)
v<-aa$CI95[,2]-aa$CI95[,1]
v<-v/max(v)
n<-length(obj$cols)-1
lastPP<-get("last_plot.phylo",envir=.PlotPhyloEnv)
h<-max(nodeHeights(tree))
for(i in 1:tree$Nnode){
    ii<-round((aa$CI95[i,1]-obj$lims[1])/d*n)
    jj<-round((aa$CI95[i,2]-obj$lims[1])/d*(n+1))
    cols<-obj$cols[ii:jj]   
    add.color.bar(leg=0.1*h*v[i],cols=cols,prompt=FALSE,
        x=lastPP$xx[i+Ntip(tree)]-0.05*h*v[i],
        y=lastPP$yy[i+Ntip(tree)],title="",subtitle="",lims=NULL,
        lwd=14)
}

plot of chunk unnamed-chunk-1

A second poster asked if it were possible to show multiple CIs in this way on a single tree. Of course - anything is possible, so why not give this a try too?

Here, I'll imagine I have the same tree as before, but three continuous characters in the columns of a data matrix X as follows:

X
##           v1          v2         v3
## A -2.7149209 -1.60824200 -0.1530299
## B -2.6918371 -0.56868787 -0.4045761
## C -2.2214286  0.74691204 -0.7665039
## D -0.7471508 -0.07388221 -1.8163244
## E -0.2516396 -1.59142223 -0.7799235
## F -1.0164357 -0.77471225 -1.2468231
## G -0.2909680 -1.22294794 -1.2467781
## H -0.5319452 -0.10286165 -1.3781947
## I  1.0340376 -0.95934999 -0.9955973
## J  0.4739410 -0.49495658 -1.2955523
## K  1.1069054 -0.49403236 -1.1990746
## L -0.1095284  0.58439209 -1.3297114

First, we need to repeat our analyses of above - but now across all three characters:

AA<-apply(X,2,fastAnc,tree=tree,CI=TRUE)
lims<-range(sapply(AA,function(x) range(x$CI95)))
objs<-apply(X,2,contMap,tree=tree,lims=lims,plot=FALSE)
objs
## $v1
## Object of class "contMap" containing:
## 
## (1) A phylogenetic tree with 12 tips and 11 internal nodes.
## 
## (2) A mapped continuous trait on the range (-3.093201, 1.302283).
## 
## 
## $v2
## Object of class "contMap" containing:
## 
## (1) A phylogenetic tree with 12 tips and 11 internal nodes.
## 
## (2) A mapped continuous trait on the range (-3.093201, 1.302283).
## 
## 
## $v3
## Object of class "contMap" containing:
## 
## (1) A phylogenetic tree with 12 tips and 11 internal nodes.
## 
## (2) A mapped continuous trait on the range (-3.093201, 1.302283).

Let's set our "contMap" objects to grey scale:

objs<-lapply(objs,setMap,colors=c("white","black"))

For reasons that will become clear below, I'll also find the biggest CI on any node state across all three of the characters:

D<-sapply(AA,function(x) x$CI95[,2]-x$CI95[,1])
max.v<-max(D)

Now, since we have three characters, maybe it makes more sense to plot the original tree & then just the CIs at nodes - along with our legend:

plotTree(tree,lwd=3,xlim=c(-0.12,2.16),ylim=c(-0.8,12))
add.color.bar(leg=0.5*max(nodeHeights(tree)),cols=objs[[1]]$cols,title="trait value",
    lims=objs[[1]]$lims,prompt=FALSE,x=0,y=-0.4)
foo<-function(aa,obj,tree,y.shift,max.v){
    d<-diff(obj$lims)
    n<-length(obj$cols)-1
    lastPP<-get("last_plot.phylo",envir=.PlotPhyloEnv)
    h<-max(nodeHeights(tree))
    v<-aa$CI95[,2]-aa$CI95[,1]
    v<-v/max.v
    for(i in 1:tree$Nnode){
        ii<-round((aa$CI95[i,1]-obj$lims[1])/d*n)
        jj<-round((aa$CI95[i,2]-obj$lims[1])/d*(n+1))
        cols<-obj$cols[ii:jj]   
        add.color.bar(leg=0.1*h*v[i],cols=cols,prompt=FALSE,
            x=lastPP$xx[i+Ntip(tree)]-0.05*h*v[i],
            y=lastPP$yy[i+Ntip(tree)]+y.shift,
            title="",subtitle="",lims=NULL,lwd=10)
    }
}
for(i in 1:length(objs)) foo(AA[[i]],objs[[i]],tree,y.shift=(-i+2)/3,max.v=max.v)

plot of chunk unnamed-chunk-6

That's it.

Adding colorful error bars to internal node values using phytools::contMap

Today a R-sig-phylo member posted the following question:

“I was wondering about plotting the output of an ancestral state 'reconstruction' of a continuous trait while incorporating at least some of the uncertainty around the estimates.

One approach I thought of was to map the ASR onto a tree in a standard way, then at each node have essentially a mini-legend that is of a length reflecting the width of the confidence interval of the estimate at that node, and is coloured on the same colour-scale as the overall tree legend. For instance, if the colour scheme for the tree goes from blue through yellow to red as the value increases, then a node with a relatively precise and high estimate will have a short bar only ranging through different shades of red, whereas a highly uncertain low estimate will have a wider bar coloured from (say) dark blue to orange/light red. I hope that description makes sense.”

This is cannot be done automatically in phytools, but it is relatively straightforward to accomplish. For the record, I thought this would look terrible - but it actually looks better than I expected.

One key trick is that if we want our 'error bars' to show the 95% CIs for our ancestral values, we have to first find the total range of all such intervals and then use those to specify the range (argument lims) in our contMap calculation:

## load our package
library(phytools)
## here's our tree & data
tree
## 
## Phylogenetic tree with 12 tips and 11 internal nodes.
## 
## Tip labels:
##  A, B, C, D, E, F, ...
## 
## Rooted; includes branch lengths.
x
##          A          B          C          D          E          F 
##  1.6716660  0.8798575  1.5227959  1.1046503  1.3339632 -0.9191693 
##          G          H          I          J          K          L 
## -0.9052098 -0.4977580 -0.6099311 -1.3487360 -1.6726186 -0.5795939
## get ancestral states with CIs
aa<-fastAnc(tree,x,CI=TRUE)
xlim<-range(aa$CI95) ## our range for the "contMap" object
## compute our "contMap" object
obj<-contMap(tree,x,lims=xlim,plot=FALSE)
obj
## Object of class "contMap" containing:
## 
## (1) A phylogenetic tree with 12 tips and 11 internal nodes.
## 
## (2) A mapped continuous trait on the range (-1.657191, 2.200442).
plot(obj,xlim=c(-0.12,2.2)) ## to accommodate the root node label
d<-diff(obj$lims)
n<-length(obj$cols)-1
lastPP<-get("last_plot.phylo",envir=.PlotPhyloEnv)
h<-max(nodeHeights(tree))
for(i in 1:tree$Nnode){
    ii<-round((aa$CI95[i,1]-obj$lims[1])/d*n)
    jj<-round((aa$CI95[i,2]-obj$lims[1])/d*(n+1))
    cols<-obj$cols[ii:jj]   
    add.color.bar(leg=0.05*h,cols=cols,prompt=FALSE,
        x=lastPP$xx[i+Ntip(tree)]-0.025*h,
        y=lastPP$yy[i+Ntip(tree)],title="",subtitle="",lims=NULL,
        lwd=14)
}

plot of chunk unnamed-chunk-1

That's it. If we want to change our color scheme, that is also no problem:

obj<-setMap(obj,colors=c("black","white"))
plot(obj,xlim=c(-0.12,2.2))
d<-diff(obj$lims)
n<-length(obj$cols)-1
lastPP<-get("last_plot.phylo",envir=.PlotPhyloEnv)
h<-max(nodeHeights(tree))
for(i in 1:tree$Nnode){
    ii<-round((aa$CI95[i,1]-obj$lims[1])/d*n)
    jj<-round((aa$CI95[i,2]-obj$lims[1])/d*(n+1))
    cols<-obj$cols[ii:jj]   
    add.color.bar(leg=0.05*h,cols=cols,prompt=FALSE,
        x=lastPP$xx[i+Ntip(tree)]-0.025*h,
        y=lastPP$yy[i+Ntip(tree)],title="",subtitle="",lims=NULL,
        lwd=14)
}

plot of chunk unnamed-chunk-2

Neat.

Obviously, any of the parameters of the error bars - their length (here set to 5% of the total tree length), width, etc., can easily be changed by modifying the code above.

The tree & data for this exercise were simulated as follows:

tree<-pbtree(n=12,tip.label=LETTERS[1:12])
x<-fastBM(tree)

That's all.