Friday, November 30, 2018

Plotting confidence intervals on internal nodes in a traitgram

The popular visualization called a traitgram (e.g., Evans et al. 2009) is a a projection of the tree into a space that is defined by time since the root on one axis and the observed or reconstructed value of a phenotypic trait on the other. For example, here is a traitgram created using simulated data:

library(phytools)
tree
## 
## Phylogenetic tree with 16 tips and 15 internal nodes.
## 
## Tip labels:
##  A, B, C, D, E, F, ...
## 
## Rooted; includes branch lengths.
x
##          A          B          C          D          E          F 
##  2.0275876  3.3977458  2.2337622 -3.7190991 -6.8014316 -6.6150166 
##          G          H          I          J          K          L 
## -8.6625154 -5.0548866 -7.9049386 -7.2639810 -1.8821460 -0.8618768 
##          M          N          O          P 
## -3.0035373 -6.0933141 -4.9157566  2.6977773
phenogram(tree,x,color="darkgrey",lwd=3,ftype="i")

plot of chunk unnamed-chunk-1

In my opinion, it is completely fine to treat the vertical positions of the internal nodes of this tree merely as convenient positions from which to project the tree into this space; however, their values are (normally) the MLEs of ancestral states. Consequently, I earlier developed a method to overlay the inherent uncertainty about these values by way of a color transparency gradient. For example:

fancyTree(tree,type="phenogram95",x=x)
## Computing density traitgram...

plot of chunk unnamed-chunk-2

Alternatively, it occurred to me that it might be equally (if not more) useful to merely overlay the 95% CIs around each internal node as a vertical line. This has never been automated, but can be done easily as follows:

fit<-fastAnc(tree,x,CI=TRUE)
phenogram(tree,x,ylim=range(c(x,fit$CI95)),ftype="i")
h<-sapply(Ntip(tree)+1:tree$Nnode,nodeheight,tree=tree)
for(i in 1:tree$Nnode)
    lines(rep(h[i],2),fit$CI95[i,],lwd=3,
        col=make.transparent("blue",0.15))
points(h,fit$ace,cex=1.5,pch=21,bg="grey")

plot of chunk unnamed-chunk-3

Of course, since these data come from simulation we can actually know the true ancestral state values. Let's overlay them now so we can see how frequently they fall on the 95% CIs:

phenogram(tree,x,ylim=range(c(x,fit$CI95)),ftype="i")
h<-sapply(Ntip(tree)+1:tree$Nnode,nodeheight,tree=tree)
for(i in 1:tree$Nnode)
    lines(rep(h[i],2),fit$CI95[i,],lwd=3,
        col=make.transparent("grey",0.5))
points(h,fit$ace,cex=1.5,pch=21,bg="grey")
points(h,a,cex=1.5,pch=21,bg=make.transparent("blue",0.15),col="grey")
legend(x="bottomleft",c("estimated state","true value"),pt.cex=1.5,
    pt.bg=c("grey",make.transparent("blue",0.15)),pch=21)

plot of chunk unnamed-chunk-4

Finally, let's take a look at how ancestral state estimation improves if we include a known value - say the state at the root which I happen to know is 2:

a[1]
## 17 
##  2
fit<-fastAnc(tree,x,anc.states=a[1],CI=TRUE)
phenogram(tree,c(x,fit$ace),ylim=range(c(x,fit$CI95)),ftype="i")
h<-sapply(Ntip(tree)+1:tree$Nnode,nodeheight,tree=tree)
for(i in 1:tree$Nnode)
    lines(rep(h[i],2),fit$CI95[i,],lwd=3,
        col=make.transparent("grey",0.5))
points(h,fit$ace,cex=1.5,pch=21,bg="grey")
points(h,a,cex=1.5,pch=21,bg=make.transparent("blue",0.15),col="grey")

plot of chunk unnamed-chunk-5

Mostly deep nodes get a lot better, in this case, whereas recent nodes are relatively unaffected.

The data & tree were simulated as follows:

set.seed(9)
tree<-pbtree(n=16,tip.label=LETTERS[1:16],scale=20)
x<-fastBM(tree,a=2,internal=TRUE)
a<-x[Ntip(tree)+1:tree$Nnode]
x<-x[tree$tip.label]