Friday, August 11, 2023

Coloring nodes by bootstrap or Bayesian posterior support values

An R-sig-phylo subscriber recently asked if it was straightforward to add colored points to a plotted tree based on Bayesian posterior probabilities or bootstrap values.

The answer, of course, is yes. One can do this using the ape function nodelabels; however, as Luke Harmon & I show in Chapter 13 of our recently published book, it’s pretty straightforward to script it for ourselves.

Let’s try.

For this example, I’m going to begin by inferring a tree using a dataset ("Laurasiatherian"), that comes packaged with the phangorn package.

library(phytools)
library(phangorn)
data("Laurasiatherian")
Laurasiatherian
## 47 sequences with 3179 character and 1605 different site patterns.
## The states are a c g t

Now as I do this using phangorn::pml_bb, approximate bootstrap values are stored as node labels in my object. (The argument, control=pml.control(trace=0) just turns off the print-out, which can be quite extensive otherwise! In your interactive R session I recommend leaving that off so that you can see the progress of your optimization in real time.)

mammal.pml<-pml_bb(Laurasiatherian,model="GTR+G(4)",
  control=pml.control(trace=0))
mammal.pml
## model: GTR+G(4) 
## loglikelihood: -44699.66 
## unconstrained loglikelihood: -17300.92 
## Model of rate heterogeneity: Discrete gamma model
## Number of rate categories: 4 
## Shape parameter: 0.3515 
##         Rate Proportion
## 1 0.01038367       0.25
## 2 0.14472115       0.25
## 3 0.67297557       0.25
## 4 3.17191961       0.25
## 
## Rate matrix:
##           a          c          g         t
## a  0.000000  3.5989724 13.7579458  3.801364
## c  3.598972  0.0000000  0.4701416 25.006655
## g 13.757946  0.4701416  0.0000000  1.000000
## t  3.801364 25.0066550  1.0000000  0.000000
## 
## Base frequencies:  
##         a         c         g         t 
## 0.3321866 0.1990791 0.2040652 0.2646691

Next, I’m going to pull the phylogeny (alone) off my optimized "pml" object and re-root it, using "Platypus" as my outgroup as follows. Since my node labels contain bootstrap support values, it’s very important that I indicate edgelabel=TRUE in my re-rooting call.

mammal_tree<-root(mammal.pml$tree,"Platypus",resolve.root=TRUE,
  edgelabel=TRUE)
mammal_tree
## 
## Phylogenetic tree with 47 tips and 46 internal nodes.
## 
## Tip labels:
##   Platypus, Wallaroo, Possum, Bandicoot, Opposum, Armadillo, ...
## Node labels:
##   Root, 0.901, 1, 1, 0.562, 1, ...
## 
## Rooted; includes branch lengths.

Now for our support node value custom function. Here, the key piece of code will be get("last_plot.phylo",envir=.PlotPhyloEnv) which gives us an object containing all the node coordinates of our most recent plotted tree. With this in hand it’s a very straightforward thing to add points of different colors to the nodes – depending on the bootstrap support of each clade in the reconstructed phylogeny.

add_support_labels<-function(node=NULL,support,
  cols=c("white","black"),cex=1.5){
  scale<-c(min(support,na.rm=TRUE),1)
  support<-(support-scale[1])/diff(scale)
  pp<-get("last_plot.phylo",envir=.PlotPhyloEnv)
  if(is.null(node)) node<-1:pp$Nnode+pp$Ntip
  colfunc<-colorRamp(cols)
  node_cols<-colfunc(support)/255
  x<-pp$xx[node] 
  y<-pp$yy[node]
  for(i in 1:nrow(node_cols)){
    if(!is.na(support[i]))
      points(x[i],y[i],pch=21,,cex=cex,
        bg=rgb(node_cols[i,1],node_cols[i,2],node_cols[i,3]))
  }
  invisible(scale)
}

OK, so let’s try it! We have to be careful here, though. The way we wrote our function it expects numeric support values for nodes – but the way node labels are stored is as text. This is easy enough to fix using as.numeric as follows.

bs<-as.numeric(mammal_tree$node.label)
## Warning: NAs introduced by coercion
bs
##  [1]    NA 0.901 1.000 1.000 0.562 1.000 0.973 0.344 0.773 1.000 1.000 1.000 0.671 1.000
## [15] 0.739 0.989 0.987 0.999 0.994 0.999 0.627 1.000 0.997 0.758 0.704 1.000 0.976 1.000
## [29] 1.000 0.998 1.000 1.000 1.000 1.000 0.948 0.991 0.762 1.000 1.000 1.000 0.861 0.903
## [43] 1.000 1.000 1.000    NA

Finally, our plot!

plotTree(ladderize(mammal_tree),fsize=0.7)
minmax<-add_support_labels(support=bs)
add.color.bar(leg=0.3*max(nodeHeights(mammal_tree)),
  cols=colorRampPalette(c("white","black"))(100),
  lims=minmax*100,title="bootstrap %",
  subtitle="",prompt=FALSE,x=0.01*par()$usr[2],
  y=0.02*par()$usr[4])

plot of chunk unnamed-chunk-10

Now, in the actual user question, the list subscriber wanted to color node labels discretely – such that support values > 0.95 were colored black, those < 0.95 and > 0.75 a different color (say), and so on.

Without modification, we can do that using our function of above – just by updating the input. Let’s see.

bs[bs<0.75]<-0
bs[intersect(which(bs>=0.75),which(bs<0.95))]<-0.5
bs[bs>=0.95]<-1
plotTree(ladderize(mammal_tree),fsize=0.7)
add_support_labels(support=bs)
legend("bottomleft",c("> 0.95","< 0.95 & > 0.75","< 0.75"),
  pch=21,pt.bg=c("black",rgb(0.5,0.5,0.5),"white"),
  bty="n",pt.cex=1.5,cex=0.8)

plot of chunk unnamed-chunk-11

OK. That’s it for now!