## Friday, August 11, 2023

### Coloring nodes by bootstrap or Bayesian posterior support values

An R-sig-phylo subscriber recently asked if it was straightforward to add colored points to a plotted tree based on Bayesian posterior probabilities or bootstrap values.

The answer, of course, is yes. One can do this using the ape function `nodelabels`; however, as Luke Harmon & I show in Chapter 13 of our recently published book, it’s pretty straightforward to script it for ourselves.

Let’s try.

For this example, I’m going to begin by inferring a tree using a dataset (`"Laurasiatherian"`), that comes packaged with the phangorn package.

``````library(phytools)
library(phangorn)
data("Laurasiatherian")
Laurasiatherian
``````
``````## 47 sequences with 3179 character and 1605 different site patterns.
## The states are a c g t
``````

Now as I do this using `phangorn::pml_bb`, approximate bootstrap values are stored as node labels in my object. (The argument, `control=pml.control(trace=0)` just turns off the print-out, which can be quite extensive otherwise! In your interactive R session I recommend leaving that off so that you can see the progress of your optimization in real time.)

``````mammal.pml<-pml_bb(Laurasiatherian,model="GTR+G(4)",
control=pml.control(trace=0))
mammal.pml
``````
``````## model: GTR+G(4)
## loglikelihood: -44699.66
## unconstrained loglikelihood: -17300.92
## Model of rate heterogeneity: Discrete gamma model
## Number of rate categories: 4
## Shape parameter: 0.3515
##         Rate Proportion
## 1 0.01038367       0.25
## 2 0.14472115       0.25
## 3 0.67297557       0.25
## 4 3.17191961       0.25
##
## Rate matrix:
##           a          c          g         t
## a  0.000000  3.5989724 13.7579458  3.801364
## c  3.598972  0.0000000  0.4701416 25.006655
## g 13.757946  0.4701416  0.0000000  1.000000
## t  3.801364 25.0066550  1.0000000  0.000000
##
## Base frequencies:
##         a         c         g         t
## 0.3321866 0.1990791 0.2040652 0.2646691
``````

Next, I’m going to pull the phylogeny (alone) off my optimized `"pml"` object and re-root it, using `"Platypus"` as my outgroup as follows. Since my node labels contain bootstrap support values, it’s very important that I indicate `edgelabel=TRUE` in my re-rooting call.

``````mammal_tree<-root(mammal.pml\$tree,"Platypus",resolve.root=TRUE,
edgelabel=TRUE)
mammal_tree
``````
``````##
## Phylogenetic tree with 47 tips and 46 internal nodes.
##
## Tip labels:
##   Platypus, Wallaroo, Possum, Bandicoot, Opposum, Armadillo, ...
## Node labels:
##   Root, 0.901, 1, 1, 0.562, 1, ...
##
## Rooted; includes branch lengths.
``````

Now for our support node value custom function. Here, the key piece of code will be `get("last_plot.phylo",envir=.PlotPhyloEnv)` which gives us an object containing all the node coordinates of our most recent plotted tree. With this in hand it’s a very straightforward thing to add points of different colors to the nodes – depending on the bootstrap support of each clade in the reconstructed phylogeny.

``````add_support_labels<-function(node=NULL,support,
cols=c("white","black"),cex=1.5){
scale<-c(min(support,na.rm=TRUE),1)
support<-(support-scale[1])/diff(scale)
pp<-get("last_plot.phylo",envir=.PlotPhyloEnv)
if(is.null(node)) node<-1:pp\$Nnode+pp\$Ntip
colfunc<-colorRamp(cols)
node_cols<-colfunc(support)/255
x<-pp\$xx[node]
y<-pp\$yy[node]
for(i in 1:nrow(node_cols)){
if(!is.na(support[i]))
points(x[i],y[i],pch=21,,cex=cex,
bg=rgb(node_cols[i,1],node_cols[i,2],node_cols[i,3]))
}
invisible(scale)
}
``````

OK, so let’s try it! We have to be careful here, though. The way we wrote our function it expects numeric support values for nodes – but the way node labels are stored is as text. This is easy enough to fix using `as.numeric` as follows.

``````bs<-as.numeric(mammal_tree\$node.label)
``````
``````## Warning: NAs introduced by coercion
``````
``````bs
``````
``````##  [1]    NA 0.901 1.000 1.000 0.562 1.000 0.973 0.344 0.773 1.000 1.000 1.000 0.671 1.000
## [15] 0.739 0.989 0.987 0.999 0.994 0.999 0.627 1.000 0.997 0.758 0.704 1.000 0.976 1.000
## [29] 1.000 0.998 1.000 1.000 1.000 1.000 0.948 0.991 0.762 1.000 1.000 1.000 0.861 0.903
## [43] 1.000 1.000 1.000    NA
``````

Finally, our plot!

``````plotTree(ladderize(mammal_tree),fsize=0.7)
cols=colorRampPalette(c("white","black"))(100),
lims=minmax*100,title="bootstrap %",
subtitle="",prompt=FALSE,x=0.01*par()\$usr[2],
y=0.02*par()\$usr[4])
``````

Now, in the actual user question, the list subscriber wanted to color node labels discretely – such that support values > 0.95 were colored black, those < 0.95 and > 0.75 a different color (say), and so on.

Without modification, we can do that using our function of above – just by updating the input. Let’s see.

``````bs[bs<0.75]<-0
bs[intersect(which(bs>=0.75),which(bs<0.95))]<-0.5
bs[bs>=0.95]<-1