Tuesday, March 26, 2024

"Hacky" trick to show discrete (and continuous!) characters at the tips of a plotted arc or fan style tree using phytools

This weekend I co-taught an NSF-funded workshop here at UMass-Boston with James Boyko and Luke Harmon, the latter of whom joined remotely, but still managed to bring a lot to the course.

A workshop participant asked me about visualizing one or more discrete characters at the tips of an “arc” or “fan” style tree in R.

I feel like I’ve provided other solutions to this problem in the past – but a new “hack” solution occurred to me. Even if there are less hacky options out there (including in phytools), I thought I’d put it here for posterity – just in case it might ever come in handy!

I’m just going to run the demo first and then I’ll explain it. The tree & data for this particular example come from Betancur et al. (2017) and Benun Sutton & Wilson (2019), respectively

library(phytools)
data("bonyfish.tree")
bonyfish.tree
## 
## Phylogenetic tree with 90 tips and 89 internal nodes.
## 
## Tip labels:
##   Xenomystus_nigri, Chirocentrus_dorab, Talismania_bifurcata, Alepocephalus_tenebrosus, Misgurnus_bipartitus, Opsariichthys_bidens, ...
## 
## Rooted; includes branch lengths.
data("bonyfish.data")
head(bonyfish.data)
##                          spawning_mode paternal_care
## Xenomystus_nigri                  pair          male
## Chirocentrus_dorab               group          none
## Talismania_bifurcata             group          none
## Alepocephalus_tenebrosus         group          none
## Misgurnus_bipartitus              pair          none
## Opsariichthys_bidens              pair          none

We should see that our dataset consists of two different discrete characters. They happen to be stored as factors – but even if they weren’t, this code would work with only a few small modifications.

h<-max(nodeHeights(bonyfish.tree))
par(lend=3)
spawning<-bonyfish.tree
spawning$edge.length[
  which(spawning$edge[,2]<=Ntip(spawning))]<-
  spawning$edge.length[
    which(spawning$edge[,2]<=Ntip(spawning))]+0.16*h
levs1<-levels(bonyfish.data$spawning_mode)
spawning<-paintSubTree(spawning,Ntip(spawning)+1,"0")
for(i in 1:nrow(bonyfish.data)){
  tip<-which(bonyfish.tree$tip.label==
      rownames(bonyfish.data)[i])
  spawning<-paintSubTree(spawning,node=tip,
    state=levs1[bonyfish.data[i,1]],
    stem=(0.05*h)/spawning$edge.length[
      which(spawning$edge[,2]==tip)])
}
care<-bonyfish.tree
care$edge.length[which(care$edge[,2]<=Ntip(care))]<-
  care$edge.length[which(care$edge[,2]<=
      Ntip(care))]+0.08*h
levs2<-levels(bonyfish.data$paternal_care)
care<-paintSubTree(care,Ntip(care)+1,"0")
for(i in 1:nrow(bonyfish.data)){
  tip<-which(bonyfish.tree$tip.label==
      rownames(bonyfish.data)[i])
  care<-paintSubTree(care,node=tip,
    state=levs2[bonyfish.data[i,2]],
    stem=(0.05*h)/care$edge.length[
      which(care$edge[,2]==tip)])
}
cols1<-setNames(c("transparent",palette()[2:3]),
  c("0",levs1))
plot(spawning,cols1,type="arc",arc_height=0.5,
  ftype="i",lwd=8,fsize=0.6,offset=4)
pp<-get("last_plot.phylo",envir=.PlotPhyloEnv)
cols2<-setNames(c("transparent",palette()[4:5]),
  c("0",levs2))
plot(care,cols2,type="arc",arc_height=0.5,
  ftype="off",lwd=8,xlim=pp$x.lim,ylim=pp$y.lim,
  add=TRUE)
plotTree(bonyfish.tree,type="arc",ftype="off",
  arc_height=0.5*1.05,add=TRUE,xlim=pp$x.lim,
  ylim=pp$y.lim,add=TRUE,lwd=1)
legend("topleft",levs1,lwd=8,col=cols1[2:3],
  title="spawning mode",bty="n")
legend("topright",levs2,lwd=8,col=cols2[2:3],
  title="parental care",bty="n")

plot of chunk unnamed-chunk-3

OK, I said I’d explain this. Let me try.

From the graphic it may look like I plotted the tree once – but I actually plotted it three times (thrice?): once for each character level (spawning mode & parental care), and once for the phylogeny itself. Before I did, however, I first extended each of the former trees terminal edges and then re-colored them by the tip state. I set the base color elsewhere to be transparent. Finally I graphed first the spawning mode tree (my outermost character in the plot), then the parental care tree, then the “tree” tree.

Why go to all this trouble? Well, is it really extra trouble compared to figuring out the geometry of lines that we want to emerge from the tips? Maybe not.

Let’s try another example – this time based on data from Esquerre et al. (2019). In this case, I’m going to visualize one discrete trait (parity mode) and one continuous trait (environmental temperature), the latter using a color gradient.

data("liolaemid.tree")
data("liolaemid.data")
head(liolaemid.data)
##                         parity_mode max_altitude temperature
## Ctenoblepharys_adspersa           O          750       23.05
## Liolaemus_abaucan                 O         2600       20.20
## Liolaemus_albiceps                V         4020       12.38
## Liolaemus_andinus                 V         4900       11.40
## Liolaemus_annectens               V         4688        5.10
## Liolaemus_anomalus                O         1400       23.78
liolaemid.data<-liolaemid.data[liolaemid.tree$tip.label,]
h<-max(nodeHeights(liolaemid.tree))
par(lend=3)
parity<-liolaemid.tree
parity$edge.length[which(parity$edge[,2]<=
    Ntip(parity))]<-parity$edge.length[
      which(parity$edge[,2]<=Ntip(parity))]+0.16*h
levs1<-levels(liolaemid.data$parity_mode)
parity<-paintSubTree(parity,Ntip(parity)+1,"0")
for(i in 1:nrow(liolaemid.data)){
  tip<-which(parity$tip.label==
      rownames(liolaemid.data)[i])
  parity<-paintSubTree(parity,node=tip,
    state=levs1[liolaemid.data[i,1]],
    stem=(0.05*h)/parity$edge.length[
      which(parity$edge[,2]==tip)])
}
cols1<-setNames(c("transparent","#F0EAD6","#DF536B"),
  c("0",levs1))
plot(parity,cols1,type="arc",arc_height=1.0,
  ftype="i",lwd=9,fsize=0.5,offset=4,part=1)
pp<-get("last_plot.phylo",envir=.PlotPhyloEnv)
temp_range<-range(liolaemid.data$temperature)
levs2<-setNames(seq(temp_range[1],temp_range[2],
  length.out=100),1:100)
cols2<-setNames(c("transparent",
  colorRampPalette(c("blue","white","red"))(100)),
  0:100)
temperature<-liolaemid.tree
temperature$edge.length[which(temperature$edge[,2]<=
    Ntip(temperature))]<-
  temperature$edge.length[which(temperature$edge[,2]<=
      Ntip(temperature))]+0.08*h
temperature<-paintSubTree(temperature,
  Ntip(temperature)+1,"0")
for(i in 1:nrow(liolaemid.data)){
  tip<-which(temperature$tip.label==
      rownames(liolaemid.data)[i])
  temperature<-paintSubTree(temperature,node=tip,
    state=which.min((liolaemid.data[i,"temperature"]-
        levs2)^2),
    stem=(0.05*h)/temperature$edge.length[
      which(temperature$edge[,2]==tip)])
}
plot(temperature,cols2,type="arc",arc_height=1.0,
  ftype="off",lwd=9,xlim=pp$x.lim,ylim=pp$y.lim,
  add=TRUE,part=1)
plotTree(liolaemid.tree,type="arc",ftype="off",
  arc_height=1.0*1.05,add=TRUE,xlim=pp$x.lim,
  ylim=pp$y.lim,add=TRUE,lwd=1,part=1)
legend(x=0,y=0.5*h,levs1,lwd=8,col=cols1[2:3],
  title="parity mode",bty="n",xjust=0.5,yjust=0.5)
add.color.bar(2*h,cols2,title="temperature (deg. C)",
  lims=temp_range,digits=2,prompt=FALSE,x=-h,y=-0.25*h,
  subtitle="")

plot of chunk unnamed-chunk-5

OK, this is extra cool & especially because it so clearly shows a relationship between viviparity and cooler environmental temperatures – just as we might predict based on natural history.

1 comment:

  1. Hi
    First, thank you for this. I have been looking for a way to display some discrete information in on a phylogenetic tree I am working. I have followed the code, changed some aspects but I keep getting the error below:
    Error in if (stem == 0 && node <= length(tree$tip)) stop("stem must be TRUE for node<=N") :
    missing value where TRUE/FALSE needed.

    What could be the problem or what am I not doing right?

    ReplyDelete

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.