Saturday, September 18, 2021

A hacky method to plot the states of one or more discrete character traits at the tips of a fan-style tree

Working on the practice problems for my upcoming R phylogenetics book with Luke Harmon, I just wrote a simple hack to easily show the states of one or more discrete characters at the tips of a fan-style phylogenetic tree in R. This can be done without any additional R packages other than ape or phytools. Here's how it works.

For the following example, I'm going to use a dataset that I subsampled from this cool study by Till Ramm & colleagues.

First, I'll load packages & then read my dataset & tree from file.

library(phytools)
library(geiger)
lizard.tree<-read.nexus("lizard_tree.nex")
lizard.data<-read.csv("lizard_spines.csv",row.names=1,
    stringsAsFactors=TRUE)
head(lizard.data)
##                                   habitat tail.spines
## Ablepharus_budaki          non-saxicolous   non-spiny
## Abronia_anzuetoi           non-saxicolous   non-spiny
## Abronia_frosti             non-saxicolous   non-spiny
## Acanthodactylus_aureus     non-saxicolous   non-spiny
## Acanthodactylus_opheodurus non-saxicolous   non-spiny
## Acanthodactylus_pardalis   non-saxicolous   non-spiny
chk<-name.check(lizard.tree,lizard.data)
summary(chk)
## 3504 taxa are present in the tree but not the data:
##     Ablepharus_chernovi,
##     Ablepharus_kitaibelii,
##     Ablepharus_pannonicus,
##     Abronia_aurita,
##     Abronia_campbelli,
##     Abronia_chiszari,
##     ....
## 
## To see complete list of mis-matched taxa, print object.
lizard.tree<-drop.tip(lizard.tree,chk$tree_not_data)

Next, I'll extract our habitat and tail spine character as vectors:

habitat<-setNames(lizard.data$habitat,
    rownames(lizard.data))
tail.spines<-setNames(lizard.data$tail.spines,
    rownames(lizard.data))

Finally, I'll create the visualization. Here what I'm going to do is just plot lines that run from the tip of the tree out. If I make them thick enough, there should be no visible space between each line and the graph will (hopefully) nicely illustrate the distribution of our traits across the tips of the phylogeny.

plotTree(lizard.tree,type="fan",lwd=1,ftype="off")
obj<-get("last_plot.phylo",envir=.PlotPhyloEnv)
n<-Ntip(lizard.tree)
col.hab<-setNames(c("darkgreen","grey"),levels(habitat))
col.morph<-setNames(palette()[c(4,2)],levels(tail.spines))
par(lend=3)
for(i in 1:Ntip(lizard.tree)){
    cc<-if(obj$xx[i]>0) 5 else -5
    th<-atan(obj$yy[i]/obj$xx[i])
    segments(obj$xx[i],obj$yy[i],
        obj$xx[i]+cc*cos(th),
        obj$yy[i]+cc*sin(th),
        lwd=4,
        col=col.hab[habitat[lizard.tree$tip.label[i]]])
    segments(obj$xx[i]+cc*cos(th),
        obj$yy[i]+cc*sin(th),
        obj$xx[i]+2*cc*cos(th),
        obj$yy[i]+2*cc*sin(th),lwd=4,
        col=col.morph[tail.spines[lizard.tree$tip.label[i]]])
}
legend("topleft",c(levels(habitat),levels(tail.spines)),
    pch=15,col=c(col.hab,col.morph),
    pt.cex=1.5,cex=0.8,bty="n")

plot of chunk unnamed-chunk-3

All I needed to get the orientation of each line was a little trig: I got the angle by computing the arctangent of the y over the x position of the tip node, then I calculated the vertical and horizontal displacements of my segment by calculating the sine and cosine of that angle, respectively.

Just to see how it scales, why don't we take our tree (and an estimate of our matrix for one of our traits – let's say, habitat) and then simulate 5 new traits?

Q<-as.Qmatrix(fitMk(lizard.tree,habitat,model="ER"))
Q
## Estimated Q matrix:
##                non-saxicolous   saxicolous
## non-saxicolous   -0.004410229  0.004410229
## saxicolous        0.004410229 -0.004410229
Q<-unclass(Q)
rownames(Q)<-colnames(Q)<-0:1
X<-sim.Mk(lizard.tree,Q,nsim=5)
head(X)
##                 V1 V2 V3 V4 V5
## Dibamus_greeri   1  0  1  1  1
## Orraya_occultus  0  1  1  1  1
## Nephrurus_amyae  1  0  1  1  1
## Nephrurus_levis  1  0  1  1  1
## Lialis_jicari    1  1  1  0  1
## Aprasia_repens   1  1  1  0  1

Now, let's create our new graph. Here, I'll pretend that 0 vs. 1 is a presence/absence feature & represent the absence with a color that matches our background. Here I'll set lwd=2 which leaves tiny spaces between each symbol – if anything, I like that a little bit better.

par(fg="white",bg="black")
plotTree(lizard.tree,type="fan",lwd=1,ftype="off",
    color="white",xlim=c(-190,190))
obj<-get("last_plot.phylo",envir=.PlotPhyloEnv)
n<-Ntip(lizard.tree)
cols<-setNames(c("black",palette()[4]),0:1)
par(lend=3)
for(i in 1:Ntip(lizard.tree)){
    cc<-if(obj$xx[i]>0) 5 else -5
    th<-atan(obj$yy[i]/obj$xx[i])
    xx<-obj$xx[i]
    yy<-obj$yy[i]
    for(j in 1:ncol(X)){
        segments(xx,yy,
            xx<-xx+cc*cos(th),
            yy<-yy+cc*sin(th),lwd=2,
            col=cols[X[lizard.tree$tip.label[i],j]])
    }
}

plot of chunk unnamed-chunk-5