Monday, July 25, 2022

Graphing the results of stochastic mapping with >500 taxa

Earlier today, I got the following question from a phytools user:

“I have been using phytools to create stochastic character maps characterising […]. I am having a little trouble making the resulting plots readable as I have nearly 500 tips on my tree. I was wondering if you could advise me on the best route to making a legible tree. I was hoping to make the tree into a fan and remove the tip labels but have had no success so far.”

Although it is true that meaningful visualization becomes more and more difficult for larger trees, I have found that stochastic maps are relatively straightforward to plot for up to 500 tips or more, particularly if we're not concerned with showing the tip labels.

Note that we can include information about the states at the tips of the tree (for example, as shown here), or clade labels (using phytools::arc.cladelabels, for example, as shown here).

In the following code chunks, I'll demonstrate stochastic mapping on a phylogeny containing >600 taxa using a phylogeny & dataset from Ramm et al. (2019), that also features in an exercise of my upcoming book with Luke Harmon.

## first load packages
library(phytools)
library(geiger)
## now read data from file
lizard.tree<-read.nexus(file="http://www.phytools.org/Rbook/7/lizard_tree.nex")
lizard.data<-read.csv(file="http://www.phytools.org/Rbook/7/lizard_spines.csv",
    row.names=1,stringsAsFactors=TRUE)
## plot full tree
plotTree(lizard.tree,type="fan",lwd=1,ftype="off")

plot of chunk unnamed-chunk-1

## run name.check and prune tree to match input data.
chk<-name.check(lizard.tree,lizard.data)
summary(chk)
## 3504 taxa are present in the tree but not the data:
##     Ablepharus_chernovi,
##     Ablepharus_kitaibelii,
##     Ablepharus_pannonicus,
##     Abronia_aurita,
##     Abronia_campbelli,
##     Abronia_chiszari,
##     ....
## 
## To see complete list of mis-matched taxa, print object.
lizard.tree<-drop.tip(lizard.tree,chk$tree_not_data)
## pull out the morphological trait "tail spines"
tail.spines<-setNames(lizard.data$tail.spines,
    rownames(lizard.data))

As shown in a prior post, it's pretty straightforward (and kind of fun) to visualize a discrete trait like this one at the tips of a fan-style tree. Let's do it.

plotTree(lizard.tree,type="fan",lwd=1,ftype="off")
obj<-get("last_plot.phylo",envir=.PlotPhyloEnv)
n<-Ntip(lizard.tree)
col.morph<-setNames(palette()[c(4,2)],levels(tail.spines))
par(lend=3)
for(i in 1:Ntip(lizard.tree)){
    cc<-if(obj$xx[i]>0) 5 else -5
    th<-atan(obj$yy[i]/obj$xx[i])
    segments(obj$xx[i],obj$yy[i],
        obj$xx[i]+cc*cos(th),
        obj$yy[i]+cc*sin(th),lwd=4,
        col=col.morph[tail.spines[lizard.tree$tip.label[i]]])
}
legend("topleft",legend=levels(tail.spines),
    pch=15,col=col.morph,
    pt.cex=1.5,cex=0.8,bty="n")

plot of chunk unnamed-chunk-2

Alright. Now we should be ready to do stochastic character mapping. To do this, I'll use the popular phytools function make.simmap. Note that for very large trees, and (especially) lots of stochastic maps, make.simmap can be slow. I've previously posted about how to parallelize stochastic mapping using the R package snow. For now, we're not going to worry about that, but we'll only generate 100 stochastic maps.

spine.trees<-make.simmap(lizard.tree,tail.spines,model="ARD",
    pi="fitzjohn",nsim=100)
## make.simmap is sampling character histories conditioned on
## the transition matrix
## 
## Q =
##              non-spiny        spiny
## non-spiny -0.001235890  0.001235890
## spiny      0.003977541 -0.003977541
## (estimated using likelihood);
## and (mean) root node prior probabilities
## pi =
##  non-spiny      spiny 
## 0.98834922 0.01165078
## Done.
spine.trees
## 100 phylogenetic trees with mapped discrete characters

Since the email was about visualizing stochastic mapping (not, for example, a summary of stochastic character histories, I'll first just demonstrate this, including showing the observed states at the tips of the trees as in the previous plot.

plot(spine.trees[[1]],col.morph,type="fan",ftype="off",
    lwd=1)
obj<-get("last_plot.phylo",envir=.PlotPhyloEnv)
n<-Ntip(lizard.tree)
col.morph<-setNames(palette()[c(4,2)],levels(tail.spines))
par(lend=3)
for(i in 1:Ntip(lizard.tree)){
    cc<-if(obj$xx[i]>0) 5 else -5
    th<-atan(obj$yy[i]/obj$xx[i])
    segments(obj$xx[i],obj$yy[i],
        obj$xx[i]+cc*cos(th),
        obj$yy[i]+cc*sin(th),lwd=4,
        col=col.morph[tail.spines[lizard.tree$tip.label[i]]])
}
legend("topleft",legend=levels(tail.spines),
    pch=15,col=col.morph,
    pt.cex=1.5,cex=0.8,bty="n")