Tuesday, July 26, 2022

Running make.simmap in parallel on large trees using the snow package

A friend & colleague asked on Twitter about how stochastic character mapping using phytools::make.simmap scales with larger trees.

Nearly five years ago, I posted about parallelizing make.simmap using the R package snow. I thought it might be a good moment to revisit that post.

For this example, I'll use the same phylogeny & dataset from Ramm et al. (2019) that I used on my blog of yesterday & that also features in an exercise of my upcoming book with Luke Harmon.

Let's load our packages.

library(phytools)
library(geiger)
library(snow)

In this example, I'm going to use the latest phytools version which contains one small fix to allow an object of class "Qmatrix" to be passed as a fixed value of Q to make.simmap. (Otherwise we need to pass a simple matrix, which can be obtained from an object of class "Qmatrix" using object<-unclass(object).)

packageVersion("phytools")
## [1] '1.1.3'

Next, we can read our tree and data from file. This time, let's analyze the trait "habitat" instead of "tail.spines".

## now read data from file
lizard.tree<-read.nexus(file="http://www.phytools.org/Rbook/7/lizard_tree.nex")
lizard.data<-read.csv(file="http://www.phytools.org/Rbook/7/lizard_spines.csv",
    row.names=1,stringsAsFactors=TRUE)
## run name.check and prune tree to match input data.
chk<-name.check(lizard.tree,lizard.data)
summary(chk)
## 3504 taxa are present in the tree but not the data:
##     Ablepharus_chernovi,
##     Ablepharus_kitaibelii,
##     Ablepharus_pannonicus,
##     Abronia_aurita,
##     Abronia_campbelli,
##     Abronia_chiszari,
##     ....
## 
## To see complete list of mis-matched taxa, print object.
lizard.tree<-drop.tip(lizard.tree,chk$tree_not_data)
## pull out the environmental trait "habitat"
habitat<-setNames(lizard.data$habitat,
    rownames(lizard.data))

For fun, let's plot our data in a fan-style, just as we did yesterday.

plotTree(lizard.tree,type="fan",lwd=1,ftype="off")
obj<-get("last_plot.phylo",envir=.PlotPhyloEnv)
n<-Ntip(lizard.tree)
col.hab<-setNames(c("darkgreen","grey"),levels(habitat))
par(lend=3)
for(i in 1:Ntip(lizard.tree)){
    cc<-if(obj$xx[i]>0) 5 else -5
    th<-atan(obj$yy[i]/obj$xx[i])
    segments(obj$xx[i],obj$yy[i],
        obj$xx[i]+cc*cos(th),
        obj$yy[i]+cc*sin(th),lwd=4,
        col=col.hab[habitat[lizard.tree$tip.label[i]]])
}
legend("topleft",legend=levels(habitat),
    pch=15,col=col.hab,
    pt.cex=1.5,cex=0.8,bty="n")