Monday, July 25, 2022

Graphing the results of stochastic mapping with >500 taxa

Earlier today, I got the following question from a phytools user:

“I have been using phytools to create stochastic character maps characterising […]. I am having a little trouble making the resulting plots readable as I have nearly 500 tips on my tree. I was wondering if you could advise me on the best route to making a legible tree. I was hoping to make the tree into a fan and remove the tip labels but have had no success so far.”

Although it is true that meaningful visualization becomes more and more difficult for larger trees, I have found that stochastic maps are relatively straightforward to plot for up to 500 tips or more, particularly if we're not concerned with showing the tip labels.

Note that we can include information about the states at the tips of the tree (for example, as shown here), or clade labels (using phytools::arc.cladelabels, for example, as shown here).

In the following code chunks, I'll demonstrate stochastic mapping on a phylogeny containing >600 taxa using a phylogeny & dataset from Ramm et al. (2019), that also features in an exercise of my upcoming book with Luke Harmon.

## first load packages
library(phytools)
library(geiger)
## now read data from file
lizard.tree<-read.nexus(file="http://www.phytools.org/Rbook/7/lizard_tree.nex")
lizard.data<-read.csv(file="http://www.phytools.org/Rbook/7/lizard_spines.csv",
    row.names=1,stringsAsFactors=TRUE)
## plot full tree
plotTree(lizard.tree,type="fan",lwd=1,ftype="off")

plot of chunk unnamed-chunk-1

## run name.check and prune tree to match input data.
chk<-name.check(lizard.tree,lizard.data)
summary(chk)
## 3504 taxa are present in the tree but not the data:
##     Ablepharus_chernovi,
##     Ablepharus_kitaibelii,
##     Ablepharus_pannonicus,
##     Abronia_aurita,
##     Abronia_campbelli,
##     Abronia_chiszari,
##     ....
## 
## To see complete list of mis-matched taxa, print object.
lizard.tree<-drop.tip(lizard.tree,chk$tree_not_data)
## pull out the morphological trait "tail spines"
tail.spines<-setNames(lizard.data$tail.spines,
    rownames(lizard.data))

As shown in a prior post, it's pretty straightforward (and kind of fun) to visualize a discrete trait like this one at the tips of a fan-style tree. Let's do it.

plotTree(lizard.tree,type="fan",lwd=1,ftype="off")
obj<-get("last_plot.phylo",envir=.PlotPhyloEnv)
n<-Ntip(lizard.tree)
col.morph<-setNames(palette()[c(4,2)],levels(tail.spines))
par(lend=3)
for(i in 1:Ntip(lizard.tree)){
    cc<-if(obj$xx[i]>0) 5 else -5
    th<-atan(obj$yy[i]/obj$xx[i])
    segments(obj$xx[i],obj$yy[i],
        obj$xx[i]+cc*cos(th),
        obj$yy[i]+cc*sin(th),lwd=4,
        col=col.morph[tail.spines[lizard.tree$tip.label[i]]])
}
legend("topleft",legend=levels(tail.spines),
    pch=15,col=col.morph,
    pt.cex=1.5,cex=0.8,bty="n")

plot of chunk unnamed-chunk-2

Alright. Now we should be ready to do stochastic character mapping. To do this, I'll use the popular phytools function make.simmap. Note that for very large trees, and (especially) lots of stochastic maps, make.simmap can be slow. I've previously posted about how to parallelize stochastic mapping using the R package snow. For now, we're not going to worry about that, but we'll only generate 100 stochastic maps.

spine.trees<-make.simmap(lizard.tree,tail.spines,model="ARD",
    pi="fitzjohn",nsim=100)
## make.simmap is sampling character histories conditioned on
## the transition matrix
## 
## Q =
##              non-spiny        spiny
## non-spiny -0.001235890  0.001235890
## spiny      0.003977541 -0.003977541
## (estimated using likelihood);
## and (mean) root node prior probabilities
## pi =
##  non-spiny      spiny 
## 0.98834922 0.01165078
## Done.
spine.trees
## 100 phylogenetic trees with mapped discrete characters

Since the email was about visualizing stochastic mapping (not, for example, a summary of stochastic character histories, I'll first just demonstrate this, including showing the observed states at the tips of the trees as in the previous plot.

plot(spine.trees[[1]],col.morph,type="fan",ftype="off",
    lwd=1)
obj<-get("last_plot.phylo",envir=.PlotPhyloEnv)
n<-Ntip(lizard.tree)
col.morph<-setNames(palette()[c(4,2)],levels(tail.spines))
par(lend=3)
for(i in 1:Ntip(lizard.tree)){
    cc<-if(obj$xx[i]>0) 5 else -5
    th<-atan(obj$yy[i]/obj$xx[i])
    segments(obj$xx[i],obj$yy[i],
        obj$xx[i]+cc*cos(th),
        obj$yy[i]+cc*sin(th),lwd=4,
        col=col.morph[tail.spines[lizard.tree$tip.label[i]]])
}
legend("topleft",legend=levels(tail.spines),
    pch=15,col=col.morph,
    pt.cex=1.5,cex=0.8,bty="n")

plot of chunk unnamed-chunk-4

In addition to just visualizing a single tree, we can also show the posterior probability distribution across our full set of stochastic maps – and this also scales perfectly well to 600 taxa or more. To do that I'll use densityMap and then I'll adjust the color scheme to match our previous plot.

spine.dMap<-densityMap(spine.trees,plot=FALSE)
## sorry - this might take a while; please be patient
spine.dMap<-setMap(spine.dMap,col.morph)
plot(spine.dMap,type="fan",ftype="off",lwd=2,legend=FALSE,
    ylim=max(nodeHeights(lizard.tree))*c(-1,1.1))
add.color.bar(max(nodeHeights(lizard.tree)),spine.dMap$cols,
    title="PP(state = \"spiny\")",
    prompt=FALSE,lwd=6,outline=FALSE,x=0.95*par()$usr[1],
    y=0.9*par()$usr[4],subtitle="",fsize=0.8)

plot of chunk unnamed-chunk-5

Lastly, I'm going to graph our full tree, in a densityMap style, but then also add all of posterior probabilities at nodes, sizing the point size depending on the uncertainty at the node (such that more uncertain nodes with larger radius pies), as demonstrated in a previous post here. In this case, I'll plot any node with a maximum posterior probability ≤ 0.95 3 × larger than any node with a maximum posterior probability > 0.95.

One cool thing I'm going to do is sort the nodes from most to least certain as I plot them. That why the most ambiguous pie charts will not be obscured by less ambiguous ones on top!

pp<-summary(spine.trees)$ace[1:lizard.tree$Nnode,]
plot(spine.dMap,type="fan",ftype="off",lwd=1,legend=FALSE,
    ylim=max(nodeHeights(lizard.tree))*c(-1,1.1))
obj<-get("last_plot.phylo",envir=.PlotPhyloEnv)
n<-Ntip(lizard.tree)
col.morph<-setNames(palette()[c(4,2)],levels(tail.spines))
par(lend=3)
for(i in 1:Ntip(lizard.tree)){
    cc<-if(obj$xx[i]>0) 5 else -5
    th<-atan(obj$yy[i]/obj$xx[i])
    segments(obj$xx[i],obj$yy[i],
        obj$xx[i]+cc*cos(th),
        obj$yy[i]+cc*sin(th),lwd=4,
        col=col.morph[tail.spines[lizard.tree$tip.label[i]]])
}
add.color.bar(max(nodeHeights(lizard.tree)),spine.dMap$cols,
    title="PP(state = \"spiny\")",
    prompt=FALSE,lwd=6,outline=FALSE,x=0.95*par()$usr[1],
    y=0.9*par()$usr[4],subtitle="",fsize=0.8)
h<-max(nodeHeights(lizard.tree))
ii<-order(rowSums(pp^2),decreasing=TRUE)
for(i in ii){
    plotrix::floating.pie(obj$xx[i+Ntip(lizard.tree)],
        obj$yy[i+Ntip(lizard.tree)],
        radius=if(max(pp[i,])<=0.95) 0.03*h else 0.01*h,
        x=pp[i,],col=col.morph,border="transparent")
}
legend(x=0.98*par()$usr[1],y=0.9*par()$usr[4],
    levels(tail.spines),pch=16,col=col.morph,
    pt.cex=2,bty="n",cex=0.8)

plot of chunk unnamed-chunk-6

That's not too bad!

3 comments:

  1. Hi Liam,

    I am having a problem to run the densityMap() funciton.

    I have a MultiSimmap file with 100 elements, and when I try to run the following error message returns:

    Error in mapped.edge[i, names(maps[[i]])[j]] : subscript out of bounds

    Do you know how can I solve this?

    Thanks in advance!

    ReplyDelete
    Replies
    1. Hi Tales and Liam,

      I have the exact same issue! and my MultiSimmap file has 1000 elements. I would truly appreciate your help with this since I desperately need that result for my thesis.

      Thanks in advance!

      Cheers

      Delete
    2. The only suggestion I have is to adjust the optional argument 'res.' You can increase it, say, to 'res = 200' and see if it makes a different. Otherwise, you can try emailing me a saved workspace & I'll try to look into it for you

      Delete

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.