Phylogenetic Tools for Comparative Biology: Graphing the results of stochastic mapping with >500 taxa

Earlier today, I got the following question from a phytools user:

“I have been using phytools to create stochastic character maps characterising […]. I am having a little trouble making the resulting plots readable as I have nearly 500 tips on my tree. I was wondering if you could advise me on the best route to making a legible tree. I was hoping to make the tree into a fan and remove the tip labels but have had no success so far.”

Although it is true that meaningful visualization becomes more and more difficult for larger trees, I have found that stochastic maps are relatively straightforward to plot for up to 500 tips or more, particularly if we're not concerned with showing the tip labels.

Note that we can include information about the states at the tips of the tree (for example, as shown here), or clade labels (using phytools::arc.cladelabels, for example, as shown here).

In the following code chunks, I'll demonstrate stochastic mapping on a phylogeny containing >600 taxa using a phylogeny & dataset from Ramm et al. (2019), that also features in an exercise of my upcoming book with Luke Harmon.

## first load packages
library(phytools)
library(geiger)
## now read data from file
lizard.tree<-read.nexus(file="http://www.phytools.org/Rbook/7/lizard_tree.nex")
lizard.data<-read.csv(file="http://www.phytools.org/Rbook/7/lizard_spines.csv",
    row.names=1,stringsAsFactors=TRUE)
## plot full tree
plotTree(lizard.tree,type="fan",lwd=1,ftype="off")

plot of chunk unnamed-chunk-1

## run name.check and prune tree to match input data.
chk<-name.check(lizard.tree,lizard.data)
summary(chk)

## 3504 taxa are present in the tree but not the data:
##     Ablepharus_chernovi,
##     Ablepharus_kitaibelii,
##     Ablepharus_pannonicus,
##     Abronia_aurita,
##     Abronia_campbelli,
##     Abronia_chiszari,
##     ....
## 
## To see complete list of mis-matched taxa, print object.

lizard.tree<-drop.tip(lizard.tree,chk$tree_not_data)
## pull out the morphological trait "tail spines"
tail.spines<-setNames(lizard.data$tail.spines,
    rownames(lizard.data))

As shown in a prior post, it's pretty straightforward (and kind of fun) to visualize a discrete trait like this one at the tips of a fan-style tree. Let's do it.

plotTree(lizard.tree,type="fan",lwd=1,ftype="off")
obj<-get("last_plot.phylo",envir=.PlotPhyloEnv)
n<-Ntip(lizard.tree)
col.morph<-setNames(palette()[c(4,2)],levels(tail.spines))
par(lend=3)
for(i in 1:Ntip(lizard.tree)){
    cc<-if(obj$xx[i]>0) 5 else -5
    th<-atan(obj$yy[i]/obj$xx[i])
    segments(obj$xx[i],obj$yy[i],
        obj$xx[i]+cc*cos(th),
        obj$yy[i]+cc*sin(th),lwd=4,
        col=col.morph[tail.spines[lizard.tree$tip.label[i]]])
}
legend("topleft",legend=levels(tail.spines),
    pch=15,col=col.morph,
    pt.cex=1.5,cex=0.8,bty="n")

plot of chunk unnamed-chunk-2

Alright. Now we should be ready to do stochastic character mapping. To do this, I'll use the popular phytools function make.simmap. Note that for very large trees, and (especially) lots of stochastic maps, make.simmap can be slow. I've previously posted about how to parallelize stochastic mapping using the R package snow. For now, we're not going to worry about that, but we'll only generate 100 stochastic maps.

spine.trees<-make.simmap(lizard.tree,tail.spines,model="ARD",
    pi="fitzjohn",nsim=100)

## make.simmap is sampling character histories conditioned on
## the transition matrix
## 
## Q =
##              non-spiny        spiny
## non-spiny -0.001235890  0.001235890
## spiny      0.003977541 -0.003977541
## (estimated using likelihood);
## and (mean) root node prior probabilities
## pi =
##  non-spiny      spiny 
## 0.98834922 0.01165078

## Done.

spine.trees

## 100 phylogenetic trees with mapped discrete characters

Since the email was about visualizing stochastic mapping (not, for example, a summary of stochastic character histories, I'll first just demonstrate this, including showing the observed states at the tips of the trees as in the previous plot.

plot(spine.trees[[1]],col.morph,type="fan",ftype="off",
    lwd=1)
obj<-get("last_plot.phylo",envir=.PlotPhyloEnv)
n<-Ntip(lizard.tree)
col.morph<-setNames(palette()[c(4,2)],levels(tail.spines))
par(lend=3)
for(i in 1:Ntip(lizard.tree)){
    cc<-if(obj$xx[i]>0) 5 else -5
    th<-atan(obj$yy[i]/obj$xx[i])
    segments(obj$xx[i],obj$yy[i],
        obj$xx[i]+cc*cos(th),
        obj$yy[i]+cc*sin(th),lwd=4,
        col=col.morph[tail.spines[lizard.tree$tip.label[i]]])
}
legend("topleft",legend=levels(tail.spines),
    pch=15,col=col.morph,
    pt.cex=1.5,cex=0.8,bty="n")

plot of chunk unnamed-chunk-4

In addition to just visualizing a single tree, we can also show the posterior probability distribution across our full set of stochastic maps – and this also scales perfectly well to 600 taxa or more. To do that I'll use densityMap and then I'll adjust the color scheme to match our previous plot.

spine.dMap<-densityMap(spine.trees,plot=FALSE)

## sorry - this might take a while; please be patient

spine.dMap<-setMap(spine.dMap,col.morph)
plot(spine.dMap,type="fan",ftype="off",lwd=2,legend=FALSE,
    ylim=max(nodeHeights(lizard.tree))*c(-1,1.1))
add.color.bar(max(nodeHeights(lizard.tree)),spine.dMap$cols,
    title="PP(state = \"spiny\")",
    prompt=FALSE,lwd=6,outline=FALSE,x=0.95*par()$usr[1],
    y=0.9*par()$usr[4],subtitle="",fsize=0.8)

plot of chunk unnamed-chunk-5

Lastly, I'm going to graph our full tree, in a densityMap style, but then also add all of posterior probabilities at nodes, sizing the point size depending on the uncertainty at the node (such that more uncertain nodes with larger radius pies), as demonstrated in a previous post here. In this case, I'll plot any node with a maximum posterior probability ≤ 0.95 3 × larger than any node with a maximum posterior probability > 0.95.

One cool thing I'm going to do is sort the nodes from most to least certain as I plot them. That why the most ambiguous pie charts will not be obscured by less ambiguous ones on top!

pp<-summary(spine.trees)$ace[1:lizard.tree$Nnode,]
plot(spine.dMap,type="fan",ftype="off",lwd=1,legend=FALSE,
    ylim=max(nodeHeights(lizard.tree))*c(-1,1.1))
obj<-get("last_plot.phylo",envir=.PlotPhyloEnv)
n<-Ntip(lizard.tree)
col.morph<-setNames(palette()[c(4,2)],levels(tail.spines))
par(lend=3)
for(i in 1:Ntip(lizard.tree)){
    cc<-if(obj$xx[i]>0) 5 else -5
    th<-atan(obj$yy[i]/obj$xx[i])
    segments(obj$xx[i],obj$yy[i],
        obj$xx[i]+cc*cos(th),
        obj$yy[i]+cc*sin(th),lwd=4,
        col=col.morph[tail.spines[lizard.tree$tip.label[i]]])
}
add.color.bar(max(nodeHeights(lizard.tree)),spine.dMap$cols,
    title="PP(state = \"spiny\")",
    prompt=FALSE,lwd=6,outline=FALSE,x=0.95*par()$usr[1],
    y=0.9*par()$usr[4],subtitle="",fsize=0.8)
h<-max(nodeHeights(lizard.tree))
ii<-order(rowSums(pp^2),decreasing=TRUE)
for(i in ii){
    plotrix::floating.pie(obj$xx[i+Ntip(lizard.tree)],
        obj$yy[i+Ntip(lizard.tree)],
        radius=if(max(pp[i,])<=0.95) 0.03*h else 0.01*h,
        x=pp[i,],col=col.morph,border="transparent")
}
legend(x=0.98*par()$usr[1],y=0.9*par()$usr[4],
    levels(tail.spines),pch=16,col=col.morph,
    pt.cex=2,bty="n",cex=0.8)

plot of chunk unnamed-chunk-6

That's not too bad!

4 comments:

Tales MartinsMarch 15, 2023 at 1:55 PM
Hi Liam,

I am having a problem to run the densityMap() funciton.

I have a MultiSimmap file with 100 elements, and when I try to run the following error message returns:

Error in mapped.edge[i, names(maps[[i]])[j]] : subscript out of bounds

Do you know how can I solve this?

Thanks in advance!
Eduardo Juscamayta LópezFebruary 16, 2025 at 5:29 PM
Hi Liam,

I have been struggling to complete the analysis with SIMMAP. I used as input a phylogenetic tree based on recombinant-free SNP alignments of 95 K. pneumoniae isolates generated with IQ-TREE. I provided the Newick-formatted tree and converted it to Nexus format in R. Additionally, I initially had issues importing the metadata containing the traits I aim to analyze, such as the presence/absence of the KPC gene (carbapenem resistance) and point mutations in resistance genes like qnrB (quinolone resistance). Unfortunately, my results differ from what you have shown — the branch lengths vary significantly, and the trait colors appear like sun rays.

Would you be able to help me with these? I'd really appreciate any guidance! Thanks a lot.

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.

Phylogenetic Tools for Comparative Biology

Pages

Monday, July 25, 2022

Graphing the results of stochastic mapping with >500 taxa

4 comments: