I recently fielded a question about whether or not it was possible to use the polymorphic trait
evolution model, fitpolyMk
, to obtain posterior probabilities for internal nodes using the
technique of stochastic character mapping.
The answer is yes, and, in fact, I posted about this way back in 2019. Nonetheless, I thought it might be worth reiterating today.
For this example I’ll use habitat data for Mycalesina butterflies from a study by Halali et al (2020).
The butterfly species occur in three distinct habitat types – forest, fringe, and open – but can also occur in more than one type, something that we’ll code as polymorphism. We’re also going to assume that evolution of this character is ordered – that is to say:
library(phytools)
graph.polyMk(k=3,states=c("forest","fringe","open"),
model="ARD",ordered=TRUE)
title(main="\"Ordered\" polymorphic trait evolution model",
font.main=3)
These data feature in a problem set from my recent book with Luke Harmon, so they can be conveniently downloaded directly from the book website.
Let’s get ’em.
butterfly.tree<-read.nexus(file="http://www.phytools.org/Rbook/7/Mycalesina_phylogeny.nex")
butterfly.tree
##
## Phylogenetic tree with 311 tips and 310 internal nodes.
##
## Tip labels:
## Uni_unica, Bra_peitho_gigas, Bra_decira, Bra_eliasis, Bra_simonsii, Bra_teratia, ...
##
## Rooted; includes branch lengths.
butterfly.habitat<-read.csv(file="http://www.phytools.org/Rbook/7/Mycalesina_habitat.csv",
row.names=1,stringsAsFactors=TRUE)
head(butterfly.habitat)
## habitat
## Myc_francisca_formosana? forest+fringe+open
## Bic_cooksoni open
## Bic_brunnea forest
## Bic_jefferyi fringe+open
## Bic_auricruda_fulgida forest
## Bic_smithi_smithi forest+fringe
There are some differences between the taxa in the tree in the dataset. We can use the function
geiger::name.check
to resolve.
library(geiger)
chk<-name.check(butterfly.tree,butterfly.habitat)
summary(chk)
## 24 taxa are present in the tree but not the data:
## Bic_pareensis,
## Bra_eliasis,
## Bra_teratia,
## Cul_bisaya,
## Eno_anthedon,
## Eno_portlandia,
## ....
##
## To see complete list of mis-matched taxa, print object.
butterfly.tree<-drop.tip(butterfly.tree,
chk$tree_not_data)
name.check(butterfly.tree,butterfly.habitat)
## [1] "OK"
habitat<-setNames(butterfly.habitat[[1]],
rownames(butterfly.habitat))
Now let’s visualize our data. In this case, I will reproduce some previous code of my blog and show polymorphism at the tips of the tree using ‘pies.’
plotTree(butterfly.tree,ftype="off",lwd=1,type="fan")
X<-strsplit(setNames(as.character(habitat),names(habitat)),"+",
fixed=TRUE)
pies<-matrix(0,Ntip(butterfly.tree),3,
dimnames=list(butterfly.tree$tip.label,
c("forest","fringe","open")))
for(i in 1:Ntip(butterfly.tree))
pies[butterfly.tree$tip.label[i],
X[[butterfly.tree$tip.label[i]]]]<-
rep(1/length(X[[butterfly.tree$tip.label[i]]]),
length(X[[butterfly.tree$tip.label[i]]]))
cols<-c(rgb(0,1,0),rgb(0,0,1),rgb(1,0,0))
par(fg="transparent")
tiplabels(pie=pies,piecol=cols,cex=0.3)
par(fg="black")
legend(x="topleft",legend=c("forest","fringe","open"),
pt.cex=2,pch=16,col=cols)
(OK, I don’t like these colors either, but using rgb
is convenient for our purposes here for
when we make ‘intermediate’ colors later!)
Next, we can fit our polymorphic trait evolution model using fitpolyMk
.
Normally, we might first fit a set of models and compare them. Here, I will just use a standard
all-rates-different ("ARD"
) ordered model as follows.
butterfly.ordered<-fitpolyMk(butterfly.tree,habitat,
model="ARD",ordered=TRUE)
##
## This is the design matrix of the fitted model. Does it make sense?
##
## forest forest+fringe forest+fringe+open fringe fringe+open open
## forest 0 1 0 0 0 0
## forest+fringe 2 0 3 5 0 0
## forest+fringe+open 0 4 0 0 7 0
## fringe 0 6 0 0 9 0
## fringe+open 0 0 8 10 0 11
## open 0 0 0 0 12 0
butterfly.ordered
## Object of class "fitpolyMk".
##
## Evolution was modeled as 'ordered' (i.e., transitions are assumed
## to occur forest <-> forest+fringe <-> forest+fringe+open and so on) using the "ARD" model.
##
## Fitted (or set) value of Q:
## forest forest+fringe forest+fringe+open fringe fringe+open
## forest -0.034759 0.034759 0.000000 0.000000 0.000000
## forest+fringe 0.110911 -0.208582 0.020277 0.077395 0.000000
## forest+fringe+open 0.000000 0.088086 -0.088086 0.000000 0.000000
## fringe 0.000000 0.495208 0.000000 -1.153636 0.658427
## fringe+open 0.000000 0.000000 0.033355 0.000000 -0.099319
## open 0.000000 0.000000 0.000000 0.000000 0.000000
## open
## forest 0.000000
## forest+fringe 0.000000
## forest+fringe+open 0.000000
## fringe 0.000000
## fringe+open 0.065964
## open 0.000000
##
## Fitted (or set) value of pi:
## forest forest+fringe forest+fringe+open fringe
## 0.166667 0.166667 0.166667 0.166667
## fringe+open open
## 0.166667 0.166667
##
## Log-likelihood: -297.810038
##
## Optimization method used was "nlminb"
Let’s plot our fitted model.
plot(butterfly.ordered)
title(main=
"Fitted polymorphic trait evolution model\nfor buttefly habitat use",
font.main=3)
Alright, now, believe it or not, we’re basically ready to do stochastic mapping.
To take this next step, we first just pull out our data and fitted model from the "fitpolyMk"
object
as follows.
Q<-as.Qmatrix(butterfly.ordered)
print(Q,digits=3)
## Estimated Q matrix:
## forest forest+fringe forest+fringe+open fringe fringe+open open
## forest -0.0348 0.0348 0.0000 0.0000 0.0000 0.000
## forest+fringe 0.1109 -0.2086 0.0203 0.0774 0.0000 0.000
## forest+fringe+open 0.0000 0.0881 -0.0881 0.0000 0.0000 0.000
## fringe 0.0000 0.4952 0.0000 -1.1536 0.6584 0.000
## fringe+open 0.0000 0.0000 0.0334 0.0000 -0.0993 0.066
## open 0.0000 0.0000 0.0000 0.0000 0.0000 0.000
X<-butterfly.ordered$data
head(X)
## forest forest+fringe forest+fringe+open fringe fringe+open open
## Myc_francisca_formosana? 0 0 1 0 0 0
## Bic_cooksoni 0 0 0 0 0 1
## Bic_brunnea 1 0 0 0 0 0
## Bic_jefferyi 0 0 0 0 1 0
## Bic_auricruda_fulgida 1 0 0 0 0 0
## Bic_smithi_smithi 0 1 0 0 0 0
Then, we can go ahead and use these two objects as input for make.simmap
.
butterfly.maps<-make.simmap(butterfly.tree,x=X,Q=Q,
nsim=100)
## make.simmap is sampling character histories conditioned on
## the transition matrix
##
## Q =
## forest forest+fringe forest+fringe+open fringe fringe+open
## forest -0.03475871 0.03475871 0.00000000 0.00000000 0.00000000
## forest+fringe 0.11091079 -0.20858240 0.02027704 0.07739457 0.00000000
## forest+fringe+open 0.00000000 0.08808561 -0.08808561 0.00000000 0.00000000
## fringe 0.00000000 0.49520827 0.00000000 -1.15363566 0.65842739
## fringe+open 0.00000000 0.00000000 0.03335454 0.00000000 -0.09931852
## open 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
## open
## forest 0.00000000
## forest+fringe 0.00000000
## forest+fringe+open 0.00000000
## fringe 0.00000000
## fringe+open 0.06596398
## open 0.00000000
## (specified by the user);
## and (mean) root node prior probabilities
## pi =
## forest forest+fringe forest+fringe+open fringe
## 0.1666667 0.1666667 0.1666667 0.1666667
## fringe+open open
## 0.1666667 0.1666667
## Done.
butterfly.maps
## 100 phylogenetic trees with mapped discrete characters
(There’s actually another way to do this to. We can actually take the object index.matrix
from our
fitted model and then use this as our design matrix model
in make.simmap
! If we do it that way, we
can also use MCMC to sample the transition rates from their posterior distribution by setting Q="mcmc"
.)
Let’s now plot a single stochastic map – and I’ll overlay posterior probabilities for all internal nodes
using ape::nodelabels
.
map.cols<-setNames(c(rgb(0,1,0),rgb(0,0.5,0.5),rgb(1/3,1/3,1/3),
rgb(0,0,1),rgb(0.5,0.5,0),rgb(1,0,0)),levels(habitat))
plot(butterfly.maps[[1]],map.cols,ftype="off",
ylim=c(0,1.05*Ntip(butterfly.tree)))
legend(x="topleft",legend=levels(habitat),pt.cex=1.2,pch=15,
col=map.cols,cex=0.8,bty="n",ncol=2)
obj<-summary(butterfly.maps)
par(fg="transparent")
nodelabels(pie=obj$ace[1:butterfly.tree$Nnode,],piecol=map.cols,
cex=0.3)
par(fg="black")
Last of all, we can try something fun. Here, I’ll overlay, one by one, all the stochastic character mapped
histories in our sample. This is a visualization to approximate what can be obtained using phytools::densityMap
,
but for a multi-state character. (Depending on what device you’re rendering to, this can be very slow!)
par(lend=1)
for(i in 1:length(butterfly.maps))
plot(butterfly.maps[[i]],make.transparent(map.cols,
1/length(butterfly.maps)),add=if(i==1) FALSE else TRUE,
type="fan",ftype="off",lwd=3)
nodelabels(pie=obj$ace[1:butterfly.tree$Nnode,],piecol=map.cols,
cex=0.4)
par(fg="black")
legend(x="topleft",legend=levels(habitat),pt.cex=1.2,pch=15,
col=map.cols,cex=0.8,bty="n",ncol=2)