I recently fielded a question about how, having fit an Mk discrete character evolution model and recontructed ancestral states, one might go about using this reconstruction to study variation in the rate of trait evolution from branch to branch in the phylogeny.
In point of fact, when we fit the Mk model in the first place, we had assumed that the evolutionary rate (or, specifically, our transition matrix Q) was homogeneous (that is, unchanging) across all the edges of the phylogeny.
Still, I believe that it is nonetheless quite reasonable to investigate the distribution of state changes across the tree – and one very handy technique for this is the method of stochastic character mapping (Huelsenbeck et al. 2003).
To illustrate what I mean by this, I will use two different datasets – one for eels (from Collar et al. 2014), and a second for primates (from Kirk & Kay 2004). Both can be obtained from the website of my recent Princeton University Press book with Luke Harmon.
Let’s start with the eel tree & dataset.
library(phytools)
eel.tree<-read.tree(file="http://www.phytools.org/Rbook/8/elopomorph.tre")
eel.data<-read.csv(file="http://www.phytools.org/Rbook/8/elopomorph.csv",
row.names=1,stringsAsFactors=TRUE)
head(eel.data)
## feed_mode Max_TL_cm
## Albula_vulpes suction 104
## Anguilla_anguilla suction 50
## Anguilla_bicolor suction 120
## Anguilla_japonica suction 150
## Anguilla_rostrata suction 152
## Ariosoma_anago suction 60
We’re going to analyze the trait feed_mode
: a binary ‘feeding mode’ character.
fmode<-setNames(eel.data$feed_mode,rownames(eel.data))
levels(fmode)
## [1] "bite" "suction"
Now let’s run stochastic mapping. In this case, from exercises in our
book,
I happen to recall that the "ER"
(equal-rates) model is the one that is
best supported by the data.
fmode.trees<-make.simmap(eel.tree,fmode,model="ER",nsim=1000,
pi="fitzjohn")
## make.simmap is sampling character histories conditioned on
## the transition matrix
##
## Q =
## bite suction
## bite -0.01569437 0.01569437
## suction 0.01569437 -0.01569437
## (estimated using likelihood);
## and (mean) root node prior probabilities
## pi =
## bite suction
## 0.5491439 0.4508561
## Done.
fmode.trees
## 1000 phylogenetic trees with mapped discrete characters
Normally, we might be concerned with the probability that each edge was in
each of the two mapped character states, and, if we were, we could easily
visualize that using the phytools method densityMap
. In this case we
will follow two other recent posts to this blog
(1,
2)
and re-color our plot using the ‘viridis’ color palette.
dMap.eels<-densityMap(fmode.trees,plot=FALSE,res=500)
## sorry - this might take a while; please be patient
pal<-viridisLite::viridis(n=20)
dMap.eels<-setMap(dMap.eels,pal)
plot(dMap.eels,fsize=c(0.6,0.9),lwd=4,legend=50)
obj<-summary(fmode.trees)
nn<-which(apply(obj$ace,1,max)<0.95)
nodelabels(node=as.numeric(names(nn)),
pie=obj$ace[nn,],piecol=pal[c(1,20)],cex=0.6)
OK, that’s pretty cool.
Next, what we’re going to do is simply count the number of changes along each edge of each tree. We’ll ignore the specific type of each change in favor of obtaining a simpler tally.
For whatever reason (and as far as I can recall) there is no function that automates
this counting; however, it’s not hard for us to take advantage of the structure of the
"simmap"
object in which the element "maps"
just contains a list of all the states
(in the order of occurence) on each edge of the tree. If we measure the length of
each element of "maps"
(minus one), this gives us the number of changes on each
tree & edge!
foo<-function(x) sapply(x$maps,function(e) length(e)-1)
N<-sapply(fmode.trees,foo)
This new matrix contains the number of changes on each edge (in rows) and tree (in columns) from our stochastic maps.
Nbar<-rowMeans(N)
Nbar
## [1] 0.221 1.320 0.104 0.172 0.021 0.093 0.866 0.002 1.102 0.000 0.002 0.200 0.556
## [14] 0.085 0.170 0.633 0.050 0.008 0.004 0.026 0.042 0.050 0.559 0.108 0.045 0.308
## [27] 0.312 0.150 0.357 0.807 0.075 0.038 0.149 0.957 0.142 0.365 0.060 0.070 0.086
## [40] 0.084 0.493 0.218 0.095 0.594 0.504 0.649 0.479 0.069 0.137 0.316 0.404 0.700
## [53] 0.288 0.016 0.012 0.323 0.041 0.036 0.038 0.097 0.194 0.147 0.210 0.312 0.081
## [66] 0.019 0.036 0.040 0.198 0.393 0.641 0.621 0.660 0.534 0.653 0.501 0.132 0.060
## [79] 0.114 0.130 0.023 0.027 0.154 0.180 0.063 0.016 0.026 0.000 0.007 0.011 0.007
## [92] 0.032 0.086 0.130 0.052 0.070 0.180 0.146 0.840 0.187 0.023 0.021 0.282 0.596
## [105] 0.244 0.096 0.082 0.207 0.117 0.111 0.008 0.008 0.702 1.218 1.168 0.736 0.418
## [118] 0.379 0.343 0.821
At last, we have a vector containing (for each edge) the mean number of changes across all maps.
Possibly the easiest way to visualize this result would be to use the phytools
function plotBranchbyTrait
, which is actually a simple ape::plot.phylo
wrapper,
as follows.
plotBranchbyTrait(eel.tree,Nbar,mode="edges",cex=c(0.5,0.8),
legend=50,title="mean number of changes")
I like it but I don’t love it. One thing that jumps out is that long edges tend to have more changes, just as one would expect.
For fun, let’s try mapping the number of changes per unit time onto the edge of the
tree, and then visualizing these values on a logarithmic scale. I’m also going to
graph my tree using the
sigmoidPhylogram
function that I recently added to the phytools package.
logNt<-log(Nbar/eel.tree$edge.length)
logNt[logNt==-Inf]<-NA
cols<-setNames(c("grey",viridisLite::viridis(486)),0:486)
levs<-round((logNt-min(logNt,na.rm=TRUE))/diff(range(logNt,
na.rm=TRUE))*(length(cols)-2)+1)
levs[is.na(levs)]<-0
lnNt.tree<-eel.tree
for(i in 1:length(levs))
lnNt.tree<-paintBranches(lnNt.tree,edge=eel.tree$edge[i,2],
state=levs[i])
sigmoidPhylogram(eel.tree,fsize=0.6,ftype="i",lwd=5,direction="upwards",
xlim=c(-5,Ntip(eel.tree)))
par(fg="transparent")
sigmoidPhylogram(lnNt.tree,colors=cols,fsize=0.6,ftype="i",lwd=3,
direction="upwards",xlim=c(-5,Ntip(eel.tree)),add=TRUE)
par(fg="black")
h<-max(nodeHeights(eel.tree))
lines(rep(-3,2),c(0.1*h,h))
nticks<-4
for(i in 1:nticks){
yy<-0.1*h+(i-1)/(nticks-1)*(0.9*h)
lines(c(-3,-4),rep(yy,2))
text(-4,yy,round(exp(min(logNt,na.rm=TRUE)+
(i-1)/(nticks-1)*diff(range(logNt,na.rm=TRUE))),
5),pos=2,cex=0.6,offset=0.1)
}
add.color.bar(0.9*h,cols[2:length(cols)],direction="upwards",lims=NULL,digits=3,
lwd=15,x=-3,y=0.1*h,prompt=FALSE,title="",subtitle="")
text(-3.5,0,"mean number of\nchanges / unit of time",pos=4,cex=0.8,
offset=0)
For comparison, let’s try a different scenario in which the data are consistent with a much smaller number of changes over the tree: diel activity pattern in primates.
primate.tree<-read.tree(file="http://www.phytools.org/Rbook/3/primateEyes.phy")
primate.data<-read.csv(file="http://www.phytools.org/Rbook/3/primateEyes.csv",
row.names=1,stringsAsFactors=TRUE)
activity<-setNames(primate.data$Activity_pattern,rownames(primate.data))
head(activity)
## Allenopithecus_nigroviridis Alouatta_palliata
## Diurnal Diurnal
## Alouatta_seniculus Aotus_trivirgatus
## Diurnal Nocturnal
## Arctocebus_aureus Arctocebus_calabarensis
## Nocturnal Nocturnal
## Levels: Cathemeral Diurnal Nocturnal
In this case, I don’t know which model is best-supported, so I’ll try fitting
the ER (equal-rates), SYM (symmetric), and ARD models, and then I’ll compare
them using the phytools anova
method.
ER<-fitMk(primate.tree,activity,model="ER",pi="fitzjohn")
SYM<-fitMk(primate.tree,activity,model="SYM",pi="fitzjohn")
ARD<-fitMk(primate.tree,activity,model="ARD",pi="fitzjohn")
anova(ER,SYM,ARD)
## log(L) d.f. AIC weight
## ER -33.42659 1 68.85318 0.47738258
## SYM -31.43569 3 68.87137 0.47305890
## ARD -30.69175 6 73.38350 0.04955853
This indicates that the best-supported model, taking into account parameterization, is the equal-rates model (although the SYM model is very similar). Let’s use the ER model here.
activity.trees<-make.simmap(primate.tree,activity,model="ER",
pi="fitzjohn",nsim=1000)
## make.simmap is sampling character histories conditioned on
## the transition matrix
##
## Q =
## Cathemeral Diurnal Nocturnal
## Cathemeral -0.006272385 0.003136192 0.003136192
## Diurnal 0.003136192 -0.006272385 0.003136192
## Nocturnal 0.003136192 0.003136192 -0.006272385
## (estimated using likelihood);
## and (mean) root node prior probabilities
## pi =
## Cathemeral Diurnal Nocturnal
## 0.007544215 0.130104003 0.862351782
## Done.
Now we can repeat the analysis we did above. In this case I will just give the mean number of change per branch, rather than by unit time.
N<-sapply(activity.trees,foo)
Nbar<-rowMeans(N)
cols<-setNames(colorRampPalette(c("lightblue","red"))(256),1:256)
levs<-round((Nbar-min(Nbar))/diff(range(Nbar))*(length(cols)-2)+1)
Nbar.tree<-primate.tree
for(i in 1:length(levs))
Nbar.tree<-paintBranches(Nbar.tree,edge=primate.tree$edge[i,2],
state=levs[i])
sigmoidPhylogram(primate.tree,fsize=0.6,ftype="i",lwd=5)
par(fg="transparent")
sigmoidPhylogram(Nbar.tree,colors=cols,fsize=0.6,ftype="i",lwd=3,add=TRUE)
par(fg="black")
add.color.bar(40,cols,outline=TRUE,title="mean number of changes",
lims=range(Nbar),subtitle="from 1,000 stochastic maps",
fsize=0.8,digits=2,x=0,y=5,prompt=FALSE)
No comments:
Post a Comment
Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.