Thursday, December 8, 2022

Computing & visualizing the distribution of character transitions from a set of stochastic character mapped trees

I recently fielded a question about how, having fit an Mk discrete character evolution model and recontructed ancestral states, one might go about using this reconstruction to study variation in the rate of trait evolution from branch to branch in the phylogeny.

In point of fact, when we fit the Mk model in the first place, we had assumed that the evolutionary rate (or, specifically, our transition matrix Q) was homogeneous (that is, unchanging) across all the edges of the phylogeny.

Still, I believe that it is nonetheless quite reasonable to investigate the distribution of state changes across the tree – and one very handy technique for this is the method of stochastic character mapping (Huelsenbeck et al. 2003).

To illustrate what I mean by this, I will use two different datasets – one for eels (from Collar et al. 2014), and a second for primates (from Kirk & Kay 2004). Both can be obtained from the website of my recent Princeton University Press book with Luke Harmon.

Let’s start with the eel tree & dataset.

library(phytools)
eel.tree<-read.tree(file="http://www.phytools.org/Rbook/8/elopomorph.tre")
eel.data<-read.csv(file="http://www.phytools.org/Rbook/8/elopomorph.csv",
	row.names=1,stringsAsFactors=TRUE)
head(eel.data)
##                   feed_mode Max_TL_cm
## Albula_vulpes       suction       104
## Anguilla_anguilla   suction        50
## Anguilla_bicolor    suction       120
## Anguilla_japonica   suction       150
## Anguilla_rostrata   suction       152
## Ariosoma_anago      suction        60

We’re going to analyze the trait feed_mode: a binary ‘feeding mode’ character.

fmode<-setNames(eel.data$feed_mode,rownames(eel.data))
levels(fmode)
## [1] "bite"    "suction"

Now let’s run stochastic mapping. In this case, from exercises in our book, I happen to recall that the "ER" (equal-rates) model is the one that is best supported by the data.

fmode.trees<-make.simmap(eel.tree,fmode,model="ER",nsim=1000,
	pi="fitzjohn")
## make.simmap is sampling character histories conditioned on
## the transition matrix
## 
## Q =
##                bite     suction
## bite    -0.01569437  0.01569437
## suction  0.01569437 -0.01569437
## (estimated using likelihood);
## and (mean) root node prior probabilities
## pi =
##      bite   suction 
## 0.5491439 0.4508561
## Done.
fmode.trees
## 1000 phylogenetic trees with mapped discrete characters

Normally, we might be concerned with the probability that each edge was in each of the two mapped character states, and, if we were, we could easily visualize that using the phytools method densityMap. In this case we will follow two other recent posts to this blog (1, 2) and re-color our plot using the ‘viridis’ color palette.

dMap.eels<-densityMap(fmode.trees,plot=FALSE,res=500)
## sorry - this might take a while; please be patient
pal<-viridisLite::viridis(n=20)
dMap.eels<-setMap(dMap.eels,pal)
plot(dMap.eels,fsize=c(0.6,0.9),lwd=4,legend=50)
obj<-summary(fmode.trees)
nn<-which(apply(obj$ace,1,max)<0.95)
nodelabels(node=as.numeric(names(nn)),
	pie=obj$ace[nn,],piecol=pal[c(1,20)],cex=0.6)

plot of chunk unnamed-chunk-4

OK, that’s pretty cool.

Next, what we’re going to do is simply count the number of changes along each edge of each tree. We’ll ignore the specific type of each change in favor of obtaining a simpler tally.

For whatever reason (and as far as I can recall) there is no function that automates this counting; however, it’s not hard for us to take advantage of the structure of the "simmap" object in which the element "maps" just contains a list of all the states (in the order of occurence) on each edge of the tree. If we measure the length of each element of "maps" (minus one), this gives us the number of changes on each tree & edge!

foo<-function(x) sapply(x$maps,function(e) length(e)-1)
N<-sapply(fmode.trees,foo)

This new matrix contains the number of changes on each edge (in rows) and tree (in columns) from our stochastic maps.

Nbar<-rowMeans(N)
Nbar
##   [1] 0.221 1.320 0.104 0.172 0.021 0.093 0.866 0.002 1.102 0.000 0.002 0.200 0.556
##  [14] 0.085 0.170 0.633 0.050 0.008 0.004 0.026 0.042 0.050 0.559 0.108 0.045 0.308
##  [27] 0.312 0.150 0.357 0.807 0.075 0.038 0.149 0.957 0.142 0.365 0.060 0.070 0.086
##  [40] 0.084 0.493 0.218 0.095 0.594 0.504 0.649 0.479 0.069 0.137 0.316 0.404 0.700
##  [53] 0.288 0.016 0.012 0.323 0.041 0.036 0.038 0.097 0.194 0.147 0.210 0.312 0.081
##  [66] 0.019 0.036 0.040 0.198 0.393 0.641 0.621 0.660 0.534 0.653 0.501 0.132 0.060
##  [79] 0.114 0.130 0.023 0.027 0.154 0.180 0.063 0.016 0.026 0.000 0.007 0.011 0.007
##  [92] 0.032 0.086 0.130 0.052 0.070 0.180 0.146 0.840 0.187 0.023 0.021 0.282 0.596
## [105] 0.244 0.096 0.082 0.207 0.117 0.111 0.008 0.008 0.702 1.218 1.168 0.736 0.418
## [118] 0.379 0.343 0.821

At last, we have a vector containing (for each edge) the mean number of changes across all maps.

Possibly the easiest way to visualize this result would be to use the phytools function plotBranchbyTrait, which is actually a simple ape::plot.phylo wrapper, as follows.

plotBranchbyTrait(eel.tree,Nbar,mode="edges",cex=c(0.5,0.8),
	legend=50,title="mean number of changes")

plot of chunk unnamed-chunk-7

I like it but I don’t love it. One thing that jumps out is that long edges tend to have more changes, just as one would expect.

For fun, let’s try mapping the number of changes per unit time onto the edge of the tree, and then visualizing these values on a logarithmic scale. I’m also going to graph my tree using the sigmoidPhylogram function that I recently added to the phytools package.

logNt<-log(Nbar/eel.tree$edge.length)
logNt[logNt==-Inf]<-NA
cols<-setNames(c("grey",viridisLite::viridis(486)),0:486)
levs<-round((logNt-min(logNt,na.rm=TRUE))/diff(range(logNt,
	na.rm=TRUE))*(length(cols)-2)+1)
levs[is.na(levs)]<-0
lnNt.tree<-eel.tree
for(i in 1:length(levs)) 
	lnNt.tree<-paintBranches(lnNt.tree,edge=eel.tree$edge[i,2],
		state=levs[i])
sigmoidPhylogram(eel.tree,fsize=0.6,ftype="i",lwd=5,direction="upwards",
	xlim=c(-5,Ntip(eel.tree)))
par(fg="transparent")
sigmoidPhylogram(lnNt.tree,colors=cols,fsize=0.6,ftype="i",lwd=3,
	direction="upwards",xlim=c(-5,Ntip(eel.tree)),add=TRUE)
par(fg="black")
h<-max(nodeHeights(eel.tree))
lines(rep(-3,2),c(0.1*h,h))
nticks<-4
for(i in 1:nticks){
	yy<-0.1*h+(i-1)/(nticks-1)*(0.9*h)
	lines(c(-3,-4),rep(yy,2))
	text(-4,yy,round(exp(min(logNt,na.rm=TRUE)+
		(i-1)/(nticks-1)*diff(range(logNt,na.rm=TRUE))),
		5),pos=2,cex=0.6,offset=0.1)
}
add.color.bar(0.9*h,cols[2:length(cols)],direction="upwards",lims=NULL,digits=3,
	lwd=15,x=-3,y=0.1*h,prompt=FALSE,title="",subtitle="")
text(-3.5,0,"mean number of\nchanges / unit of time",pos=4,cex=0.8,
	offset=0)

plot of chunk unnamed-chunk-8

For comparison, let’s try a different scenario in which the data are consistent with a much smaller number of changes over the tree: diel activity pattern in primates.

primate.tree<-read.tree(file="http://www.phytools.org/Rbook/3/primateEyes.phy")
primate.data<-read.csv(file="http://www.phytools.org/Rbook/3/primateEyes.csv",
	row.names=1,stringsAsFactors=TRUE)
activity<-setNames(primate.data$Activity_pattern,rownames(primate.data))
head(activity)
## Allenopithecus_nigroviridis           Alouatta_palliata 
##                     Diurnal                     Diurnal 
##          Alouatta_seniculus           Aotus_trivirgatus 
##                     Diurnal                   Nocturnal 
##           Arctocebus_aureus     Arctocebus_calabarensis 
##                   Nocturnal                   Nocturnal 
## Levels: Cathemeral Diurnal Nocturnal

In this case, I don’t know which model is best-supported, so I’ll try fitting the ER (equal-rates), SYM (symmetric), and ARD models, and then I’ll compare them using the phytools anova method.

ER<-fitMk(primate.tree,activity,model="ER",pi="fitzjohn")
SYM<-fitMk(primate.tree,activity,model="SYM",pi="fitzjohn")
ARD<-fitMk(primate.tree,activity,model="ARD",pi="fitzjohn")
anova(ER,SYM,ARD)
##        log(L) d.f.      AIC     weight
## ER  -33.42659    1 68.85318 0.47738258
## SYM -31.43569    3 68.87137 0.47305890
## ARD -30.69175    6 73.38350 0.04955853

This indicates that the best-supported model, taking into account parameterization, is the equal-rates model (although the SYM model is very similar). Let’s use the ER model here.

activity.trees<-make.simmap(primate.tree,activity,model="ER",
	pi="fitzjohn",nsim=1000)
## make.simmap is sampling character histories conditioned on
## the transition matrix
## 
## Q =
##              Cathemeral      Diurnal    Nocturnal
## Cathemeral -0.006272385  0.003136192  0.003136192
## Diurnal     0.003136192 -0.006272385  0.003136192
## Nocturnal   0.003136192  0.003136192 -0.006272385
## (estimated using likelihood);
## and (mean) root node prior probabilities
## pi =
##  Cathemeral     Diurnal   Nocturnal 
## 0.007544215 0.130104003 0.862351782
## Done.

Now we can repeat the analysis we did above. In this case I will just give the mean number of change per branch, rather than by unit time.

N<-sapply(activity.trees,foo)
Nbar<-rowMeans(N)
cols<-setNames(colorRampPalette(c("lightblue","red"))(256),1:256)
levs<-round((Nbar-min(Nbar))/diff(range(Nbar))*(length(cols)-2)+1)
Nbar.tree<-primate.tree
for(i in 1:length(levs)) 
	Nbar.tree<-paintBranches(Nbar.tree,edge=primate.tree$edge[i,2],
		state=levs[i])
sigmoidPhylogram(primate.tree,fsize=0.6,ftype="i",lwd=5)
par(fg="transparent")
sigmoidPhylogram(Nbar.tree,colors=cols,fsize=0.6,ftype="i",lwd=3,add=TRUE)
par(fg="black")
add.color.bar(40,cols,outline=TRUE,title="mean number of changes",
	lims=range(Nbar),subtitle="from 1,000 stochastic maps",
	fsize=0.8,digits=2,x=0,y=5,prompt=FALSE)

plot of chunk unnamed-chunk-12

No comments:

Post a Comment

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.