A colleague recently asked if it was possible to visualize a "rateshift"
class object with a custom color gradient legend range, as well as without showing the color bar on each plot. This makes sense as a request because we may find ourselves comparing different rate shift scenarios with the same tree & data (e.g., 1, 2 or more shifts), so it is sensible that the same colors correspond to the same quantitative trait rates across each subplot.
As a quick reminder, the phytools function rateshift
fits a model in which the Brownian evolutionary rate, \(\sigma^2\), shifts in one or more temporal locations on the tree, estimated using maximum likelihood. This fitted model can be compared for different numbers of shifts (each of which consumes one degree of freedom), to homogeneous rate Brownian motion, or to other evolutionary scenarios for our trait. If we set the function variable nrates = 1
, we’re fitting a standard constant-rate model.
I have now updated phytools as per this very sensible user request, and this update can be obtained by installing phytools from GitHub using (e.g.) the remotes CRAN package.
Let’s test it out.
First, we can load phytools.
library(phytools)
packageVersion("phytools")
## [1] '2.3.12'
(This is the GitHub development version of phytools, as of the time of writing this post.)
Next, let’s simulate some data with two rate shifts: one decrease in \(\sigma^2\) 70% of the total tree depth above the root; and a second increase in \(\sigma^2\) 90% above the root.
tree<-pbtree(n=100,scale=100)
sig2<-setNames(c(20,1,10),1:3)
tt<-make.era.map(tree,setNames(c(0,70,90),1:3))
x<-sim.rates(tt,sig2,internal=TRUE)
Here’s our simulation with the known regime & simulated internal node states included.
cols<-setNames(hcl.colors(n=20)[c(sig2)],
names(sig2))
par(mar=c(5.1,4.1,1.1,1.1),lend=1)
phenogram(tt,x,colors=cols,ftype="off",las=1,
cex.axis=0.8)
legend("topleft",legend=
c(expression(paste(sigma^2,"= 20")),
expression(paste(sigma^2,"= 1")),
expression(paste(sigma^2,"= 10"))),
lty="solid",col=cols,bty="n",lwd=2,cex=0.8)
Now let’s proceed to run our analysis. Of course, for this we’ll use only the tip states of the character & our unmapped tree. (That’s the whole point: we want to see if we can correctly infer the known rate shift locations!)
## just the tip states
x<-x[tree$tip.label]
## no rate shifts
fit1<-rateshift(tree,x,nrates=1)
## Optimization progress:
## |..........|
## Done.
fit1
## ML 1-rate model:
## s^2(1) se(1) k logL
## value 11.59 1.639 2 -400.6
##
## This is a one-rate model.
##
## Model fit using ML.
##
## Frequency of best fit: 1
##
## Optimization may not have converged.
We’re warned that optimization may not have converged, but it’s easy to double-check that this is the same fitted, one-rate model we might have obtained using, say, geiger::fitContinuous
.
geiger::fitContinuous(tree,x)
## GEIGER-fitted comparative model of continuous data
## fitted 'BM' model parameters:
## sigsq = 11.591104
## z0 = -6.243436
##
## model summary:
## log-likelihood = -400.584317
## AIC = 805.168634
## AICc = 805.292345
## free parameters = 2
##
## Convergence diagnostics:
## optimization iterations = 100
## failed iterations = 0
## number of iterations with same best fit = 100
## frequency of best fit = 1.000
##
## object summary:
## 'lik' -- likelihood function
## 'bnd' -- bounds for likelihood search
## 'res' -- optimization iteration summary
## 'opt' -- maximum likelihood parameter estimates
Next, our actual rate-shift models….
## one rate shift
fit2<-rateshift(tree,x,nrates=2,parallel=TRUE)
## Opened cluster with 10 cores.
## Running optimization iterations in parallel.
## Please wait....
## Done.
fit2
## ML 2-rate model:
## s^2(1) se(1) s^2(2) se(2) k logL
## value 25.02 12.66 10.29 1.6 4 -399.3
##
## Shift point(s) between regimes (height above root):
## 1|2 se(1|2)
## value 67.11 0.25
##
## Model fit using ML.
##
## Frequency of best fit: 0.3
##
## R thinks it has found the ML solution.
(The option parallel=TRUE
distributes our different optimization iterations across cores.)
## two rate shifts: this is the generating condition
fit3<-rateshift(tree,x,nrates=3,parallel=TRUE)
## Opened cluster with 10 cores.
## Running optimization iterations in parallel.
## Please wait....
## Done.
fit3
## ML 3-rate model:
## s^2(1) se(1) s^2(2) se(2) s^2(3) se(3) k logL
## value 30.67 12.06 0.0012 7.083 12.43 2.545 6 -394.3
##
## Shift point(s) between regimes (height above root):
## 1|2 se(1|2) 2|3 se(2|3)
## value 69.63 4.181 89.15 3.372
##
## Model fit using ML.
##
## Frequency of best fit: 0.2
##
## R thinks it has found the ML solution.
## three rate shifts: this is too many shifts
fit4<-rateshift(tree,x,nrates=4,parallel=TRUE)
## Opened cluster with 10 cores.
## Running optimization iterations in parallel.
## Please wait....
## Done.
fit4
## ML 4-rate model:
## s^2(1) se(1) s^2(2) se(2) s^2(3) se(3) s^2(4) se(4) k logL
## value 14.65 10.35 71.22 41.25 0.0012 1.931 12.38 2.526 8 -392.9
##
## Shift point(s) between regimes (height above root):
## 1|2 se(1|2) 2|3 se(2|3) 3|4 se(3|4)
## value 58.5 0.1612 66.75 1.457 89.08 2.808
##
## Model fit using ML.
##
## Frequency of best fit: 0.5
##
## R thinks it has found the ML solution.
## compare models
aov_table<-anova(fit1,fit2,fit3,fit4)
## log(L) d.f. AIC weight
## object -400.5843 2 805.1686 0.06155936
## fit2 -399.3082 4 806.6165 0.02984666
## fit3 -394.3491 6 800.6982 0.57550162
## fit4 -392.8959 8 801.7918 0.33309236
This tells us that our three-rate (i.e., two rate shift) model is the best-supported among the four.
Let’s plot all our fitted models together in a multi-plot figure using the new plotting options of the plot.rateshift
S3 method.
## find range of estimated sigma^2
sigs<-unlist(sapply(list(fit1,fit2,fit3,fit4),
function(x) x$sig2))
par(mfrow=c(2,2))
cols<-hcl.colors(n=10)
plot(fit1,lims=range(sigs),col=cols,ftype="off",
legend=FALSE)
mtext(paste("a) single-rate (weight = ",
round(aov_table$weight[1],3),")",sep=""),
adj=0,line=-1)
plot(fit2,lims=range(sigs),col=cols,ftype="off",
legend=FALSE)
mtext(paste("b) two-rate (weight = ",
round(aov_table$weight[2],3),")",sep=""),
adj=0,line=-1)
plot(fit3,lims=range(sigs),col=cols,ftype="off",
legend=FALSE)
mtext(paste("c) three-rate (weight = ",
round(aov_table$weight[3],3),")",sep=""),
adj=0,line=-1)
plot(fit4,lims=range(sigs),col=cols,ftype="off",
legend=TRUE)
mtext(paste("d) four-rate (weight = ",
round(aov_table$weight[4],3),")",sep=""),
adj=0,line=-1)
This doesn’t look great – largely because our color gradient for the rate is totally distorted by the one spuriously high rate in our (poorly-supported) four rate model!
Let’s try it again, but in this case constraining our color gradient to be \(0 < \sigma^2 < 20\). I’m also going to use base R graphics layout
to have my single color legend in a separate panel.
## find range of estimated sigma^2
layout(matrix(c(1,1,2,2,3,3,4,4,6,5,5,6),3,4,
byrow=TRUE),heights=c(0.475,0.475,0.05))
cols<-hcl.colors(n=10)
plot(fit1,lims=c(0,20),col=cols,ftype="off",
legend=FALSE,lwd=3)
mtext(paste("a) single-rate (weight = ",
round(aov_table$weight[1],3),")",sep=""),
adj=0,line=-1)
plot(fit2,lims=c(0,20),col=cols,ftype="off",
legend=FALSE,lwd=3)
mtext(paste("b) two-rate (weight = ",
round(aov_table$weight[2],3),")",sep=""),
adj=0,line=-1)
plot(fit3,lims=c(0,20),col=cols,ftype="off",
legend=FALSE,lwd=3)
mtext(paste("c) three-rate (weight = ",
round(aov_table$weight[3],3),")",sep=""),
adj=0,line=-1)
plot(fit4,lims=c(0,20),col=cols,ftype="off",
legend=FALSE,lwd=3)
mtext(paste("d) four-rate (weight = ",
round(aov_table$weight[4],3),")",sep=""),
adj=0,line=-1)
par(mar=rep(0,4))
plot(NA,xlim=c(0,100),ylim=c(-1,1),bty="n",
xlab="",ylab="",axes=FALSE)
add.color.bar(80,hcl.colors(n=100),
title=expression(sigma^2),prompt=FALSE,
x=10,y=0,lims=NULL,lwd=6,fsize=1.5,
subtitle="")
text(8,0,"0",pos=2,cex=1.5)
text(92,0,"> 20",pos=4,cex=1.5)
That’s not bad, right?