A few days ago, I shared a new variable-rate Brownian motion evolution model for continuous traits on the tree. In tests, the method performed pretty well, and I wrote a nice S3 plot method that helped a tweet about the method go 'viral' (or at least as viral as such things do).
New variable-rate quantitative trait evolution model in #Rstats #phytools. Check out my blog for more details: https://t.co/tV2Xwmhf4k. pic.twitter.com/so2A28GljB
— Liam Revell (@phytools_liam) March 29, 2021
Well, as it turns out things are a bit more complicated.
In particular, the model I was using actually ends up being non-identifiable if we try to estimate both the Brownian motion rates of evolution (σ²_i) along all the edges of the tree and the rate of Brownian evolution of the rate itself (let's call it σ²_BM).
What we can do very well, though, is estimate the rates under different scenarios for rate heterogeneity, if we assign a penalty term, λ, to the heterogeneity in rate among edges. This approach, in which we optimize using likelihood but penalize model complexity to an extent determined by the user, is called penalized likelihood, and it makes sense that it would work here because penalized likelihood is already used in other rate-smoothing methods in phylogenetics (for instance, in divergence-time estimation). In this case, the penalty function that I'm using is the probability density of the rates σ²_i on the tree under Brownian evolution of the (log) rate. Considered apart from the data, this penalty function is always maximized when all the rates are equal!
A low value of the penalty coefficient (say, λ = 0.01) assigns very little penalty to rate variation among edges. On the other hand, a higher value (say, λ = 100) permits very little rate variation. An intermediate λ (e.g., 1) gives roughly equal weight to the probability of the data under the model and to the probability of the rate variation among edges under a Brownian process for the evolution of the rate.
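Just to make that objective a bit more concrete, here's a minimal sketch of a penalized log-likelihood of this general form. To be clear, this is not the actual multirateBM implementation: the function name pen.logLik is made up, the data term uses mvtnorm::dmvnorm with a GLS estimate of the root state, and the penalty term fixes σ²_BM = 1 for simplicity. It's only meant to show how the two pieces (the likelihood of the data and the penalty on rate variation) get combined, weighted by λ.
library(phytools) ## also loads ape
library(mvtnorm) ## for dmvnorm
## hypothetical sketch of a penalized log-likelihood; lnsig2 is assumed
## to be a vector of log-rates for all nodes & tips, indexed by node
## number (just as in the simulation code below)
pen.logLik<-function(lnsig2,tree,x,lambda){
    ## 1) rescale each edge by the mean rate at its two subtending nodes
    tt<-tree
    tt$edge.length<-tt$edge.length*apply(tt$edge,1,
        function(e,s) mean(s[e]),s=exp(lnsig2))
    ## 2) log-likelihood of the tip data under BM on the rescaled tree
    C<-vcv(tt)[names(x),names(x)]
    invC<-solve(C)
    ones<-rep(1,length(x))
    a<-as.numeric((ones%*%invC%*%x)/(ones%*%invC%*%ones)) ## GLS root state
    logL.data<-dmvnorm(x,mean=rep(a,length(x)),sigma=C,log=TRUE)
    ## 3) log probability density of the log-rates: under BM the change
    ## along each edge is Normal(0, sig2BM * edge length); sig2BM = 1 here
    d<-lnsig2[tree$edge[,2]]-lnsig2[tree$edge[,1]]
    logP.rates<-sum(dnorm(d,mean=0,sd=sqrt(tree$edge.length),log=TRUE))
    ## penalized objective: data log-likelihood plus lambda times penalty
    logL.data+lambda*logP.rates
}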
Updated code for the function multirateBM is here.
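(If multirateBM isn't in your installed copy of phytools yet, it may only be in the development version of the package; one way to get that, assuming you have the remotes package installed, is to install phytools directly from GitHub.)
## install the development version of phytools from GitHub if needed
## (assumes the 'remotes' package is installed)
remotes::install_github("liamrevell/phytools")
packageVersion("phytools")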
Let's simulate some data and see how it works. In this case, what I'm doing is simulating node states for the log-rate under Brownian motion, rescaling each edge of my tree by the mean of the rates at its two subtending nodes, and then simulating data on the transformed tree.
library(phytools)
## simulate a 50-taxon pure-birth tree
tree<-pbtree(n=50)
## simulate log-rates by Brownian motion at all nodes & tips, then exponentiate
sig2<-exp(fastBM(tree,internal=TRUE))
## rescale each edge by the mean rate at its two subtending nodes
tt<-tree
tt$edge.length<-tt$edge.length*apply(tt$edge,1,
    function(e,x) mean(x[e]),x=sig2)
## simulate the trait on the rate-rescaled tree
x<-fastBM(tt)
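Just as a quick check, we can look at how much rate heterogeneity we actually simulated: for instance, the range of the simulated rates, and the factor by which rescaling the edges changed the total tree length.
## spread of the simulated rates across nodes & tips
range(sig2)
## factor by which rescaling the edges changed the total tree length
sum(tt$edge.length)/sum(tree$edge.length)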
I'm actually going to "steal" the plot method for my "multirateBM" object class to visualize how this rate variation is distributed on the tree under the generating model, as follows:
object<-list()
class(object)<-"multirateBM"
object$sig2<-sig2
object$tree<-tree
plot(object,ftype="off",digits=4,mar=c(1.1,1.1,4.1,1.1))
title(main=expression(paste("a) True rates (",
sigma[BM]^2,")")),adj=0.1)
Now let's fit our model – starting with a λ penalty coefficient of 1 – and see how the estimated rates compare.
fit.ml<-multirateBM(tree,x,lambda=1)
## Beginning optimization....
## Optimization iteration 1. Using "Nelder-Mead" optimization method.
## Optimization iteration 2. Using "BFGS" optimization method.
## Done optimization.
fit.ml
## Multi-rate Brownian model using multirateBM.
##
## Fitted rates:
## t35 t36 t4 t45 t46 t49 t50 t6
## 0.076963 0.077526 0.076633 0.059895 0.059905 0.15573 0.15573 0.075023 ....
##
## lambda penalty term: 1
## log-likelihood: -104.205263
## AIC: 408.410527
##
## R thinks it has found a solution.
plot(fit.ml,ftype="off",digits=4,mar=c(1.1,1.1,4.1,1.1))
title(main=expression(paste("b) Estimated rates (",
sigma[BM]^2,"), for ",lambda," = 1.0")),adj=0.1)
Since we know the true values of σ² for each edge, we can easily check to see how well correlated our estimated values are with our known, true rates:
options(scipen=999)
## generating sigma^2 by edge
SIG2<-apply(tree$edge,1,function(e,x)
mean(x[e]),x=sig2)
## estimated sigma^2 by edge
est.SIG2<-apply(tree$edge,1,function(e,x)
mean(x[e]),x=fit.ml$sig2)
xylim<-range(c(SIG2,est.SIG2))
par(mar=c(5.1,5.1,4.1,1.1))
plot(SIG2,est.SIG2,pch=21,bg="grey",cex=1.5,bty="n",
xlab=expression(paste("true rates, ",sigma^2)),
ylab=expression(paste("estimates rates, ",sigma^2)),
xlim=xylim,ylim=xylim,log="xy",las=1,cex.axis=0.8)
lines(xylim,xylim,lwd=2,
col=make.transparent(palette()[4],0.5))
title(main=expression(paste(
"Simulated vs. Estimated Brownian Rates, ",
lambda," = 1.0")))
Now let's try again, but this time with λ = 0.01, to see what happens. We should expect our model to exhibit greater variation in the Brownian rate, σ², among edges.
fit.ml2<-multirateBM(tree,x,lambda=0.01)
## Beginning optimization....
## Optimization iteration 1. Using "Nelder-Mead" optimization method.
## Optimization iteration 2. Using "BFGS" optimization method.
## Done optimization.
fit.ml2
## Multi-rate Brownian model using multirateBM.
##
## Fitted rates:
## t35 t36 t4 t45 t46 t49 t50 t6
## 0.000872 0.19456 0.0002 0.000589 0.000634 0.26076 0.26023 0.002561 ....
##
## lambda penalty term: 0.01
## log-likelihood: -75.741123
## AIC: 351.482245
##
## R thinks it has found a solution.
plot(fit.ml2,ftype="off",digits=4,mar=c(1.1,1.1,4.1,1.1))
title(main=expression(paste("c) Estimated rates (",
sigma[BM]^2,"), for ",lambda," = 0.01")),adj=0.1)
Indeed, this is exactly what we see: it's pretty clear that σ² changes much more abruptly from edge to edge in this plot than in the one from before.
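One simple way to quantify this, rather than relying only on the plots, is to compare the variance of the estimated log-rates between the two fitted models; the λ = 0.01 fit should show considerably more spread.
## variance of the estimated log-rates under each penalty coefficient
var(log(fit.ml$sig2)) ## lambda = 1
var(log(fit.ml2$sig2)) ## lambda = 0.01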
Lastly, we can do the same thing again – but this time with a much higher value of λ, say, 100,000. This should have the effect of eliminating variation in the Brownian rate among edges because of the very large penalty term we've created for rate variation.
Let's see:
fit.ml3<-multirateBM(tree,x,lambda=100000)
## Beginning optimization....
## Optimization iteration 1. Using "Nelder-Mead" optimization method.
## Optimization iteration 2. Using "BFGS" optimization method.
## Done optimization.
print(fit.ml3,printlen=Ntip(tree)+tree$Nnode)
## Multi-rate Brownian model using multirateBM.
##
## Fitted rates:
## t35 t36 t4 t45 t46 t49 t50 t6 t30 t31 t26
## 16.505 16.505 16.505 16.505 16.505 16.505 16.505 16.505 16.505 16.505 16.505
## t27 t8 t32 t33 t34 t14 t15 t12 t13 t18 t22
## 16.505 16.505 16.505 16.505 16.505 16.505 16.505 16.505 16.505 16.505 16.505
## t23 t39 t41 t42 t3 t2 t10 t11 t5 t16 t17
## 16.505 16.505 16.505 16.505 16.505 16.505 16.505 16.505 16.505 16.505 16.505
## t7 t20 t21 t40 t43 t44 t29 t28 t37 t38 t24
## 16.505 16.505 16.505 16.505 16.505 16.505 16.505 16.505 16.505 16.505 16.505
## t25 t9 t19 t47 t48 t1 51 52 53 54 55
## 16.505 16.505 16.505 16.505 16.505 16.505 16.505 16.505 16.505 16.505 16.505
## 56 57 58 59 60 61 62 63 64 65 66
## 16.505 16.505 16.505 16.505 16.505 16.505 16.505 16.505 16.505 16.505 16.505
## 67 68 69 70 71 72 73 74 75 76 77
## 16.505 16.505 16.505 16.505 16.505 16.505 16.505 16.505 16.505 16.505 16.505
## 78 79 80 81 82 83 84 85 86 87 88
## 16.505 16.505 16.505 16.505 16.505 16.505 16.505 16.505 16.505 16.505 16.505
## 89 90 91 92 93 94 95 96 97 98 99
## 16.505 16.505 16.505 16.505 16.505 16.505 16.505 16.505 16.505 16.505 16.505
##
## lambda penalty term: 100000
## log-likelihood: -136.439244
## AIC: 472.878488
##
## R thinks it has found a solution.
Indeed, all the rates are now more or less the same, and (not coincidentally) they correspond to the ML rate estimate under a constant-rate Brownian model, just as you'd expect.
geiger::fitContinuous(tree,x)
## GEIGER-fitted comparative model of continuous data
## fitted 'BM' model parameters:
## sigsq = 16.505384
## z0 = -1.119807
##
## model summary:
## log-likelihood = -136.439253
## AIC = 276.878506
## AICc = 277.133825
## free parameters = 2
##
## Convergence diagnostics:
## optimization iterations = 100
## failed iterations = 0
## number of iterations with same best fit = 100
## frequency of best fit = 1.00
##
## object summary:
## 'lik' -- likelihood function
## 'bnd' -- bounds for likelihood search
## 'res' -- optimization iteration summary
## 'opt' -- maximum likelihood parameter estimates
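We can make that correspondence explicit by comparing the (nearly constant) rates from the λ = 100,000 fit with geiger's single-rate ML estimate of σ². Here I'm assuming, as I believe is the case for fitContinuous, that the estimate is stored in the opt component of the fitted object; the object name fit.bm is just for this comparison.
## compare the near-constant multirateBM rates to geiger's single-rate
## ML estimate of sigma^2
fit.bm<-geiger::fitContinuous(tree,x)
range(fit.ml3$sig2)
fit.bm$opt$sigsq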
Neat.
I think it's probably fairly obvious that if the penalty coefficient λ is low, then the rates can take pretty much any values they like; if λ is high, then they're all forced to be the same; and if λ is just right (i.e., somewhere in the middle), then the estimated rates, σ², will reflect Brownian evolution of the rate along the tree. What value of λ that is, though, should depend on the scale of our data and rates!
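For instance, one way to get a feel for this in practice (just an illustration; re-running the optimization over a grid of λ values can be slow) would be to fit the model across a small grid of λ values and track how the spread of the estimated log-rates shrinks as the penalty grows.
## fit the model over a small grid of lambda values & measure how the
## spread of the estimated log-rates shrinks as the penalty increases
## (this re-runs the optimization several times, so it can be slow)
lambda<-c(0.01,0.1,1,10,100)
fits<-lapply(lambda,function(l) multirateBM(tree,x,lambda=l))
data.frame(lambda=lambda,
    var.log.sig2=sapply(fits,function(f) var(log(f$sig2))))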