A few days ago, I shared a new variable rate Brownian motion evolution model for continuous traits on the tree.

In tests, the method performed pretty well, and I wrote a nice S3 `plot` method that helped a tweet about the method go 'viral' (or at least as viral as such things do).

> New variable-rate quantitative trait evolution model in #Rstats #phytools. Check out my blog for more details: https://t.co/tV2Xwmhf4k. pic.twitter.com/so2A28GljB
>
> — Liam Revell (@phytools_liam) March 29, 2021

Well, as it turns out things are a bit more complicated.

In particular, the model I was using actually turns out to be non-identifiable if we try to estimate *both*
the Brownian motion rates of evolution (σ_{i}^{2}) along all the edges of the tree
*and* the rate of Brownian evolution of the rate itself (let's call it
σ_{BM}^{2}).

What we *can* do very well, though, is estimate the rates under different scenarios for rate heterogeneity,
if we assign a penalty term, λ, to the heterogeneity in rate among edges. This approach (optimizing
a likelihood in which model complexity is penalized to an extent determined by the user) is called
*penalized likelihood*, and it makes sense that it would work here because it is already used in other
rate-smoothing problems in phylogenetics, such as penalized-likelihood divergence-time estimation. In this
case, the penalty function I'm using is the probability density of the rates, σ_{i}^{2}, evolving by
Brownian motion on the tree. Ignoring the data, this function is always maximized when all the rates are equal!

A low value of the penalty term (say, λ = 0.01) assigns very little penalty to rate variation among edges. On the other hand, a high value (like λ = 100) allows the rate to vary very little. An intermediate λ (e.g., 1) gives roughly “equal” weight to the probability of the data under the model and to the probability of the rate variation among edges, given a Brownian process for the evolution of the rate.
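To make the idea concrete, here is a minimal base-R sketch of a penalized likelihood (this is *not* phytools code, and all of the names and numbers are invented for illustration): two data "segments" evolve at different rates, and λ weights a penalty on the difference between their log-rates.

```r
## toy penalized likelihood: lambda penalizes rate variation between
## two data "segments" with different true rates (variances)
set.seed(1)
x1<-rnorm(50,sd=1)    ## segment with true rate (variance) 1
x2<-rnorm(50,sd=4)    ## segment with true rate (variance) 16
penalized<-function(par,lambda){
    s1<-exp(par[1]); s2<-exp(par[2])  ## rates kept positive on the log scale
    logL<-sum(dnorm(x1,sd=sqrt(s1),log=TRUE))+
        sum(dnorm(x2,sd=sqrt(s2),log=TRUE))
    penalty<-(par[1]-par[2])^2        ## squared difference of log-rates
    -(logL-lambda*penalty)            ## negated because optim() minimizes
}
low<-exp(optim(c(0,0),penalized,lambda=0.01)$par)
high<-exp(optim(c(0,0),penalized,lambda=1000)$par)
low   ## with a weak penalty the two rates are free to differ
high  ## with a strong penalty both rates collapse toward a common value
```

The same trade-off plays out in `multirateBM`, except that there the penalty comes from a Brownian process for the rates on the tree rather than a simple squared difference.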

Updated code for the function `multirateBM` is here.

Let's simulate some data and see how it works. In this case, I'm simulating node states for the log-rate under Brownian motion, transforming each edge of my tree by the average of the rates at its two subtending nodes, and then simulating data on the transformed tree.

```
library(phytools)
## simulate a 50-taxon pure-birth tree
tree<-pbtree(n=50)
## simulate the log-rate by Brownian motion at all nodes & tips,
## then exponentiate to get the rates
sig2<-exp(fastBM(tree,internal=TRUE))
## rescale each edge length by the mean rate at its two ends
tt<-tree
tt$edge.length<-tt$edge.length*apply(tt$edge,1,
    function(e,x) mean(x[e]),x=sig2)
## simulate the trait on the rate-stretched tree
x<-fastBM(tt)
```
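In case the `apply()` call above is opaque, here is a quick base-R check of what it does, using a toy edge matrix and made-up rates (not part of the simulation): each row of a `"phylo"` edge matrix holds the node numbers at an edge's two ends, so indexing the rate vector by that row and taking `mean()` gives the average rate along the edge.

```r
## toy 3-tip tree: root node 4, internal node 5, tips 1-3
edge<-matrix(c(4,5, 5,1, 5,2, 4,3),ncol=2,byrow=TRUE)
rate<-c(1,2,4,8,2)  ## rates indexed by node number
edge.rate<-apply(edge,1,function(e,x) mean(x[e]),x=rate)
edge.rate  ## mean(c(8,2)), mean(c(2,1)), mean(c(2,2)), mean(c(8,4))
```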

I'm actually going to “steal” the `plot` method for my `"multirateBM"` object class to visualize how this rate variation is distributed on the tree under the generating model, as follows:

```
## build a minimal "multirateBM" object from the true rates
object<-list()
class(object)<-"multirateBM"
object$sig2<-sig2
object$tree<-tree
plot(object,ftype="off",digits=4,mar=c(1.1,1.1,4.1,1.1))
title(main=expression(paste("a) True rates (",
    sigma[BM]^2,")")),adj=0.1)
```

Now let's fit our model – starting with a λ penalty coefficient of 1 – and see how the estimated rates compare.

```
fit.ml<-multirateBM(tree,x,lambda=1)
```

```
## Beginning optimization....
## Optimization iteration 1. Using "Nelder-Mead" optimization method.
## Optimization iteration 2. Using "BFGS" optimization method.
## Done optimization.
```

```
fit.ml
```

```
## Multi-rate Brownian model using multirateBM.
##
## Fitted rates:
## t35 t36 t4 t45 t46 t49 t50 t6
## 0.076963 0.077526 0.076633 0.059895 0.059905 0.15573 0.15573 0.075023 ....
##
## lambda penalty term: 1
## log-likelihood: -104.205263
## AIC: 408.410527
##
## R thinks it has found a solution.
```

```
plot(fit.ml,ftype="off",digits=4,mar=c(1.1,1.1,4.1,1.1))
title(main=expression(paste("b) Estimated rates (",
sigma[BM]^2,"), for ",lambda," = 1.0")),adj=0.1)
```

Since we know the true values of σ^{2} for each edge, we can easily check to see how
well-correlated our estimated values are with our known, true rates:

```
options(scipen=999)
## generating sigma^2 by edge
SIG2<-apply(tree$edge,1,function(e,x)
    mean(x[e]),x=sig2)
## estimated sigma^2 by edge
est.SIG2<-apply(tree$edge,1,function(e,x)
    mean(x[e]),x=fit.ml$sig2)
xylim<-range(c(SIG2,est.SIG2))
par(mar=c(5.1,5.1,4.1,1.1))
plot(SIG2,est.SIG2,pch=21,bg="grey",cex=1.5,bty="n",
    xlab=expression(paste("true rates, ",sigma^2)),
    ylab=expression(paste("estimated rates, ",sigma^2)),
    xlim=xylim,ylim=xylim,log="xy",las=1,cex.axis=0.8)
lines(xylim,xylim,lwd=2,
    col=make.transparent(palette()[4],0.5))
title(main=expression(paste(
    "Simulated vs. Estimated Brownian Rates, ",
    lambda," = 1.0")))
```
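The plot can also be supplemented with a numeric summary of the agreement. Here is a hypothetical sketch on made-up rate values (with the real objects one would pass `SIG2` and `est.SIG2` from above instead); since the rates vary over orders of magnitude, the log scale is the natural one.

```r
## made-up rates purely for illustration
true.rates<-c(0.08,0.06,0.16,0.90,2.10)
est.rates<-c(0.07,0.07,0.14,1.10,1.80)
cor(log(true.rates),log(est.rates))      ## correlation on the log scale
mean((log(est.rates)-log(true.rates))^2) ## mean squared log-error
```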

Now let's try again, but this time with λ = 0.01, to see what happens. We should expect our model
to exhibit *greater* variation in the Brownian rate, σ^{2}, among edges.

```
fit.ml2<-multirateBM(tree,x,lambda=0.01)
```

```
## Beginning optimization....
## Optimization iteration 1. Using "Nelder-Mead" optimization method.
## Optimization iteration 2. Using "BFGS" optimization method.
## Done optimization.
```

```
fit.ml2
```

```
## Multi-rate Brownian model using multirateBM.
##
## Fitted rates:
## t35 t36 t4 t45 t46 t49 t50 t6
## 0.000872 0.19456 0.0002 0.000589 0.000634 0.26076 0.26023 0.002561 ....
##
## lambda penalty term: 0.01
## log-likelihood: -75.741123
## AIC: 351.482245
##
## R thinks it has found a solution.
```

```
plot(fit.ml2,ftype="off",digits=4,mar=c(1.1,1.1,4.1,1.1))
title(main=expression(paste("c) Estimated rates (",
sigma[BM]^2,"), for ",lambda," = 0.01")),adj=0.1)
```

Indeed, this is exactly what we see: it's pretty clear that σ^{2} changes much more
abruptly from edge to edge in this plot compared to the one from before.

Lastly, we can do the same thing again – but this time with a much *higher* value of λ, say, 100,000.
This should have the effect of *eliminating* variation in the Brownian rate among edges because of the
very large penalty term we've created for rate variation.

Let's see:

```
fit.ml3<-multirateBM(tree,x,lambda=100000)
```

```
## Beginning optimization....
## Optimization iteration 1. Using "Nelder-Mead" optimization method.
## Optimization iteration 2. Using "BFGS" optimization method.
## Done optimization.
```

```
print(fit.ml3,printlen=Ntip(tree)+tree$Nnode)
```

```
## Multi-rate Brownian model using multirateBM.
##
## Fitted rates:
## t35 t36 t4 t45 t46 t49 t50 t6 t30 t31 t26
## 16.505 16.505 16.505 16.505 16.505 16.505 16.505 16.505 16.505 16.505 16.505
## t27 t8 t32 t33 t34 t14 t15 t12 t13 t18 t22
## 16.505 16.505 16.505 16.505 16.505 16.505 16.505 16.505 16.505 16.505 16.505
## t23 t39 t41 t42 t3 t2 t10 t11 t5 t16 t17
## 16.505 16.505 16.505 16.505 16.505 16.505 16.505 16.505 16.505 16.505 16.505
## t7 t20 t21 t40 t43 t44 t29 t28 t37 t38 t24
## 16.505 16.505 16.505 16.505 16.505 16.505 16.505 16.505 16.505 16.505 16.505
## t25 t9 t19 t47 t48 t1 51 52 53 54 55
## 16.505 16.505 16.505 16.505 16.505 16.505 16.505 16.505 16.505 16.505 16.505
## 56 57 58 59 60 61 62 63 64 65 66
## 16.505 16.505 16.505 16.505 16.505 16.505 16.505 16.505 16.505 16.505 16.505
## 67 68 69 70 71 72 73 74 75 76 77
## 16.505 16.505 16.505 16.505 16.505 16.505 16.505 16.505 16.505 16.505 16.505
## 78 79 80 81 82 83 84 85 86 87 88
## 16.505 16.505 16.505 16.505 16.505 16.505 16.505 16.505 16.505 16.505 16.505
## 89 90 91 92 93 94 95 96 97 98 99
## 16.505 16.505 16.505 16.505 16.505 16.505 16.505 16.505 16.505 16.505 16.505
##
## lambda penalty term: 100000
## log-likelihood: -136.439244
## AIC: 472.878488
##
## R thinks it has found a solution.
```

Indeed, all the rates are now more or less the same, and (not coincidentally) they correspond to the ML rate under a constant-rate Brownian model, just as you'd expect.

```
geiger::fitContinuous(tree,x)
```

```
## GEIGER-fitted comparative model of continuous data
## fitted 'BM' model parameters:
## sigsq = 16.505384
## z0 = -1.119807
##
## model summary:
## log-likelihood = -136.439253
## AIC = 276.878506
## AICc = 277.133825
## free parameters = 2
##
## Convergence diagnostics:
## optimization iterations = 100
## failed iterations = 0
## number of iterations with same best fit = 100
## frequency of best fit = 1.00
##
## object summary:
## 'lik' -- likelihood function
## 'bnd' -- bounds for likelihood search
## 'res' -- optimization iteration summary
## 'opt' -- maximum likelihood parameter estimates
```

Neat.

I think it's probably fairly obvious that if λ, the coefficient of the penalty term, is low, then the rates
can take more or less any values they like; if λ is high, then they're all forced to be (nearly) the same;
and if λ is just right (i.e., somewhere in the middle), then the rates, σ^{2}, will evolve according to a
Brownian process. What value of λ this will be, though, should depend on the scale of our data and rates!