Monday, June 1, 2026

Computing the likelihood of a simple OU model using the discrete approximation

With various co-authors I recently published a bioRxiv pre-print entitled “Unlocking a flexible set of phylogenetic models for discrete and continuous trait evolution using discretized stochastic diffusion”.

In this article, we describe how a powerful idea that was first introduced to phylogenetic comparative methods by Boucher & Démery (2016) – that is, exploiting the equivalence of continuous-space diffusion & discrete-space CTMC on a lattice in the limit as the number of steps in the chain goes to \(\infty\) to compute likelihoods for otherwise intractable phylogenetic models – has a wide variety of applications outside those envisioned in prior studies (Boucher & Démery 2016; Boucher et al. 2018).

All of these applications exploit the equivalence (in the limit as the interval or bin width \(\delta \rightarrow 0\)) of a nearest-neighbor CTMC and continuous stochastic diffusion where \(q_{c} = \sigma^2/(2\delta^2)\) gives the instantaneous transition rate between adjacent bins, in which \(\sigma^2\) is the stochastic diffusion (Brownian) rate and \(\delta\) is the bin width. The cool thing about this is that we compute an approximate probability density of continuous data by first discretizing our data, computing the probability under a nearest-neighbor CTMC with \(q_{c} = \sigma^2/(2\delta^2)\) via standard pruning, and then back-transforming the probability to a density by dividing by \(\delta^N\) for \(N\) tips. In our pre-print, we show how this technique can be used to fit the multi-state threshold model, a new “semi-threshold” model, a discrete-character-dependent continuous trait model, and others. Fundamentally, all rely on the principle that if a continuous vector \(\mathbf{x}\) is discretized as \(\mathbf{x'}\), given bin width \(\delta\), then the probability of \(\mathbf{x'}\) evolving as a nearest neighbor CTMC with \(q_{c} = \sigma^2/(2\delta^2)\) between neighbor, for diffusion rate of \(\mathbf{x}\) \(\sigma^2\) is \(P(\mathbf{x'}) = D(\mathbf{x}) \times \delta^N\), in which \(D(\mathbf{x})\) is the density of \(\mathbf{x}\).

To exemplify exactly this sort of calculation on a tree, let’s start by loading phytools.

library(phytools)

Then we can proceed to simulate a tree and data vector \(\mathbf{x}\) to work with as follows.

N<-12
tree<-pbtree(n=N,tip.label=LETTERS[N:1],scale=1)
x<-fastBM(tree,a=0,sig2=0.5,alpha=1,theta=0)
plotTree(tree,ylim=c(0,1.2*max(nodeHeights(tree))),
  direction="upwards",ftype="off")
tiplabels(round(x,3),srt=90,frame="n",adj=-0.25)

plot of chunk unnamed-chunk-325

To compute the exact probability (density) of these data under (say) a hypothesis of \(x_0 = 0\) and \(\sigma^2 = 0.5\) and evolution via simple Brownian motion, we start by obtaining their covariance structure as follows.

x0<-0
sigsq<-0.5
V<-sigsq*vcv(tree)
print(V,digits=4)
##        L      K      J      I      H      G      F      E      D      C      B      A
## L 0.5000 0.4168 0.4168 0.1581 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
## K 0.4168 0.5000 0.4775 0.1581 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
## J 0.4168 0.4775 0.5000 0.1581 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
## I 0.1581 0.1581 0.1581 0.5000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
## H 0.0000 0.0000 0.0000 0.0000 0.5000 0.4931 0.2216 0.2216 0.2065 0.2065 0.2065 0.2065
## G 0.0000 0.0000 0.0000 0.0000 0.4931 0.5000 0.2216 0.2216 0.2065 0.2065 0.2065 0.2065
## F 0.0000 0.0000 0.0000 0.0000 0.2216 0.2216 0.5000 0.3534 0.2065 0.2065 0.2065 0.2065
## E 0.0000 0.0000 0.0000 0.0000 0.2216 0.2216 0.3534 0.5000 0.2065 0.2065 0.2065 0.2065
## D 0.0000 0.0000 0.0000 0.0000 0.2065 0.2065 0.2065 0.2065 0.5000 0.4876 0.3087 0.3087
## C 0.0000 0.0000 0.0000 0.0000 0.2065 0.2065 0.2065 0.2065 0.4876 0.5000 0.3087 0.3087
## B 0.0000 0.0000 0.0000 0.0000 0.2065 0.2065 0.2065 0.2065 0.3087 0.3087 0.5000 0.4619
## A 0.0000 0.0000 0.0000 0.0000 0.2065 0.2065 0.2065 0.2065 0.3087 0.3087 0.4619 0.5000

Then we can evaluate the density on a log-scale using mvtnorm::dmvnorm as so.

mvtnorm::dmvnorm(x,mean=rep(x0,Ntip(tree)),
  sigma=V,log=TRUE)
## [1] -2.797184

To do the equivalent calculation assuming a nearest-neighbor CTMC, we must first select a number of links in the chain. The higher, the more accurate our approximation. Here, I’ll choose 200.

## number of levels for discretization
n<-200

Next, I set upper & lower bounds well outside of our trait range.

## upper & lower bounds
xlims<-phytools:::expand.range(x)

Then, I find all the bin midpoints given my n and xlims.

## bin midpoints
xi<-seq(xlims[1],xlims[2],length.out=n)

And we can proceed to calculate \(\delta\), which we’ll need later.

## delta
delta<-xi[2]-xi[1]

We needn’t, but I intend to use a phytools internal helper function (phytools:::to_binned) to transform my continuous trait into a binned, discretized trait. For that, we need to calculate the upper & lower bounds of each bin, rather than just their midpoints.

## actual bins
bins<-cbind(xi-delta/2,xi+delta/2)
## bin-discretized data
X<-phytools:::to_binned(x,bins)

Finally, our likelihood function for the discretized diffusion approximation. (Note that this is not super clean. I’m scoping delta, n, and xi from my Global Environment. Still, this works in R.)

## likelihood function
lik.bm<-function(sigma2,x0,X,phy){
  q.c<-sigma2/(2*delta^2)
  Q<-matrix(0,n,n)
  for(i in 2:(n-1)) Q[i,i+1]<-Q[i,i-1]<-q.c
  diag(Q)<--rowSums(Q)
  X0<-rep(0,length(xi))
  X0[which.min(abs(x0-xi))]<-1
  fit<-fitMk(phy,X,fixedQ=Q,pi=X0)
  obj<-logLik(fit)-Ntip(phy)*log(delta)
  class(obj)<-"logLik"
  attr(obj,"df")<-3
  obj
}

If we evaluate it, we should find that it gives us a highly similar value of the density to what we calculated early using dmvnorm (-2.797184, in that case – see above).

lik.bm(sigma2=0.5,x0=0,X=X,phy=tree)
## 'log Lik.' -2.796157 (df=3)

Yup. I’d say that’s close. Fiddle with n and you should find it gets even closer!

Ornstein-Uhlenbeck is a stochastic process model where \(dX_t = \sigma dW_t+\alpha(\theta - X_t)\). The first part of this differential expression (\(\sigma dW_t\)) is just standard Brownian motion, whereas the second part (\(\alpha(\theta - X_t)\)) is deterministic.

To translate this to our nearest-neighbor CTMC, therefore, we will need a different upward & downward rate for our process, depending on which side of \(\theta\) we find ourselves. Specifically, if we’re below \(\theta\), then \(q_+ = \sigma^2/(2\delta^2) + \alpha(\theta-x)/\delta\) and \(q_- = \sigma^2/(2\delta^2)\), whereas if we’re above \(\theta\) then \(q_+ = \sigma^2/(2\delta^2)\) and \(q_- = \sigma^2/(2\delta^2)-\alpha(\theta-x)/\delta\), where \(q_+\) and \(q_-\) are the upwards & downwards rates in our nearest-neighbor CTMC, respectively. If this isn’t intuitive, remember that OU is a rubberband model, and so if we’re below \(\theta\) then our “energy” upwards is just the sum of the stochastic component and the rubber band, while the energy downwards is just the stochastic part of our model; whereas, whenever we’re above \(\theta\) precisely the opposite is true.

(Don’t worry. I’m not the one that figured out these exact maths! It’s in the physical sciences literature already.)

OK, so let’s try to set it up, much as we did before.

lik.ou<-function(theta,alpha,sigma2,X,phy){
  mu<-alpha*(theta-xi)
  q.up<-sigma2/(2*delta^2)+pmax(mu,0)/delta
  q.down<-sigma2/(2*delta^2)+pmax(-mu,0)/delta
  Q<-matrix(0,n,n)
  for(i in 2:(n-1)){
    Q[i,i+1]<-q.up[i]
    Q[i,i-1]<-q.down[i]
  }
  diag(Q)<--rowSums(Q)
  X0<-rep(0,length(xi))
  X0[which.min(abs(theta-xi))]<-1
  fit<-fitMk(phy,X,fixedQ=Q,pi=X0)
  obj<-logLik(fit)-Ntip(phy)*log(delta)
  class(obj)<-"logLik"
  attr(obj,"df")<-3
  obj
}

Now, we happen to know that these data were actually simulated under an OU model with \(\sigma^2 = 0.5\), \(\theta = 0\), and \(\alpha = 1.0\). Let’s plug these values into our CTMC likelihood function and see what we get.

lik.ou(theta=0,alpha=1,sigma2=0.5,X,tree)
## 'log Lik.' -1.48367 (df=3)

OK, so how close is this to the true density? Well, much like BM, OU is a Gaussian process and the tip values will thus be MVN. We just need a way to get the correct covariance matrix for a particular \(\sigma^2\) and \(\alpha\), which is straightforward using phytools::vcvPhylo as follows.

x0<-0
sigsq<-0.5
V<-sigsq*vcvPhylo(tree,model="OU",alpha=1,
  anc.nodes=FALSE)
print(V,digits=4)
##         L       K       J       I       H       G       F       E       D       C       B       A
## L 0.21617 0.14539 0.14539 0.02985 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
## K 0.14539 0.21617 0.19466 0.02985 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
## J 0.14539 0.19466 0.21617 0.02985 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
## I 0.02985 0.02985 0.02985 0.21617 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
## H 0.00000 0.00000 0.00000 0.00000 0.21617 0.20931 0.04827 0.04827 0.04345 0.04345 0.04345 0.04345
## G 0.00000 0.00000 0.00000 0.00000 0.20931 0.21617 0.04827 0.04827 0.04345 0.04345 0.04345 0.04345
## F 0.00000 0.00000 0.00000 0.00000 0.04827 0.04827 0.21617 0.10527 0.04345 0.04345 0.04345 0.04345
## E 0.00000 0.00000 0.00000 0.00000 0.04827 0.04827 0.10527 0.21617 0.04345 0.04345 0.04345 0.04345
## D 0.00000 0.00000 0.00000 0.00000 0.04345 0.04345 0.04345 0.04345 0.21617 0.20408 0.08246 0.08246
## C 0.00000 0.00000 0.00000 0.00000 0.04345 0.04345 0.04345 0.04345 0.20408 0.21617 0.08246 0.08246
## B 0.00000 0.00000 0.00000 0.00000 0.04345 0.04345 0.04345 0.04345 0.08246 0.08246 0.21617 0.18080
## A 0.00000 0.00000 0.00000 0.00000 0.04345 0.04345 0.04345 0.04345 0.08246 0.08246 0.18080 0.21617

Ready?

mvtnorm::dmvnorm(x,mean=rep(x0,Ntip(tree)),
  sigma=V,log=TRUE)
## [1] -1.489203

OK, that looks pretty darn similar!

Now, as a last step, & just for fun, let’s contemplate what it might look like to optimize our OU model using the discretized approximation, rather than just feeding it our generating parameter values.

First, let’s optimize in our original space to maximize the probability of obtaining the data that we have, in fact, in our hands (that is, maximize the likelihood).

## do this on a log scale
likc<-function(par,x,phy){
  -mvtnorm::dmvnorm(x,mean=rep(par[1],Ntip(phy)),
    sigma=exp(par[3])*vcvPhylo(phy,model="OU",
      alpha=exp(par[2]),anc.nodes=FALSE),
    log=TRUE)
}
## run through optim
fit<-optim(
  par=c(0,log(1),log(0.5)),
  fn=likc,
  x=x,phy=tree)
## likelihood
-fit$value
## [1] -1.019458
## parameter estimates
c(fit$par[1],exp(fit$par[2:3]))
## [1] -0.07922464  1.54252848  0.41766567

These should be similar to what we might’ve obtained using geiger::fitContinuous, of course. (Thankfully, they are.)

ou_fit<-geiger::fitContinuous(tree,x,model="OU")
ou_fit
## GEIGER-fitted comparative model of continuous data
##  fitted 'OU' model parameters:
## 	alpha = 1.542565
## 	sigsq = 0.417675
## 	z0 = -0.079225
## 
##  model summary:
## 	log-likelihood = -1.019458
## 	AIC = 8.038916
## 	AICc = 11.038916
## 	free parameters = 3
## 
## Convergence diagnostics:
## 	optimization iterations = 100
## 	failed iterations = 0
## 	number of iterations with same best fit = 60
## 	frequency of best fit = 0.600
## 
##  object summary:
## 	'lik' -- likelihood function
## 	'bnd' -- bounds for likelihood search
## 	'res' -- optimization iteration summary
## 	'opt' -- maximum likelihood parameter estimates

Now, let’s try to do the same thing using the discrete approximation.

likd<-function(par,X,phy){
  -as.numeric(lik.ou(par[1],exp(par[2]),exp(par[3]),
    X,phy))
}
fit<-optim(
  par=c(0,log(1),log(0.5)),
  fn=likd,
  X=X,
  phy=tree,
  method="Nelder-Mead")
## likelihood
-fit$value
## [1] -0.9733699
## parameter estimates
c(fit$par[1],exp(fit$par[2:3]))
## [1] -0.06975488  1.51608457  0.40323807

Yeah. Keeping in mind that this is an approximation that only becomes exact in the limit as \(\delta\) goes to zero, that totally worked.

No comments:

Post a Comment

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.