Saturday, June 13, 2026

A discrete-character-dependent multi-θ Ornstein-Uhlenbeck model in phytools

I’m still working out the bugs, but I just added a new function, fitmultiOU, to phytools 2.6-1 on GitHub for fitting a discrete-character-dependent multi-regime Ornstein-Uhlenbeck using the discrete diffusion approximation of Boucher & Démery (2016) – that we also describe in great detail in a new bioRxiv pre-print. I encourage anyone interested in this powerful tool to go & check out our pre-print!

fitmultiOU builds off of the post I created a couple of weeks ago on this blog showing how to calculate the likelihood under a simple, single- \(\theta\) OU model on a tree using the discrete approximation. fitmultiOU just takes this a step further in exactly the same way as fitmultiBM and fitmultiTrend.

As previously noted, I’m definitely still working out the kinks. Indeed, I wouldn’t trust it farther than I can throw it. Hopefully that will change soon. Nonetheless, preliminary tests are promising.

To see this, let’s start by simulating a discrete character on a tree of 1000 taxa that assumes two different levels, a and b, as follows. (I’m choosing to simulate on a relatively large tree on purpose just to give us the best chance of recovering our generating parameter values if our method works.)

library(phytools)
## Loading required package: ape
## Loading required package: maps
N<-1000
tree<-pbtree(n=N,scale=10)
tree
## 
## Phylogenetic tree with 1000 tips and 999 internal nodes.
## 
## Tip labels:
##   t358, t660, t787, t788, t659, t210, ...
## 
## Rooted; includes branch length(s).
q<-0.2
Q<-matrix(c(-q,q,q,-q),2,2,
  dimnames=list(letters[1:2],letters[1:2]))
Q
##      a    b
## a -0.2  0.2
## b  0.2 -0.2
sim_tree<-sim.history(tree,Q,anc="a")
## Done simulation(s).
cols<-setNames(hcl.colors(n=2),letters[1:2])
plot(sim_tree,cols,ftype="off",lwd=1,
  direction="upwards")
par(lend=1)
legend("bottomleft",letters[1:2],lwd=2,
  col=hcl.colors(n=2),cex=0.6,bty="n")

plot of chunk unnamed-chunk-7

To be clear, we've generated this trait history for simulation -- but for estimation, we'll use only the tip states & the tree.

Next, let’s choose our different simulation parameters. So far, I’m only allowing \(\theta\), the position of th “optimum” in an OU model, to vary as a function of the discrete trait. Nonetheless, our simulator, multiOU, requires that we supply vectors for \(\alpha\), \(\sigma^2\), and \(\theta\), even if we intend to vary only \(\theta\) by discrete character regime.

I’m going to set \(\alpha = 0.3\), which corresponds to a phylogenetic “half-life” of \(\log(2) / \alpha \approx 2.3\) (our total tree height is 10), \(\sigma^2 = 0.1\), and \(\theta = [-0.5, 1.5]\) for the two different regimes.

alpha<-setNames(rep(0.3,2),letters[1:2])
alpha
##   a   b 
## 0.3 0.3
sig2<-setNames(rep(0.1,2),letters[1:2])
sig2
##   a   b 
## 0.1 0.1
theta<-setNames(c(-0.5,1.5),letters[1:2])
theta
##    a    b 
## -0.5  1.5

Now we’re ready for our simulation.

multiOU allows us to separately specify the root state and the root regime, but because I’m not sure these are separately estimable, I’m going to set them to be equal to one another. (That is, I’ll assume that the initial condition of the simulation for the continuous trait is on the “optimum” as prescribed by the discrete character condition at the root.)

## get the root state
root_state<-getStates(sim_tree,"nodes")[1]
root_state
## 1001 
##  "a"

Now I can simulate my continuous trait, x.

x<-multiOU(sim_tree,alpha,sig2,theta,a0=theta[root_state])
head(x)
##     t358     t660     t787     t788     t659     t210 
## 1.336066 1.620262 1.661155 1.800279 1.728212 1.764376

Our analysis is going to use as input the tip states for our discrete trait, \(y\) – not, importantly, the regimes we generated using sim.history, as mentioned previously.

Since only the object sim_tree has this information, we should pull it off using phytools::getStates.

y<-as.factor(getStates(sim_tree,"tips"))
head(y)
## t358 t660 t787 t788 t659 t210 
##    b    b    b    b    b    b 
## Levels: a b

As a final step, I’m going to try to identify reasonable starting values for our optimization process. This will just make finding the MLEs a bit more efficient.

init<-setNames(
  c(
    mean(x[y==levels(y)[1]]),mean(x[y==levels(y)[2]]),
    log(2)/max(nodeHeights(tree)),
    var(x)/max(nodeHeights(tree)),
    fitMk(tree,y)$rates),
  c("theta[a]","theta[b]","alpha","sigsq","q[1]"))
init
##    theta[a]    theta[b]       alpha       sigsq        q[1] 
## -0.01785477  0.80715486  0.06931472  0.05112673  0.23197867

Now let’s try to optimize our model using fitmultiOU. I’ll turn on trace to get periodic progress reports about our optimization process. As a word of warning, this takes a while. (The following analysis took several hours to run on my laptop.)

fit_mou<-fitmultiOU(tree,x,y,model="ER",levs=100,
  parallel=TRUE,root="mle",init=init,trace=1)
## iter	the[a]	the[b]	alpha	sigsq	q[1]	log(L)
## 0	-0.0179	0.8072	0.0693	0.0511	0.2320	-932.8691 
## 100	-0.8977	1.1648	0.3001	0.0832	0.2364	-724.9863 
## 200	-0.6454	1.2625	0.3260	0.0865	0.2239	-724.2291 
## 300	-0.4044	1.3876	0.3480	0.0904	0.2229	-723.3257 
## 400	-0.4038	1.4186	0.3340	0.0856	0.2210	-722.8809 
## 500	-0.3719	1.4540	0.3327	0.0834	0.2263	-722.6922 
## 600	-0.3316	1.6840	0.3190	0.0774	0.2268	-721.7071 
## 700	-0.3316	1.6656	0.3145	0.0814	0.2383	-720.9715 
## 800	-0.3259	1.6652	0.3145	0.0817	0.2379	-720.9686 
## 874	-0.3266	1.6667	0.3141	0.0816	0.2378	-720.9684 
## Done optimizing.

Here's our fitted model.

fit_mou
## Object of class "fitmultiOU" based on
##     a discretization with k = 100 levels.
## 
## Fitted multi-theta OU model parameters:
##  levels: [ a, b ]
##   theta: [ -0.3266, 1.6667 ]
##   alpha: 0.3141 
##   sigsq: 0.0816 
## 
## Estimated Q matrix:
##           a         b
## a -0.237812  0.237812
## b  0.237812 -0.237812
## 
## Log-likelihood: -720.9684 
## 
## R thinks it has found the ML solution.

Well, that’s pretty uncanny as a first try. Again, fitmultiOU is a prototype and is shared only with a warning to use at one’s own risk. Nonetheless, I think it’s right, so that’s pretty cool.

For when I tweet about this, here’s a visualization of multi-regime OU (in general, not this specific case). The code for this visualization comes from a different recent post to this blog.

phy<-pbtree(n=40,scale=10)
tt<-map.to.singleton(make.era.map(phy,
  limits=seq(0,10,length.out=1001)))
Q<-matrix(c(-0.2,0.2,0.2,-0.2),2,2,
  dimnames=list(letters[1:2],letters[1:2]))
s.tt<-sim.history(tt,Q,anc="a")
## Done simulation(s).
cols<-setNames(hcl.colors(n=2),letters[1:2])
theta<-setNames(c(-0.5,1.5),letters[1:2])
sigsq<-setNames(rep(0.1,2),letters[1:2])
alpha<-setNames(rep(0.3,2),letters[1:2])
xx<-multiOU(s.tt,alpha=alpha,sig2=rep(sigsq,2),
  theta=theta,a0=theta["a"],internal=TRUE)
layout(matrix(c(1,2),2,1),heights=c(0.4,0.6))
plot(s.tt,cols,ftype="off",mar=c(1.1,4.1,2.1,2.1),
  xlim=c(0,11))
mtext("a)",adj=0,line=0)
par(mar=c(5.1,4.1,2.1,2.1))
phenogram(s.tt,xx,ftype="off",
  colors=make.transparent(cols,0.5),
  xlim=c(0,11),cex.axis=0.8,las=1)
mtext("b)",adj=0,line=1.5)
for(i in 1:length(theta)){
  stat_theta<-dnorm(seq(min(xx),max(xx),length.out=200),
    mean=theta[i],sd=sqrt(sigsq[i]/(2*alpha[i])))
  stat_theta<-c(0,stat_theta,0)
  stat.norm<-stat_theta/max(stat_theta)*1
  polygon(x=stat.norm+10,
    y=c(min(xx),seq(min(xx),max(xx),
    length.out=200),max(xx)),border=FALSE,
    col=make.transparent(cols[i],0.5))
}

plot of chunk unnamed-chunk-19

Tuesday, June 9, 2026

Visualizing a discrete-character-dependent multi-regime OU process on a tree with phytools

Working on something else, I was recently reminded of the fact that phytools includes a function to simulate a multi- \(\theta\) (i.e., multi-“regime”) Ornstein-Uhlenbeck process called multiOU.

multiOU does a continuous-time simulation, but can (optionally) output node states, so I thought it would be fun to show how the function can be used (in combination with lots of non-branching nodes in our tree) to graphically illustrate multi-optimum OU.

This doesn’t serve any special purpose outside the visual, but perhaps someone currently preparing to present at Evolution 2026 in Cleveland in a couple of weeks will find this useful for their slideshow!

## load phytools
library(phytools)

We can start by simulating a tree. Here I’m going to scale the total length of this tree to 100 time units.

tree<-pbtree(n=40,scale=100)

Next I’d like to add a bunch of non-branching nodes along the length of my tree from the root to every tip. I’m going to evenly space these & the way I’m going to do it is by wrapping a function call of phytools::make.era.map with another map.to.singleton. This first generates “era” mapped regimes on the tree in "simmap" format and then converts these to nodes.

tt<-map.to.singleton(make.era.map(tree,
  limits=seq(0,100,length.out=1001)))

We can see that this worked as follows.

plotTree(tt,ftype="off",lwd=1)
get("last_plot.phylo",envir=.PlotPhyloEnv)->pp
points(pp$xx,pp$yy,pch=16,col=make.transparent("blue",0.2),
  cex=0.5)

plot of chunk unnamed-chunk-5

(I hope it’s evident that the blur of blue dots along each edge of the tree & non-branching nodes!)

Next we’re going to simulate our regimes on the tree as a continuous-time Markov chain using phytools::sim.history. I’m just going to simulate two regimes, a and b, with a switching rate between regimes of \(q=0.02\). Keep in mind, we should do this on our tree with unbranching nodes, tt.

## our Q matrix
Q<-matrix(c(-0.02,0.02,0.02,-0.02),2,2,
  dimnames=list(letters[1:2],letters[1:2]))
## our regime tree
s.tt<-sim.history(tt,Q,anc="a")
## Done simulation(s).

To get a sense of the history we’ve simulated, let’s plot it. This can be done in the standard way because phytools::plot.simmap has no problem with unbranching nodes.

cols<-setNames(hcl.colors(n=2),letters[1:2])
plot(s.tt,cols,ftype="off")

plot of chunk unnamed-chunk-7

So far, so good.

Now, we’ll set our simulation conditions for the multi-regime OU. The function allows us to simulate different \(\alpha\), \(\sigma^2\), and \(\theta\) for each of our regimes, but here we’ll hold the former two parameters, \(\alpha\) & \(\sigma^2\), constant between regimes and set \(\theta = [-1,1]\) as follows.

theta<-setNames(c(-1,1),letters[1:2])
sigsq<-setNames(rep(0.4,2),letters[1:2])
alpha<-setNames(rep(0.7,2),letters[1:2])
x<-multiOU(s.tt,alpha=alpha,sig2=rep(sigsq,2),
  theta=theta,a0=theta["a"],internal=TRUE)

Note that the vector, x, that I simulated contains not only the states for the tips but also the conditions of all the thousands of internal nodes of the tree – the overwhelming majority of which are unbranching.

head(x)
##          t4         t25         t26          t8          t9         t18 
## -1.47750858  0.01917008  1.05348083  1.71793727  0.71903373  1.17666809
length(x)
## [1] 14574

Now, for our plot!

For fun, I’m also going to add the stationary distribution at the tips based on our generating values of \(\theta=[-1,1]\), \(\alpha=0.7\), and \(\sigma^2=0.4\).

layout(matrix(c(1,2),2,1),heights=c(0.4,0.6))
plot(s.tt,cols,ftype="off",mar=c(1.1,4.1,2.1,2.1),
  xlim=c(0,110))
mtext("a)",adj=0,line=0)
par(mar=c(5.1,4.1,2.1,2.1))
phenogram(s.tt,x,ftype="off",
  colors=make.transparent(cols,0.5),
  xlim=c(0,110),cex.axis=0.8,las=1)
mtext("b)",adj=0,line=1.5)
for(i in 1:length(theta)){
  stat_theta<-dnorm(seq(min(x),max(x),length.out=200),
    mean=theta[i],sd=sqrt(sigsq[i]/(2*alpha[i])))
  stat.norm<-stat_theta/max(stat_theta)*10
  polygon(x=stat.norm+100,y=seq(min(x),max(x),
    length.out=200),border=FALSE,
    col=make.transparent(cols[i],0.5))
}

plot of chunk unnamed-chunk-11

That’s pretty much the idea. Cool, right?

Friday, June 5, 2026

Ancestral states under the discretized diffusion approximation using bounded_bm in phytools

As many followers & readers of this blog will undoubtedly know, I’ve been thinking & writing endlessly about what we refer to as the “discretized diffusion approximation” for computing the likelihood in continuous & discrete character models (first instroduced to phylogenetic comparative biology by Boucher & Démery 2016) for well over two years. We recently published a pre-print that I encourage anyone interested in this topic to check out!

In this post, I just wanted to briefly explore ancestral state estimation using this approach which I’ve implemented in the phytools ancr generic method.

First, let’s just see how it works in general. To do that, I’m going to simulate a tree & some data via unbounded Brownian motion under which our MLEs for ancestral states would normally be very easy to obtain (e.g., using phytools::fastAnc).

## load phytools
library(phytools)
## simulate a tree
phy<-pbtree(n=26,scale=1,tip.label=LETTERS)
phy
## 
## Phylogenetic tree with 26 tips and 25 internal nodes.
## 
## Tip labels:
##   A, B, C, D, E, F, ...
## 
## Rooted; includes branch length(s).
## simulate some trait data
x<-fastBM(phy)
x
##           A           B           C           D           E           F           G 
## -0.52238323 -1.08476410 -0.73893741 -0.75336367 -0.94397490 -0.67163255 -1.21387274 
##           H           I           J           K           L           M           N 
## -0.65653756 -1.16240551 -0.87838336 -0.93716934 -0.80657269 -0.96539003 -1.09064863 
##           O           P           Q           R           S           T           U 
## -1.80940444 -0.99874761 -2.07552774 -1.83396425 -0.77599997 -2.04509690 -0.67626465 
##           V           W           X           Y           Z 
## -0.09662133  0.57076545  0.47261265  0.26072599 -0.03490109

Next, let’s fit an unbounded Brownian model using the discrete approximation. This can be done in phytools with the function bounded_bm just by allowing the limits or bounds of the trait to fall well outside the observed trait distribution. (This is technically a “bounded” model, but since almost no probability density hits the bounds it is effectively unbounded BM.)

## fit unbounded model
bm<-bounded_bm(phy,x,root="mle",levs=400)
bm
## Object of class "bounded_bm" based on
##    a discretization with k = 400 levels.
## 
## Unwrapped (i.e., bounded) model
## 
## Set or estimated bounds: [ -3.398674 , 1.893912 ]
## 
## Fitted model parameters:
##   sigsq: 0.639077 
##      x0: -0.69284 
## 
## Log-likelihood: -17.871792 
## 
## R thinks it has found the ML solution.

If we compare this to the fit obtained using the continuous likelihood function (rather than our discretized approximation) we should find that the parameter estimates & log-likelihood (very, very nearly) match. This similar will only increase with levs (or equivalently, as \(\delta\), the bin width of our discretization, decreases towards zero).

geiger::fitContinuous(phy,x,model="BM")
## GEIGER-fitted comparative model of continuous data
##  fitted 'BM' model parameters:
## 	sigsq = 0.642594
## 	z0 = -0.692552
## 
##  model summary:
## 	log-likelihood = -17.940158
## 	AIC = 39.880316
## 	AICc = 40.402055
## 	free parameters = 2
## 
## Convergence diagnostics:
## 	optimization iterations = 100
## 	failed iterations = 0
## 	number of iterations with same best fit = 100
## 	frequency of best fit = 1.000
## 
##  object summary:
## 	'lik' -- likelihood function
## 	'bnd' -- bounds for likelihood search
## 	'res' -- optimization iteration summary
## 	'opt' -- maximum likelihood parameter estimates

This is old news.

What you might not have realized is that our model object from bounded_bm can be passed to the generic method ancr and we ought to get back out MLE ancestral states – with a small catch that I’ll get to!

This is what I mean….

## ancestral state estimates from the discrete approximation
anc.d<-ancr(bm)
print(anc.d,printlen=5)
## Ancestral character estimates from "bounded_bm" object:
##        27        28        29        30        31     
##  -0.69284 -0.909587 -0.809253 -0.858961 -0.854084 ....
## 
## Lower & upper 95% CIs:
##        lower     upper
## 27  -0.69284  -0.69284
## 28 -1.394107  -0.42821
## 29 -1.208867 -0.414979
## 30 -1.169172 -0.547293
## 31 -1.155941 -0.547293
##         ....      ....
## ancestral states using continuous likelihood function
anc.c<-fastAnc(phy,x,CI=TRUE)
print(anc.c,printlen=5)
## Ancestral character estimates using fastAnc:
##         27       28        29        30       31     
##  -0.692552 -0.90973 -0.810438 -0.860237 -0.85374 ....
## 
## Lower & upper 95% CIs:
##        lower     upper
## 27 -1.378038 -0.007065
## 28 -1.438079  -0.38138
## 29 -1.220804 -0.400071
## 30 -1.174338 -0.546135
## 31 -1.167611 -0.539869
##         ....      ....

Let’s plot one set of point estimates against the other.

par(mar=c(5.1,4.1,1.1,1.1))
plot(anc.c$ace,anc.d$ace,bty="n",pch=21,
  bg=make.transparent("blue",0.5),
  xlab="estimates from fastAnc",
  ylab="estimates from bounded_bm > ancr",
  las=1,cex.axis=0.8)
grid()
abline(a=0,b=1,lty="dashed")

plot of chunk unnamed-chunk-9

I don’t need any fancy statistics to see that I’m getting the right estimates here!

There is one teeny, tiny problem though.

It becomes evident when we closely inspect our model object from ancr as follows.

print(anc.d,prinlen=5)
## Ancestral character estimates from "bounded_bm" object:
##        27        28        29        30        31        32     
##  -0.69284 -0.909587 -0.809253 -0.858961 -0.854084 -0.884419 ....
## 
## Lower & upper 95% CIs:
##        lower     upper
## 27  -0.69284  -0.69284
## 28 -1.394107  -0.42821
## 29 -1.208867 -0.414979
## 30 -1.169172 -0.547293
## 31 -1.155941 -0.547293
## 32 -1.169172 -0.600219
##         ....      ....

Here we see that the upper & lower bounds of the confidence intervals on the global root state are both equal to the point estimate which is clearly wrong. Indeed, this is typically the most uncertain internal node state, rather than the least!

This happens because to compute the likelihood under our discretized diffusion approximation and have it match the continuous density, when we get to the root state we have to fix it to its MLE.

This has the effect of meaning that when we go ahead & undertake ASR using this fitted model, instead of just conditioning on the MLE of \(\sigma^2\), we’re also conditioning on the MLE of \(x_0\), the root state. This is not what we want to be doing at all!

Well, in this case it doesn’t really matter because we can solve the continuous density anyway; however, for other models where that’s intractable, what I might tentatively recommend instead is changing the root prior in estimation if we intend to use our fitted model in ASR.

Here’s what that would look like for our twenty-six taxon phylogeny of earlier.

bm_flatroot<-bounded_bm(phy,x,levs=400,root="flat")
bm_flatroot
## Object of class "bounded_bm" based on
##    a discretization with k = 400 levels.
## 
## Unwrapped (i.e., bounded) model
## 
## Set or estimated bounds: [ -3.398674 , 1.893912 ]
## 
## Fitted model parameters:
##   sigsq: 0.664673 
##      x0: -0.692566 
## 
## Log-likelihood: -19.682352 
## 
## R thinks it has found the ML solution.
anc.d_flatroot<-ancr(bm_flatroot)
print(anc.d_flatroot,printlen=5)
## Ancestral character estimates from "bounded_bm" object:
##         27        28        29        30        31     
##  -0.692566 -0.909521 -0.809241 -0.858956 -0.854082 ....
## 
## Lower & upper 95% CIs:
##        lower     upper
## 27 -1.380876 -0.004803
## 28 -1.433802 -0.388516
## 29 -1.222098 -0.401747
## 30 -1.169172 -0.547293
## 31 -1.169172 -0.547293
##         ....      ....

Let’s once again compare these states (that is, the point estimates at nodes) to the continuous likelihood function values.

par(mar=c(5.1,4.1,1.1,1.1))
plot(anc.c$ace,anc.d_flatroot$ace,bty="n",pch=21,
  bg=make.transparent("blue",0.5),
  xlab="estimates from fastAnc",
  ylab="estimates from bounded_bm > ancr",
  las=1,cex.axis=0.8)
grid()
abline(a=0,b=1,lty="dashed")

plot of chunk unnamed-chunk-13

Once again, it’s evident that we have the right ML states.

Now, for fun, let’s compare the lower & upper confidence bounds as well….

par(mfrow=c(1,2),mar=c(5.1,4.1,2.1,1.1))
plot(anc.c$CI95[,1],anc.d_flatroot$CI95[,1],bty="n",pch=21,
  bg=make.transparent("blue",0.5),
  xlab="lower CI bound from fastAnc",
  ylab="lower CI bound from bounded_bm > ancr",
  las=1,cex.axis=0.8)
grid()
abline(a=0,b=1,lty="dashed")
mtext("a) lower CI bound",adj=0)
plot(anc.c$CI95[,2],anc.d_flatroot$CI95[,2],bty="n",pch=21,
  bg=make.transparent("blue",0.5),
  xlab="upper CI bound from fastAnc",
  ylab="upper CI bound from bounded_bm > ancr",
  las=1,cex.axis=0.8)
grid()
abline(a=0,b=1,lty="dashed")
mtext("b) upper CI bound",adj=0)

plot of chunk unnamed-chunk-14 Dang. I didn’t expect that to work this well!

Nice.

Of course, this actually gets interesting when we start to introduce models (such as bounded BM) without tractable solutions for the likelihood. Perhaps I’ll demo that in another post!

Generic plot method for the "fastAnc" object class in phytools

Sometimes I forget about all the hidden features of phytools.

One that I’d nearly forgotten about today is an S3 generic plot method that I wrote for the "fastAnc" continuous character ancestral state reconstruction method object class that I wrote just a few months ago.

Here’s a very simple demo.

## load phytools
library(phytools)
## load a tree & some data
data(primate.data)
data(primate.tree)
head(primate.data)
##                                  Group Skull_length Optic_foramen_area Orbit_area
## Allenopithecus_nigroviridis Anthropoid         98.5                7.0      298.7
## Alouatta_palliata           Anthropoid        109.8                5.3      382.3
## Alouatta_seniculus          Anthropoid        108.0                8.0      359.4
## Aotus_trivirgatus           Anthropoid         60.5                3.1      297.4
## Arctocebus_aureus            Prosimian         49.5                1.2      134.8
## Arctocebus_calabarensis      Prosimian         53.8                1.6      156.6
##                             Activity_pattern Activity_pattern_code
## Allenopithecus_nigroviridis          Diurnal                     0
## Alouatta_palliata                    Diurnal                     0
## Alouatta_seniculus                   Diurnal                     0
## Aotus_trivirgatus                  Nocturnal                     2
## Arctocebus_aureus                  Nocturnal                     2
## Arctocebus_calabarensis            Nocturnal                     2

Let’s analyze primate skull length, on a log-scale.

lnSkullLength<-setNames(log(primate.data$Skull_length),
  rownames(primate.data))

Here’s our ancestral state reconstruction under BM.

primate_anc<-fastAnc(primate.tree,lnSkullLength,CI=TRUE)
print(primate_anc,printlen=6)
## Ancestral character estimates using fastAnc:
##        91       92       93       94       95       96     
##  4.175934 4.186044 4.431545 4.724982 4.681729 4.665113 ....
## 
## Lower & upper 95% CIs:
##       lower    upper
## 91 3.814174 4.537694
## 92 3.828175 4.543914
## 93 4.116413 4.746677
## 94 4.477337 4.972627
## 95 4.483873 4.879584
## 96 4.494131 4.836094
##        ....     ....

Now let’s use the plot method, as follows. Note that most of the arguments of both phytools::plotTree and phytools::add.color.bar can be passed internally.

h<-max(nodeHeights(primate.tree))
plot(primate_anc,direction="upwards",ftype="i",
  fsize=0.6,title="log(skull length)",
  offset=1,legend="bottomleft",
  ylim=c(-0.1*h,1.35*h))

plot of chunk unnamed-chunk-6

That’s pretty useful, I think!

ansi_phylo ANSI text phylogeny simulator in R

While I was taking a break from fiddling around with the endeavor of implementing a discrete-character-dependent multi-$\theta$ OU model (if such a thing is even possible) using the discretized diffusion approximation, I stumbled on an old phytools called ansi_phylo for ANSI text style phylogenetic tree plotting in R.

It creates a pretty fun visual, so I thought it might be worth a re-post. Here’s a quick demo:

## load packages
library(phytools)
library(phangorn)
## load data
data(Laurasiatherian)
Laurasiatherian
## 47 sequences with 3179 character and 1605 different site patterns.
## The states are a c g t
## estimate tree using ML in phangorn
mammal_mle<-pml_bb(Laurasiatherian,model="K80")
## optimize edge weights:  -54315.47 --> -54130.45 
## optimize rate matrix:  -54130.45 --> -51361.62 
## ...
## optimize topology:  -52200.08 --> -51341.22  NNI moves:  13 
## Ratchet iteration  100 , best pscore so far: -51341.22 
## optimize rate matrix:  -51341.22 --> -51341.22 
## optimize edge weights:  -51341.22 --> -51341.22 
## optimize topology:  -51341.22 --> -51341.22  NNI moves:  0
mammal_mle
## model: K80 
## loglikelihood: -51341.22 
## unconstrained loglikelihood: -17300.92 
## 
## Rate matrix:
##          a        c        g        t
## a 0.000000 1.000000 5.025414 1.000000
## c 1.000000 0.000000 1.000000 5.025414
## g 5.025414 1.000000 0.000000 1.000000
## t 1.000000 5.025414 1.000000 0.000000
## 
## Base frequencies:  
##    a    c    g    t 
## 0.25 0.25 0.25 0.25
## get ML tree and root using Platypus as outgroup
mammal_mle.phy<-root(mammal_mle$tree,outgroup="Platypus",
  resolve.root=TRUE)
ansi_phylo(mammal_mle.phy,horizontal="=")

plot of chunk unnamed-chunk-11

That’s it!

Here, let’s try another one.

data(yeast)
yeast
## 8 sequences with 127026 character and 8899 different site patterns.
## The states are a c g t
yeast_mle<-pml_bb(yeast,model="K80")
## optimize edge weights:  -739078.1 --> -734615.7 
## optimize rate matrix:  -734615.7 --> -715533.2 
## ...
## optimize topology:  -718794.8 --> -715391.4  NNI moves:  1 
## Ratchet iteration  100 , best pscore so far: -715391.4 
## optimize rate matrix:  -715391.4 --> -715387.4 
## optimize edge weights:  -715387.4 --> -715387.3 
## optimize topology:  -715387.3 --> -715387.3  NNI moves:  0 
## optimize rate matrix:  -715387.3 --> -715387.3 
## optimize edge weights:  -715387.3 --> -715387.3
yeast_mle
## model: K80 
## loglikelihood: -715387.3 
## unconstrained loglikelihood: -659104.8 
## 
## Rate matrix:
##          a        c        g        t
## a 0.000000 1.000000 3.291576 1.000000
## c 1.000000 0.000000 1.000000 3.291576
## g 3.291576 1.000000 0.000000 1.000000
## t 1.000000 3.291576 1.000000 0.000000
## 
## Base frequencies:  
##    a    c    g    t 
## 0.25 0.25 0.25 0.25
rooted.yeast_tree<-midpoint(yeast_mle$tree)
ansi_phylo(rooted.yeast_tree,vertical="*")

plot of chunk unnamed-chunk-15

Monday, June 1, 2026

Computing the likelihood of a simple OU model using the discrete approximation

With various co-authors I recently published a bioRxiv pre-print entitled “Unlocking a flexible set of phylogenetic models for discrete and continuous trait evolution using discretized stochastic diffusion”.

In this article, we describe how a powerful idea that was first introduced to phylogenetic comparative methods by Boucher & Démery (2016) – that is, exploiting the equivalence of continuous-space diffusion & discrete-space CTMC on a lattice in the limit as the number of steps in the chain goes to \(\infty\) to compute likelihoods for otherwise intractable phylogenetic models – has a wide variety of applications outside those envisioned in prior studies (Boucher & Démery 2016; Boucher et al. 2018).

All of these applications exploit the equivalence (in the limit as the interval or bin width \(\delta \rightarrow 0\)) of a nearest-neighbor CTMC and continuous stochastic diffusion where \(q_{c} = \sigma^2/(2\delta^2)\) gives the instantaneous transition rate between adjacent bins, in which \(\sigma^2\) is the stochastic diffusion (Brownian) rate and \(\delta\) is the bin width. The cool thing about this is that we compute an approximate probability density of continuous data by first discretizing our data, computing the probability under a nearest-neighbor CTMC with \(q_{c} = \sigma^2/(2\delta^2)\) via standard pruning, and then back-transforming the probability to a density by dividing by \(\delta^N\) for \(N\) tips. In our pre-print, we show how this technique can be used to fit the multi-state threshold model, a new “semi-threshold” model, a discrete-character-dependent continuous trait model, and others. Fundamentally, all rely on the principle that if a continuous vector \(\mathbf{x}\) is discretized as \(\mathbf{x'}\), given bin width \(\delta\), then the probability of \(\mathbf{x'}\) evolving as a nearest neighbor CTMC with \(q_{c} = \sigma^2/(2\delta^2)\) between neighbor, for diffusion rate of \(\mathbf{x}\) \(\sigma^2\) is \(P(\mathbf{x'}) = D(\mathbf{x}) \times \delta^N\), in which \(D(\mathbf{x})\) is the density of \(\mathbf{x}\).

To exemplify exactly this sort of calculation on a tree, let’s start by loading phytools.

library(phytools)

Then we can proceed to simulate a tree and data vector \(\mathbf{x}\) to work with as follows.

N<-12
tree<-pbtree(n=N,tip.label=LETTERS[N:1],scale=1)
x<-fastBM(tree,a=0,sig2=0.5,alpha=1,theta=0)
plotTree(tree,ylim=c(0,1.2*max(nodeHeights(tree))),
  direction="upwards",ftype="off")
tiplabels(round(x,3),srt=90,frame="n",adj=-0.25)

plot of chunk unnamed-chunk-325

To compute the exact probability (density) of these data under (say) a hypothesis of \(x_0 = 0\) and \(\sigma^2 = 0.5\) and evolution via simple Brownian motion, we start by obtaining their covariance structure as follows.

x0<-0
sigsq<-0.5
V<-sigsq*vcv(tree)
print(V,digits=4)
##        L      K      J      I      H      G      F      E      D      C      B      A
## L 0.5000 0.4168 0.4168 0.1581 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
## K 0.4168 0.5000 0.4775 0.1581 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
## J 0.4168 0.4775 0.5000 0.1581 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
## I 0.1581 0.1581 0.1581 0.5000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
## H 0.0000 0.0000 0.0000 0.0000 0.5000 0.4931 0.2216 0.2216 0.2065 0.2065 0.2065 0.2065
## G 0.0000 0.0000 0.0000 0.0000 0.4931 0.5000 0.2216 0.2216 0.2065 0.2065 0.2065 0.2065
## F 0.0000 0.0000 0.0000 0.0000 0.2216 0.2216 0.5000 0.3534 0.2065 0.2065 0.2065 0.2065
## E 0.0000 0.0000 0.0000 0.0000 0.2216 0.2216 0.3534 0.5000 0.2065 0.2065 0.2065 0.2065
## D 0.0000 0.0000 0.0000 0.0000 0.2065 0.2065 0.2065 0.2065 0.5000 0.4876 0.3087 0.3087
## C 0.0000 0.0000 0.0000 0.0000 0.2065 0.2065 0.2065 0.2065 0.4876 0.5000 0.3087 0.3087
## B 0.0000 0.0000 0.0000 0.0000 0.2065 0.2065 0.2065 0.2065 0.3087 0.3087 0.5000 0.4619
## A 0.0000 0.0000 0.0000 0.0000 0.2065 0.2065 0.2065 0.2065 0.3087 0.3087 0.4619 0.5000

Then we can evaluate the density on a log-scale using mvtnorm::dmvnorm as so.

mvtnorm::dmvnorm(x,mean=rep(x0,Ntip(tree)),
  sigma=V,log=TRUE)
## [1] -2.797184

To do the equivalent calculation assuming a nearest-neighbor CTMC, we must first select a number of links in the chain. The higher, the more accurate our approximation. Here, I’ll choose 200.

## number of levels for discretization
n<-200

Next, I set upper & lower bounds well outside of our trait range.

## upper & lower bounds
xlims<-phytools:::expand.range(x)

Then, I find all the bin midpoints given my n and xlims.

## bin midpoints
xi<-seq(xlims[1],xlims[2],length.out=n)

And we can proceed to calculate \(\delta\), which we’ll need later.

## delta
delta<-xi[2]-xi[1]

We needn’t, but I intend to use a phytools internal helper function (phytools:::to_binned) to transform my continuous trait into a binned, discretized trait. For that, we need to calculate the upper & lower bounds of each bin, rather than just their midpoints.

## actual bins
bins<-cbind(xi-delta/2,xi+delta/2)
## bin-discretized data
X<-phytools:::to_binned(x,bins)

Finally, our likelihood function for the discretized diffusion approximation. (Note that this is not super clean. I’m scoping delta, n, and xi from my Global Environment. Still, this works in R.)

## likelihood function
lik.bm<-function(sigma2,x0,X,phy){
  q.c<-sigma2/(2*delta^2)
  Q<-matrix(0,n,n)
  for(i in 2:(n-1)) Q[i,i+1]<-Q[i,i-1]<-q.c
  diag(Q)<--rowSums(Q)
  X0<-rep(0,length(xi))
  X0[which.min(abs(x0-xi))]<-1
  fit<-fitMk(phy,X,fixedQ=Q,pi=X0)
  obj<-logLik(fit)-Ntip(phy)*log(delta)
  class(obj)<-"logLik"
  attr(obj,"df")<-3
  obj
}

If we evaluate it, we should find that it gives us a highly similar value of the density to what we calculated early using dmvnorm (-2.797184, in that case – see above).

lik.bm(sigma2=0.5,x0=0,X=X,phy=tree)
## 'log Lik.' -2.796157 (df=3)

Yup. I’d say that’s close. Fiddle with n and you should find it gets even closer!

Ornstein-Uhlenbeck is a stochastic process model where \(dX_t = \sigma dW_t+\alpha(\theta - X_t)\). The first part of this differential expression (\(\sigma dW_t\)) is just standard Brownian motion, whereas the second part (\(\alpha(\theta - X_t)\)) is deterministic.

To translate this to our nearest-neighbor CTMC, therefore, we will need a different upward & downward rate for our process, depending on which side of \(\theta\) we find ourselves. Specifically, if we’re below \(\theta\), then \(q_+ = \sigma^2/(2\delta^2) + \alpha(\theta-x)/\delta\) and \(q_- = \sigma^2/(2\delta^2)\), whereas if we’re above \(\theta\) then \(q_+ = \sigma^2/(2\delta^2)\) and \(q_- = \sigma^2/(2\delta^2)-\alpha(\theta-x)/\delta\), where \(q_+\) and \(q_-\) are the upwards & downwards rates in our nearest-neighbor CTMC, respectively. If this isn’t intuitive, remember that OU is a rubberband model, and so if we’re below \(\theta\) then our “energy” upwards is just the sum of the stochastic component and the rubber band, while the energy downwards is just the stochastic part of our model; whereas, whenever we’re above \(\theta\) precisely the opposite is true.

(Don’t worry. I’m not the one that figured out these exact maths! It’s in the physical sciences literature already.)

OK, so let’s try to set it up, much as we did before.

lik.ou<-function(theta,alpha,sigma2,X,phy){
  mu<-alpha*(theta-xi)
  q.up<-sigma2/(2*delta^2)+pmax(mu,0)/delta
  q.down<-sigma2/(2*delta^2)+pmax(-mu,0)/delta
  Q<-matrix(0,n,n)
  for(i in 2:(n-1)){
    Q[i,i+1]<-q.up[i]
    Q[i,i-1]<-q.down[i]
  }
  diag(Q)<--rowSums(Q)
  X0<-rep(0,length(xi))
  X0[which.min(abs(theta-xi))]<-1
  fit<-fitMk(phy,X,fixedQ=Q,pi=X0)
  obj<-logLik(fit)-Ntip(phy)*log(delta)
  class(obj)<-"logLik"
  attr(obj,"df")<-3
  obj
}

Now, we happen to know that these data were actually simulated under an OU model with \(\sigma^2 = 0.5\), \(\theta = 0\), and \(\alpha = 1.0\). Let’s plug these values into our CTMC likelihood function and see what we get.

lik.ou(theta=0,alpha=1,sigma2=0.5,X,tree)
## 'log Lik.' -1.48367 (df=3)

OK, so how close is this to the true density? Well, much like BM, OU is a Gaussian process and the tip values will thus be MVN. We just need a way to get the correct covariance matrix for a particular \(\sigma^2\) and \(\alpha\), which is straightforward using phytools::vcvPhylo as follows.

x0<-0
sigsq<-0.5
V<-sigsq*vcvPhylo(tree,model="OU",alpha=1,
  anc.nodes=FALSE)
print(V,digits=4)
##         L       K       J       I       H       G       F       E       D       C       B       A
## L 0.21617 0.14539 0.14539 0.02985 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
## K 0.14539 0.21617 0.19466 0.02985 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
## J 0.14539 0.19466 0.21617 0.02985 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
## I 0.02985 0.02985 0.02985 0.21617 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
## H 0.00000 0.00000 0.00000 0.00000 0.21617 0.20931 0.04827 0.04827 0.04345 0.04345 0.04345 0.04345
## G 0.00000 0.00000 0.00000 0.00000 0.20931 0.21617 0.04827 0.04827 0.04345 0.04345 0.04345 0.04345
## F 0.00000 0.00000 0.00000 0.00000 0.04827 0.04827 0.21617 0.10527 0.04345 0.04345 0.04345 0.04345
## E 0.00000 0.00000 0.00000 0.00000 0.04827 0.04827 0.10527 0.21617 0.04345 0.04345 0.04345 0.04345
## D 0.00000 0.00000 0.00000 0.00000 0.04345 0.04345 0.04345 0.04345 0.21617 0.20408 0.08246 0.08246
## C 0.00000 0.00000 0.00000 0.00000 0.04345 0.04345 0.04345 0.04345 0.20408 0.21617 0.08246 0.08246
## B 0.00000 0.00000 0.00000 0.00000 0.04345 0.04345 0.04345 0.04345 0.08246 0.08246 0.21617 0.18080
## A 0.00000 0.00000 0.00000 0.00000 0.04345 0.04345 0.04345 0.04345 0.08246 0.08246 0.18080 0.21617

Ready?

mvtnorm::dmvnorm(x,mean=rep(x0,Ntip(tree)),
  sigma=V,log=TRUE)
## [1] -1.489203

OK, that looks pretty darn similar!

Now, as a last step, & just for fun, let’s contemplate what it might look like to optimize our OU model using the discretized approximation, rather than just feeding it our generating parameter values.

First, let’s optimize in our original space to maximize the probability of obtaining the data that we have, in fact, in our hands (that is, maximize the likelihood).

## do this on a log scale
likc<-function(par,x,phy){
  -mvtnorm::dmvnorm(x,mean=rep(par[1],Ntip(phy)),
    sigma=exp(par[3])*vcvPhylo(phy,model="OU",
      alpha=exp(par[2]),anc.nodes=FALSE),
    log=TRUE)
}
## run through optim
fit<-optim(
  par=c(0,log(1),log(0.5)),
  fn=likc,
  x=x,phy=tree)
## likelihood
-fit$value
## [1] -1.019458
## parameter estimates
c(fit$par[1],exp(fit$par[2:3]))
## [1] -0.07922464  1.54252848  0.41766567

These should be similar to what we might’ve obtained using geiger::fitContinuous, of course. (Thankfully, they are.)

ou_fit<-geiger::fitContinuous(tree,x,model="OU")
ou_fit
## GEIGER-fitted comparative model of continuous data
##  fitted 'OU' model parameters:
## 	alpha = 1.542565
## 	sigsq = 0.417675
## 	z0 = -0.079225
## 
##  model summary:
## 	log-likelihood = -1.019458
## 	AIC = 8.038916
## 	AICc = 11.038916
## 	free parameters = 3
## 
## Convergence diagnostics:
## 	optimization iterations = 100
## 	failed iterations = 0
## 	number of iterations with same best fit = 60
## 	frequency of best fit = 0.600
## 
##  object summary:
## 	'lik' -- likelihood function
## 	'bnd' -- bounds for likelihood search
## 	'res' -- optimization iteration summary
## 	'opt' -- maximum likelihood parameter estimates

Now, let’s try to do the same thing using the discrete approximation.

likd<-function(par,X,phy){
  -as.numeric(lik.ou(par[1],exp(par[2]),exp(par[3]),
    X,phy))
}
fit<-optim(
  par=c(0,log(1),log(0.5)),
  fn=likd,
  X=X,
  phy=tree,
  method="Nelder-Mead")
## likelihood
-fit$value
## [1] -0.9733699
## parameter estimates
c(fit$par[1],exp(fit$par[2:3]))
## [1] -0.06975488  1.51608457  0.40323807

Yeah. Keeping in mind that this is an approximation that only becomes exact in the limit as \(\delta\) goes to zero, that totally worked.