Phylogenetic Tools for Comparative Biology: 2026

Sunday, August 2, 2026

A small experiment, but a strong case for joint estimation of discrete character dependent continuous trait evolution under a multi-regime OU model

A couple of days ago on this blog I posted an entry on comparing a discrete-character-dependent multi-regime joint OU model to OUwie when the regime history is known.

For those not following closely at home, the joint discrete & continuous trait model that I’m talking about uses a finite-space / discretized diffusion approximation as documented in prior posts to this blog (e.g., 1, 2, 3, 4, 5), following Boucher & Démery (2016; also see our in review bioRxiv pre-print).

A more typical workflow with OUwie (and other similar regime-based methods) is to generate a set of stochastic character maps of the discrete character regime following Huelsenbeck et al. (2003), fit a multi-regime OU model to each tree in the set, and then average the results across maps. Back in 2013, I pointed out that (for rate-heterogeneous models) this would tend to result in an underestimate of the difference in rates, \(\sigma^2\), between regimes. The same seems very likely to be true for multi-regime OU models, but (to my knowledge) this has not been studied explicitly. We might hope or expect that this bias would go away if we jointly model the discrete & continuous characters.

The magnitude of this bias is also likely to vary as a function of the transition rate for the discrete trait. In other words, the higher the transition rates in Q of our discrete trait, the more uncertain our discrete character history and the more it will vary among stochastic character maps (and the more each map will differ from the unknown true history of our trait).
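To build some intuition for why a poorly known regime history might pull the regime optima towards each other, here is a deliberately simple, non-phylogenetic toy in base R. It is only an analogy (mislabeled observations standing in for the parts of a stochastic map that differ from the true history), and the sample size, the within-regime standard deviation of 0.5, and the mislabeling fractions are all made-up values for illustration.

```r
## toy illustration only: no tree, no OU process
set.seed(1)
n<-10000
theta<-c(a=0,b=5) ## regime means
true_state<-sample(names(theta),n,replace=TRUE)
x<-rnorm(n,mean=theta[true_state],sd=0.5)

## difference in group means after mislabeling a fraction p
## of the observations
dilution<-function(p){
  flip<-runif(n)<p
  obs_state<-ifelse(flip,
    ifelse(true_state=="a","b","a"),true_state)
  unname(diff(tapply(x,obs_state,mean)))
}

p<-c(0,0.1,0.2,0.3)
gap<-sapply(p,dilution)
## with balanced regimes the expected gap is 5*(1-2*p)
round(cbind(p,gap,expected=5*(1-2*p)),2)
```

The estimated gap between the two group means shrinks roughly in proportion to the fraction of mislabeled observations, which is the same direction of bias we might anticipate when regime histories are sampled rather than known.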

On a Sunday morning, I thought it would be interesting to engineer a simple comparison between this traditional OUwie workflow and fitmultiOU, both when the transition rate of the discrete character is low & when it is high.

To begin with, let’s load the packages we intend to use.

## load packages
library(phytools)

library(OUwie)

Now, I’m going to simulate a tree & a discrete character history on that tree with a low rate (keeping in mind, of course, that there is no special numeric value of \(q\) that is “low” or “high” – it all depends on the total depth of our tree).

phy<-pbtree(n=250,scale=10)
phy

## 
## Phylogenetic tree with 250 tips and 249 internal nodes.
## 
## Tip labels:
##   t16, t21, t123, t124, t38, t49, ...
## 
## Rooted; includes branch length(s).

q<-0.05
k<-2
low.Q<-matrix(q,k,k,dimnames=list(letters[1:k],letters[1:k]))
diag(low.Q)<-0
diag(low.Q)<--rowSums(low.Q)
low.Q

##       a     b
## a -0.05  0.05
## b  0.05 -0.05

low.sim_tree<-sim.history(phy,low.Q,
  anc=y0<-sample(letters[1:k],1))

## Done simulation(s).

low.sim_tree

## 
## Phylogenetic tree with 250 tips and 249 internal nodes.
## 
## Tip labels:
## 	t16, t21, t123, t124, t38, t49, ...
## 
## The tree includes a mapped, 2-state discrete character
## with states:
## 	a, b
## 
## Rooted; includes branch lengths.

## plot this tree (just for fun)
cols<-setNames(hcl.colors(n=k),letters[1:k])
plot(low.sim_tree,cols,ftype="off",lwd=1,
  direction="upwards",ylim=c(-1,10))
par(lend=1)
legend("bottomleft",letters[1:k],lwd=4,
  col=hcl.colors(n=k),cex=0.7,bty="n")

[Figure: the simulated low-rate discrete character history mapped onto the tree.]

Great. Clearly we’ve simulated a very low rate of discrete character evolution on this tree!
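To put a rough number on "low" relative to tree depth, here is a small base R sketch using the standard result for a symmetric two-state Mk model: the probability that the two ends of a path of length \(t\) are in different states is \((1-e^{-2qt})/2\). The helper p_differ is just a name I made up for this illustration, and I'm including the high rate of \(q=0.8\) for contrast.

```r
## probability that the start & end of a path of length t are in
## different states under a symmetric two-state Mk model with rate q
p_differ<-function(q,t) (1-exp(-2*q*t))/2

depth<-10 ## total tree depth, as in pbtree(...,scale=10)
q<-c(low=0.05,high=0.8)

## expected number of changes along one root-to-tip path, and
## probability that a tip differs in state from the root
rbind(expected_changes=q*depth,
  p_differ=p_differ(q,depth))
```

With \(q=0.05\) on a tree of depth 10 we expect only half a change per root-to-tip path, whereas a much higher rate pushes the tip state towards being essentially independent of the root state.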

Now let’s simulate a multi-regime OU process conditioning on this true discrete character history. To do this we’ll use multiOU in phytools. Even though we’re using the same value of \(\alpha\) and \(\sigma^2\) for both of our discrete character regimes we need to set values of these two parameters for each of the two character levels.

## set alpha
alpha<-setNames(rep(0.6,k),letters[1:k])
alpha

##   a   b 
## 0.6 0.6

## set sigma-squared
sig2<-setNames(rep(0.3,k),letters[1:k])
sig2

##   a   b 
## 0.3 0.3

Now let’s set \(\theta_a\) and \(\theta_b\). I’m going to use values of \(0\) and \(5\) for the two different discrete character states.

# set theta
theta<-setNames(c(0,5),letters[1:k])
theta

## a b 
## 0 5

Now, I’ll simulate continuous trait data under the multi-regime model using our true discrete character history with phytools::multiOU. I’m going to assume the starting value is at \(\theta\) for the root regime. (This is where the a0=theta[y0] comes from!)

## simulate continuous trait
low.x<-multiOU(low.sim_tree,alpha,sig2,theta,a0=theta[y0])
head(low.x)

##        t16        t21       t123       t124        t38        t49 
##  0.6548297  0.7918578 -0.1488759  0.4502061  0.3959087  0.2302126

Terrific. So far so good.

Now we’re about to imagine that our discrete character history is unobserved, so let’s pull off just the tip values of our discrete trait from low.sim_tree as follows.

low.y<-as.factor(getStates(low.sim_tree,type="tips"))
head(low.y)

##  t16  t21 t123 t124  t38  t49 
##    a    a    a    a    a    a 
## Levels: a b

Next, let’s fit an Mk model to the tip states with phytools::fitMk & then generate a set of stochastic character histories for our discrete state using phytools::simmap.

low.mk_fit<-fitMk(phy,low.y,model="ER",
  pi="fitzjohn")
low.mk_fit

## Object of class "fitMk".
## 
## Fitted (or set) value of Q:
##           a         b
## a -0.036172  0.036172
## b  0.036172 -0.036172
## 
## Fitted (or set) value of pi:
##       a       b 
## 0.32705 0.67295 
## due to treating the root prior as (a) nuisance.
## 
## Log-likelihood: -76.278772 
## 
## Optimization method used was "nlminb"
## 
## R thinks it has found the ML solution.

low.smps<-simmap(low.mk_fit)
low.smps

## 100 phylogenetic trees with mapped discrete characters

Let’s plot a few of these.

par(mfrow=c(5,5))
nulo<-sapply(low.smps[1:25],plot,lwd=1,col=cols,ftype="off",
  direction="upwards")

[Figure: the first 25 stochastic character maps for the low-rate character.]

Something that readers will probably notice immediately is that all of these discrete character histories fairly closely resemble each other & the true history of our trait.

Next, let’s fit OUwie models to each of them. That’ll be fun. To speed it along, I’m going to parallelize across stochastic map trees using the foreach package.

## make our OUwie data frame
low.ouwie_data<-data.frame(
  Genus_species=names(low.x),
  Reg=low.y,
  X=low.x)
head(low.ouwie_data)

##      Genus_species Reg          X
## t16            t16   a  0.6548297
## t21            t21   a  0.7918578
## t123          t123   a -0.1488759
## t124          t124   a  0.4502061
## t38            t38   a  0.3959087
## t49            t49   a  0.2302126

## load foreach and doParallel
library(foreach)

library(doParallel)

## set up our cluster
ncores<-detectCores()-2
ncores

## [1] 14

mc<-makeCluster(ncores,type="PSOCK")
mc

## socket cluster with 14 nodes on host 'localhost'

registerDoParallel(cl=mc)

## optimize across all 100 stochastic maps
low.ouwie_fits<-foreach(i=1:length(low.smps))%dopar%{
  OUwie::OUwie(low.smps[[i]],low.ouwie_data,model="OUM",
    simmap.tree=TRUE,root.station=FALSE)
}

stopCluster(mc)

The object we’ve created is a long list of fitted “OUwie” models. We can pull out just the specific results that we’re interested in to make it easier to see the general pattern.

foo<-function(x) setNames(
  c(
    x$theta[,1],
    x$solution[,1],
    x$loglik
  ),
  c(
    "the[a]","the[b]","alpha","sigsq","log(L)"
  )
)
low.ouwie_results<-t(sapply(low.ouwie_fits,foo))
rownames(low.ouwie_results)<-1:nrow(low.ouwie_results)

options(scipen=5) ## just for printing
round(low.ouwie_results[1:25,],digits=2)

##    the[a] the[b] alpha sigsq  log(L)
## 1    0.08   4.84  0.66  0.42 -182.89
## 2    0.11   4.95  0.49  0.36 -194.30
## 3    0.19   5.07  0.41  0.41 -224.99
## 4    0.20   4.95  0.63  0.48 -206.92
## 5    0.00   4.75  0.67  0.52 -209.01
## 6    0.11   4.95  0.44  0.40 -216.47
## 7    0.08   4.87  0.47  0.41 -214.46
## 8    0.05   4.86  0.48  0.36 -192.84
## 9    0.01   4.91  0.40  0.41 -227.67
## 10   0.03   4.82  0.62  0.41 -186.94
## 11   0.07   4.89  0.50  0.36 -193.14
## 12  -0.10   4.86  0.33  0.38 -232.95
## 13   0.03   4.86  0.59  0.37 -178.45
## 14  -0.03   4.78  0.43  0.38 -213.53
## 15   0.03   4.88  0.48  0.31 -175.21
## 16   0.10   4.97  0.50  0.33 -180.42
## 17   0.17   4.95  0.49  0.48 -228.08
## 18   0.12   4.92  0.60  0.48 -210.78
## 19   0.04   4.82  0.48  0.37 -200.46
## 20  -0.07   4.78  0.42  0.34 -199.99
## 21   0.21   5.29  0.34  0.41 -239.15
## 22   0.05   4.87  0.57  0.40 -193.95
## 23   0.14   5.06  0.47  0.39 -207.78
## 24   0.22   5.21  0.37  0.39 -227.25
## 25   0.10   4.90  0.61  0.43 -196.33

options(scipen=0)

This is pretty cool because it shows that all of our fitted models are pretty close to the generating conditions of \(\mathbf{\theta} = [0, 5]\).

colMeans(low.ouwie_results)

##        the[a]        the[b]         alpha         sigsq        log(L) 
##    0.06588211    4.91247554    0.49325269    0.40862951 -208.88237689

We can also compare this to phytools::fitmultiOU, which I predict will get a similar model.

init<-setNames(
  c(
    mean(low.x[low.y==levels(low.y)[1]]),
    mean(low.x[low.y==levels(low.y)[2]]),
    log(2)/max(nodeHeights(phy)),
    var(low.x)/max(nodeHeights(phy)),
    fitMk(phy,low.y,model="ER")$rates),
  c("theta[a]","theta[b]",
    "alpha","sigsq","q[1]"))
init

##   theta[a]   theta[b]      alpha      sigsq       q[1] 
## 0.14104895 4.40690078 0.06931472 0.52619893 0.03687111
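A quick sanity check on the starting value for \(\alpha\): log(2)/max(nodeHeights(phy)) corresponds to a phylogenetic half-life equal to the total depth of the tree. The sketch below just reproduces that arithmetic in base R, assuming a tree depth of exactly 10 (which is what pbtree(...,scale=10) should give us).

```r
## assumed total tree depth, i.e., max(nodeHeights(phy))
tree_depth<-10

## starting alpha: half-life equal to the depth of the tree
alpha0<-log(2)/tree_depth
alpha0

## the implied half-life recovers the tree depth
log(2)/alpha0
```

This is a fairly weak starting value for the strength of attraction towards the optimum, so the optimizer has to work its way up to the estimate from there.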

low.fit_mou<-fitmultiOU(phy,low.x,low.y,model="ER",
  levs=100,parallel=TRUE,ncores=ncores,root="mle",
  trace=1,maxit=2000,init=init)

## iter	the[a]	the[b]	alpha	sigsq	q[1]	log(L)
## 0	0.1410	4.4069	0.0693	0.5262	0.0369	-378.6136 
## 100	-0.3327	4.7524	0.3626	0.1910	0.0282	-258.4337 
## 200	0.0864	4.8329	0.5574	0.2065	0.0202	-247.0498 
## 300	0.1137	4.9956	0.5351	0.2351	0.0203	-245.0731 
## 400	0.0581	4.8858	0.6185	0.2743	0.0387	-242.3865 
## 500	0.0999	4.9000	0.6364	0.2739	0.0330	-241.9284 
## 600	0.1005	4.9045	0.6323	0.2720	0.0331	-241.9258 
## 666	0.1009	4.9043	0.6320	0.2719	0.0331	-241.9258 
## Done optimizing.

low.fit_mou

## Object of class "fitmultiOU" based on
##     a discretization with k = 100 levels.
## 
## Fitted multi-theta OU model parameters:
##  levels: [ a, b ]
##   theta: [ 0.1009, 4.9043 ]
##   alpha: 0.632 
##   sigsq: 0.2719 
## 
## Estimated Q matrix:
##             a           b
## a -0.03312624  0.03312624
## b  0.03312624 -0.03312624
## 
## Log-likelihood: -241.9258 
## 
## R thinks it has found the ML solution.

Even though both analyses gave us parameter estimates quite close to the generating values, joint estimation seems to be even more accurate... particularly with regard to \(\alpha\) and \(\sigma^2\). That's cool.
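To make that comparison a little more concrete, here are the absolute errors of the two sets of estimates, computed in base R from the values printed above (the averaged OUwie results from colMeans and the fitted fitmultiOU model). This is just arithmetic on one simulated dataset, so it shouldn't be over-interpreted.

```r
## generating values
truth<-c(theta_a=0,theta_b=5,alpha=0.6,sigsq=0.3)

## estimates copied from the printed output above
ouwie_mean<-c(0.06588211,4.91247554,0.49325269,0.40862951)
joint<-c(0.1009,4.9043,0.632,0.2719)

## absolute error of each estimate
err<-rbind(OUwie=abs(ouwie_mean-truth),
  joint=abs(joint-truth))
colnames(err)<-names(truth)
round(err,3)
```

For this dataset the two approaches are about equally good for \(\theta\), while the joint estimates of \(\alpha\) and \(\sigma^2\) sit noticeably closer to their generating values.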

Now let’s consider the case of a high rate of transition for our discrete character.

q<-0.8
k<-2
high.Q<-matrix(q,k,k,dimnames=list(letters[1:k],
  letters[1:k]))
diag(high.Q)<-0
diag(high.Q)<--rowSums(high.Q)
high.Q

##      a    b
## a -0.8  0.8
## b  0.8 -0.8

Now the discrete character is expected to change a lot in the true history.

high.sim_tree<-sim.history(phy,high.Q,
  anc=y0)

## Done simulation(s).

high.sim_tree

## 
## Phylogenetic tree with 250 tips and 249 internal nodes.
## 
## Tip labels:
## 	t16, t21, t123, t124, t38, t49, ...
## 
## The tree includes a mapped, 2-state discrete character
## with states:
## 	a, b
## 
## Rooted; includes branch lengths.

## plot this tree (just for fun)
cols<-setNames(hcl.colors(n=k),letters[1:k])
plot(high.sim_tree,cols,ftype="off",lwd=1,
  direction="upwards",ylim=c(-1,10))
par(lend=1)
legend("bottomleft",letters[1:k],lwd=4,
  col=hcl.colors(n=k),cex=0.7,bty="n")

[Figure: the simulated high-rate discrete character history mapped onto the tree.]

(Let’s generate continuous trait data under the same \(\alpha\), \(\sigma^2\) and \(\mathbf{\theta}=[0,5]\) as before, but with our new, high Q history.)

## simulate continuous trait
high.x<-multiOU(high.sim_tree,alpha,sig2,theta,
  a0=theta[y0])
head(high.x)

##       t16       t21      t123      t124       t38       t49 
##  1.142639  1.634035  1.209871  1.537554  2.871243 -1.072081

Of course, now we might expect our stochastic character histories to differ more from each other and from the true history.

high.y<-as.factor(getStates(high.sim_tree,type="tips"))
head(high.y)

##  t16  t21 t123 t124  t38  t49 
##    a    a    a    a    a    a 
## Levels: a b

high.mk_fit<-fitMk(phy,high.y,model="ER",
  pi="fitzjohn")
high.mk_fit

## Object of class "fitMk".
## 
## Fitted (or set) value of Q:
##           a         b
## a -0.458785  0.458785
## b  0.458785 -0.458785
## 
## Fitted (or set) value of pi:
##        a        b 
## 0.499556 0.500444 
## due to treating the root prior as (a) nuisance.
## 
## Log-likelihood: -159.945824 
## 
## Optimization method used was "nlminb"
## 
## R thinks it has found the ML solution.

high.smps<-simmap(high.mk_fit)
high.smps

## 100 phylogenetic trees with mapped discrete characters

Much as we did earlier, let’s plot a few of these just to see what we’ve got!

par(mfrow=c(5,5))
nulo<-sapply(high.smps[1:25],plot,lwd=1,col=cols,
  ftype="off",direction="upwards")

[Figure: the first 25 stochastic character maps for the high-rate character.]

It should be pretty evident, I think, that (even though they seem to have arisen under the same process) in this case the specific details of our discrete character histories vary quite widely one from the other, as well as from our generating history.

I expect that the consequence of this will be that each one of our stochastic character histories is likely to add error (and perhaps bias) to the estimation of the continuous trait multi-regime OU process. Let’s see if that’s true.

## make our OUwie data frame
high.ouwie_data<-data.frame(
  Genus_species=names(high.x),
  Reg=high.y,
  X=high.x)
head(high.ouwie_data)

##      Genus_species Reg         X
## t16            t16   a  1.142639
## t21            t21   a  1.634035
## t123          t123   a  1.209871
## t124          t124   a  1.537554
## t38            t38   a  2.871243
## t49            t49   a -1.072081

Once again, we’ll parallelize across maps using foreach::foreach.

mc<-makeCluster(ncores,type="PSOCK")
registerDoParallel(cl=mc)
## optimize across all 100 stochastic maps
high.ouwie_fits<-foreach(i=1:length(high.smps))%dopar%{
  OUwie::OUwie(high.smps[[i]],high.ouwie_data,model="OUM",
    simmap.tree=TRUE,root.station=FALSE)
}
stopCluster(mc)

Let’s summarize our results.

foo<-function(x) setNames(
  c(
    x$theta[,1],
    x$solution[,1],
    x$loglik
  ),
  c(
    "the[a]","the[b]","alpha","sigsq","log(L)"
  )
)
high.ouwie_results<-t(sapply(high.ouwie_fits,foo))
rownames(high.ouwie_results)<-1:nrow(high.ouwie_results)

(Here’s just the first 25, again.)

options(scipen=5) ## just for printing
round(high.ouwie_results[1:25,],digits=2)

##    the[a] the[b] alpha sigsq  log(L)
## 1    0.67   3.66  0.20  0.73 -350.92
## 2    0.64   3.55  0.19  0.74 -352.68
## 3    0.55   3.73  0.19  0.71 -349.64
## 4    1.11   3.40  0.19  0.75 -355.78
## 5    0.59   3.63  0.19  0.72 -350.10
## 6    1.24   3.54  0.22  0.80 -354.80
## 7    1.51   3.03  0.19  0.78 -360.86
## 8    1.02   3.40  0.21  0.78 -354.09
## 9    1.37   4.09  0.21  0.76 -350.52
## 10   0.94   3.56  0.20  0.75 -353.15
## 11   0.94   4.29  0.21  0.71 -342.30
## 12   1.14   3.25  0.22  0.81 -356.49
## 13   1.66   3.62  0.17  0.74 -359.53
## 14   1.46   3.64  0.20  0.77 -356.67
## 15   1.02   3.13  0.16  0.72 -358.30
## 16   0.31   3.84  0.22  0.72 -341.27
## 17   1.54   3.59  0.19  0.77 -358.10
## 18   1.43   4.06  0.18  0.73 -355.63
## 19   1.53   3.73  0.21  0.78 -355.44
## 20   1.35   4.16  0.18  0.72 -352.88
## 21   1.53   3.08  0.18  0.77 -360.42
## 22   0.42   3.49  0.19  0.73 -350.55
## 23   1.42   3.85  0.21  0.79 -354.72
## 24   0.23   3.70  0.17  0.68 -350.52
## 25   1.02   3.36  0.20  0.77 -355.54

options(scipen=0)

Now let’s get an average across trees.

colMeans(high.ouwie_results)

##       the[a]       the[b]        alpha        sigsq       log(L) 
##    1.0776599    3.7045410    0.2024869    0.7556554 -352.7203710

As we predicted, the estimates of \(\theta\), in particular, are biased towards each other -- but we also see that \(\alpha\) is biased downwards and \(\sigma^2\) upwards.

Finally, let’s compare this to phytools::fitmultiOU.

init<-setNames(
  c(
    mean(high.x[high.y==levels(high.y)[1]]),
    mean(high.x[high.y==levels(high.y)[2]]),
    log(2)/max(nodeHeights(phy)),
    var(high.x)/max(nodeHeights(phy)),
    fitMk(phy,high.y,model="ER")$rates),
  c("theta[a]","theta[b]",
    "alpha","sigsq","q[1]"))
high.fit_mou<-fitmultiOU(phy,high.x,high.y,model="ER",
  levs=100,parallel=TRUE,ncores=ncores,root="mle",
  trace=1,maxit=2000,init=init)

## iter	the[a]	the[b]	alpha	sigsq	q[1]	log(L)
## 0	1.9220	3.2096	0.0693	0.1930	0.4588	-622.5731 
## 100	1.1219	4.2495	0.4753	0.6045	0.5548	-490.4077 
## 200	-0.6367	4.8417	0.4181	0.2777	0.6418	-473.6047 
## 300	-0.4362	4.8024	0.4288	0.2951	0.6848	-473.2549 
## 400	-0.5098	5.0866	0.4541	0.1942	0.7713	-472.2389 
## 500	-0.4734	5.0244	0.4641	0.2102	0.7708	-472.1732 
## 600	-0.4698	5.0304	0.4678	0.2093	0.7854	-472.1688 
## 700	-0.4622	5.0253	0.4686	0.2081	0.7829	-472.1673 
## 732	-0.4625	5.0253	0.4682	0.2082	0.7824	-472.1673 
## Done optimizing.

high.fit_mou

## Object of class "fitmultiOU" based on
##     a discretization with k = 100 levels.
## 
## Fitted multi-theta OU model parameters:
##  levels: [ a, b ]
##   theta: [ -0.4625, 5.0253 ]
##   alpha: 0.4682 
##   sigsq: 0.2082 
## 
## Estimated Q matrix:
##            a          b
## a -0.7823842  0.7823842
## b  0.7823842 -0.7823842
## 
## Log-likelihood: -472.1673 
## 
## R thinks it has found the ML solution.

This is pretty astonishing & seems to make quite a convincing case for joint estimation.
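One simple way to summarize it is to line up the generating values, the averaged OUwie estimates, and the joint estimates, and then compute the gap between the two optima, \(\theta_b-\theta_a\), for each. The numbers below are copied from the printed output above; the rest is base R arithmetic on this single simulated dataset.

```r
## generating values & estimates copied from the output above
truth<-c(theta_a=0,theta_b=5,alpha=0.6,sigsq=0.3)
ouwie_mean<-c(1.0776599,3.7045410,0.2024869,0.7556554)
joint<-c(-0.4625,5.0253,0.4682,0.2082)

est<-rbind(truth=truth,OUwie=ouwie_mean,joint=joint)

## add the gap between the two regime optima
est<-cbind(est,gap=est[,"theta_b"]-est[,"theta_a"])
round(est,3)
```

Averaged across stochastic maps, OUwie recovers only about half of the true gap of 5 between the optima, while the joint model slightly overshoots it.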

Admittedly, this is a very small experiment – however, it does seem to support the thesis that if our discrete character changes infrequently, then the traditional two-step process of generating stochastic character maps & then fitting a fixed regime model to each map should work just fine. On the other hand, it also generally affirms Revell (2013, which perhaps ought to be a bit better cited) that as the rate of transition in our character goes up, the two-step process becomes biased and tends to underestimate the difference between the regimes. (This also may affect the measurement of \(\alpha\), which is underestimated; and \(\sigma^2\), which is overestimated.)

In this example, at least, the bias largely goes away when we jointly model \(x\) and \(y\) using our new finite-space approximation, which is very cool IMO.

OK, more on this later!

Wednesday, July 29, 2026

Stochastic character mapping under the threshold model using `fitThresh` in phytools

Unfortunately, I’m not usually so responsive to email inquiries these days (sorry!), but… just yesterday a phytools user contacted me with the subject line “getting a simmap style tree from a threshold model result” and the following email text:

“I have a weird question regarding your threshold model functions. I have a continuous trait I am making discrete bins out of to test whether multivariate rates of shape evolution vary with the levels of this (now discrete) trait using mvMORPH. fitThresh makes a lot of sense for reconstructing this character, and results in ancestral values that look different/more sensible than make.simmap (based on a reconstruction of the underlying continuous character). But I need the threshold model results in some sort of format that will work with mvBM (e.g. "simmap")…. Is there a not too complicated way to convert the ‘full’ Q matrix into transitions between the original discrete character, or do you have a better recommendation for making fitThresh play nice with the trait fitting functions in phytools/mvMORPH/OUwie/etc? If there is no good answer right now no worries, just couldn’t sus it out on my own.”

Indeed, this is something that I’d thought of already – that is, that the discrete diffusion / finite-space approximation that we use in fitThresh and other new methods of phytools would enable new sorts of stochastic character mapping, including of the threshold model and, indeed, of actual continuous traits.

Before I go on to show how to do this, I will note that this is an approximation of the corresponding stochastic character map for the threshold process, only inasmuch as technically the threshold is crossed \(\infty\) times by the underlying fractal BM process when it moves from one side of the threshold to the other. (If you didn’t already know this, just trust me. It’s true.)

So, how do we do this?

Well, so far I have not yet automated this in software, but it’s pretty straightforward if we go through it step by step.

I’m going to start with the simplest example, which is the threshold model with two states, let’s call them "a" and "b".

## load phytools
library(phytools)

First, let’s simulate some data under this process.

## simulate tree
phy<-pbtree(n=80,scale=1)
phy

## 
## Phylogenetic tree with 80 tips and 79 internal nodes.
## 
## Tip labels:
##   t28, t31, t32, t23, t24, t19, ...
## 
## Rooted; includes branch length(s).

## simulate liabilities
liability<-fastBM(phy,a=0.5)
head(liability,10)

##        t28        t31        t32        t23        t24        t19        t25        t26 
##  0.9810162  1.1667724  0.9394853  0.5019343  0.2609373 -0.1475927 -0.4670823  0.2442964 
##        t27        t33 
## -0.7697432 -1.0546414

## "threshold" liabilities
thresh<-function(x) if(x<0) "a" else "b"
x<-sapply(liability,thresh)
head(x,20)

## t28 t31 t32 t23 t24 t19 t25 t26 t27 t33 t35 t37 t38 t79 t80  t9 t29 t30 t46 t47 
## "b" "b" "b" "b" "b" "a" "a" "b" "a" "a" "a" "a" "a" "a" "a" "b" "a" "a" "a" "a"

Now let’s proceed to fit the threshold model to these data using fitThresh as follows.

thresh_fit<-fitThresh(phy,x)
thresh_fit

## Object of class "fitThresh".
## 
##     Set value of sigsq (of the liability) = 1.0
## 
## 	  Set or estimated threshold(s) =  [ 0 ]*
## 
##     Log-likelihood: -33.901524 
## 
## (*lowermost threshold is fixed)

(Totally just for fun, let’s fit & compare a symmetric Mk model to see if they’re distinguishable based on these data.)

mk_model<-fitMk(phy,x,model="ER")
anova(mk_model,thresh_fit)

##               log(L) d.f.      AIC    weight
## mk_model   -36.04421    1 74.08842 0.1050167
## thresh_fit -33.90152    1 69.80305 0.8949833

Now, to undertake stochastic mapping using our threshold model, we need to pull out the implicit Mk model that’s hidden within our "fitThresh" object.

mkm<-thresh_fit$mk_fit

(I’m not going to print it because the Q matrix is \(200 \times 200\), but readers following along should feel free to go for it.)
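For readers wondering what kind of Q matrix this is, here is a generic base R sketch of a finite-space approximation of Brownian motion in the spirit of Boucher & Démery (2016): liabilities are binned into levels, and movement is only allowed between adjacent bins at rate \(\sigma^2/2\delta^2\), where \(\delta\) is the bin width. This is not the code inside fitThresh, and the 10 levels and limits of \([-3,3]\) are hypothetical values chosen just to keep the matrix small.

```r
levs<-10      ## number of bins (hypothetical; far fewer than 200)
lims<-c(-3,3) ## bounds of the discretized liability (hypothetical)
sigsq<-1      ## BM rate of the liability

delta<-diff(lims)/levs ## bin width
mids<-seq(lims[1]+delta/2,lims[2]-delta/2,length.out=levs)
nm<-as.character(round(mids,2))

## tridiagonal Q: changes only between neighboring bins
Q<-matrix(0,levs,levs,dimnames=list(nm,nm))
rate<-sigsq/(2*delta^2)
for(i in 1:(levs-1)){
  Q[i,i+1]<-rate
  Q[i+1,i]<-rate
}
diag(Q)<--rowSums(Q)
round(Q[1:4,1:4],2)
```

Seen this way, the threshold model is an ordered Mk model on a very fine grid, which is why the standard stochastic mapping machinery can be applied to it.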

Next, because this hidden object is not quite the same as a standard "fitMk" object, let’s add the element root.prior as follows.

mkm$root.prior<-"fitzjohn"

Now, believe it or not, we’re totally ready for stochastic mapping.

When we run the following code we’re probably going to see the message Warning in rstate(p/sum(p)) : Some probabilities (slightly?) < 0. Setting p < 0 to zero. This is just because our matrix is so big & some of the elements are close to zero, so we can safely ignore it.

I’m only going to do one stochastic character map here. Typically we should do many of these, of course. This would be accomplished by modifying nsim and then iterating all subsequent steps over each stochastic map tree.

smp<-simmap(mkm,nsim=1,pi="mle")

## Warning in rstate(p/sum(p)): Some probabilities (slightly?) < 0. Setting p < 0 to zero.
## Warning in rstate(p/sum(p)): Some probabilities (slightly?) < 0. Setting p < 0 to zero.
## Warning in rstate(p/sum(p)): Some probabilities (slightly?) < 0. Setting p < 0 to zero.
## ...
## Warning in rstate(p/sum(p)): Some probabilities (slightly?) < 0. Setting p < 0 to zero.

Once again, this is a big object because it contains all 200 of the finely discretized levels of our diffusion approximation – effectively, a stochastic map of liabilities.

To turn this into a stochastic character map of our original discrete trait we need to simply merge liabilities on each side of the single threshold as follows.

ss<-colnames(mkm$data)
merged.smp<-mergeMappedStates(smp,ss[which(as.numeric(ss)<0)],"a")
merged.smp<-mergeMappedStates(merged.smp,ss[which(as.numeric(ss)>0)],"b")
merged.smp

## 
## Phylogenetic tree with 80 tips and 79 internal nodes.
## 
## Tip labels:
## 	t28, t31, t32, t23, t24, t19, ...
## 
## The tree includes a mapped, 2-state discrete character
## with states:
## 	a, b
## 
## Rooted; includes branch lengths.

Great. Let’s plot it.

plot(merged.smp,direction="upwards",ftype="off",
  lwd=3,colors=setNames(hcl.colors(n=2),c("a","b")))
legend("bottomleft",c("a","b"),col=hcl.colors(n=2),
  lwd=2,bty="n")

[Figure: stochastic character map of the two-state threshold character.]

We can see that this captures something of the threshold-model history of our character by just how different it looks from the same analysis run with a standard Mk model!

mk.smp<-simmap(mk_model,nsim=1)
plot(mk.smp,direction="upwards",ftype="off",
  lwd=3,colors=setNames(hcl.colors(n=2),c("a","b")))
legend("bottomleft",c("a","b"),col=hcl.colors(n=2),
  lwd=2,bty="n")

[Figure: stochastic character map of the same character under the Mk model.]

We see, for instance, that threshold crossing usually involves many switches back & forth under the threshold model, and none in the Mk model, which makes perfect sense!

Things get only a little more complicated when I want to fit the multi-state model.

Let’s see if I can run through that.

First, I’ll reuse my original tree & liabilities to simulate some tip data as follows:

y<-threshState(liability,setNames(c(0,1,2),letters[1:3]))
head(y)

## t28 t31 t32 t23 t24 t19 
## "b" "c" "b" "b" "b" "a"

Next, fit the multi-state threshold model as follows.

ms_thresh<-fitThresh(phy,y,sequence=letters[1:3])
ms_thresh

## Object of class "fitThresh".
## 
##     Set value of sigsq (of the liability) = 1.0
## 
## 	  Set or estimated threshold(s) =  [ -0.7568, 0.341537 ]*
## 
##     Log-likelihood: -51.847661 
## 
## (*lowermost threshold is fixed)

Note that the lower estimated threshold is not a separately estimable parameter (otherwise the model would be non-identifiable). It’s normally set to zero, but fitThresh does something different that I’ll probably fix in the future. (It centers the whole liability distribution on zero instead. Since the liabilities are unitless and scaleless this doesn’t matter at all for the model fit or relative positions of the thresholds, but our results are a bit more annoying to interpret.)
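If it helps with interpretation, we can shift the estimated thresholds so that the lowermost one sits at zero. This is just a re-centering in base R using the two values printed above. It doesn't change the fitted model, and the comparison to the simulation assumes that the generating cut-points between a/b and b/c were 0 and 1.

```r
## estimated thresholds copied from the printed model above
est<-c(-0.7568,0.341537)

## re-center so that the lowermost threshold is at zero
shifted<-est-est[1]
shifted
```

The spacing between the two thresholds is what matters here, and after re-centering it's easier to compare to the spacing used in the simulation.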

Pull out our hidden Mk object & do stochastic mapping, as before.

mkn<-ms_thresh$mk_fit
mkn$root.prior<-"fitzjohn"
smp<-simmap(mkn,nsim=1)

## Warning in rstate(p/sum(p)): Some probabilities (slightly?) < 0. Setting p < 0 to zero.
## Warning in rstate(p/sum(p)): Some probabilities (slightly?) < 0. Setting p < 0 to zero.
## Warning in rstate(p/sum(p)): Some probabilities (slightly?) < 0. Setting p < 0 to zero.
## ...
## Warning in rstate(p/sum(p)): Some probabilities (slightly?) < 0. Setting p < 0 to zero.

Now, repeat the mergeMappedStates step, but this time using the estimated thresholds.

ss<-mkn$states
merged.smp.2<-mergeMappedStates(smp,
  ss[which(as.numeric(ss)<ms_thresh$threshold[1])],"a")
merged.smp.2<-mergeMappedStates(merged.smp.2,
  ss[which(as.numeric(ss)>=ms_thresh$threshold[1]&
      as.numeric(ss)<ms_thresh$threshold[2])],"b")
merged.smp.2<-mergeMappedStates(merged.smp.2,
  ss[which(as.numeric(ss)>=ms_thresh$threshold[2])],"c")
merged.smp.2

## 
## Phylogenetic tree with 80 tips and 79 internal nodes.
## 
## Tip labels:
## 	t28, t31, t32, t23, t24, t19, ...
## 
## The tree includes a mapped, 3-state discrete character
## with states:
## 	a, b, c
## 
## Rooted; includes branch lengths.
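The three mergeMappedStates calls above follow a pattern that should generalize to any number of thresholds. Here is a base R sketch of just the grouping step using cut. The bin names in ss are hypothetical stand-ins for the discretized liability levels, and the thresholds are copied from the fitted model printed earlier.

```r
## hypothetical bin midpoints standing in for the state names
ss<-as.character(seq(-2.95,2.95,by=0.1))

## estimated thresholds & ordered state labels
thresholds<-c(-0.7568,0.341537)
labs<-letters[1:3]

## assign each bin to a discrete state; right=FALSE gives the same
## ">= threshold" convention used in the code above
grp<-cut(as.numeric(ss),breaks=c(-Inf,thresholds,Inf),
  labels=labs,right=FALSE)
bins<-split(ss,grp)
sapply(bins,length)

## with phytools loaded, each element of bins could then be merged
## in turn, e.g., mergeMappedStates(tree,bins[[i]],names(bins)[i])
```

Looping over the elements of bins would then replace the hand-written calls, one per discrete state.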

Let’s plot it!

plot(merged.smp.2,direction="upwards",ftype="off",
  lwd=3,colors=setNames(hcl.colors(n=3),c("a","b","c")))
legend("bottomleft",c("a","b","c"),col=hcl.colors(n=3),
  lwd=3,bty="n")

[Figure: stochastic character map of the three-state threshold character.]

Wow. That’s beautiful!

Very cool.

Comparing a discrete character dependent multi-regime OU joint model to OUwie

In some recent posts to this blog (e.g., 1, 2, 3, 4) I have described a new discrete character dependent multi-optimum model in phytools that uses the finite-space or discrete diffusion approximation of Boucher & Démery (2016; also see our in review bioRxiv pre-print).

Even though they implement different models, I thought it might be interesting to compare this new method, fitmultiOU, to a fixed-regime multi-optimum OU model fit using the popular OUwie package by Jeremy Beaulieu and Brian O’Meara.

Since our new method integrates over uncertainty in the regime history by jointly optimizing an Mk transition process for the regimes along with the multi-optimum stochastic process of our continuous trait, my thesis is that the parameter estimates we obtain should be highly similar between the two models if our discrete trait changes infrequently.

To explore this, we can start by simulating a tree, a discrete character history, and some data using phytools as follows.

## load phytools
library(phytools)

## simulate a tree
N<-100 ## number of taxa
phy<-pbtree(n=N,scale=10)
phy

## 
## Phylogenetic tree with 100 tips and 99 internal nodes.
## 
## Tip labels:
##   t2, t3, t89, t90, t35, t53, ...
## 
## Rooted; includes branch length(s).

## set the transition matrix of our
## discrete trait
q<-0.05
k<-2
Q<-matrix(q,k,k,
  dimnames=list(letters[1:k],letters[1:k]))
diag(Q)<-0
diag(Q)<--rowSums(Q)
Q

##       a     b
## a -0.05  0.05
## b  0.05 -0.05

## simulate trait history
sim_tree<-sim.history(phy,Q,anc=sample(letters[1:k],1))

## Done simulation(s).

cols<-setNames(hcl.colors(n=k),letters[1:k])
plot(sim_tree,cols,ftype="off",lwd=2,
  direction="upwards")
par(lend=1)
legend("bottomleft",letters[1:k],lwd=4,
  col=hcl.colors(n=k),cex=0.7,bty="n")

[Figure: the simulated discrete character history mapped onto the tree.]

This is great because our tree has very few changes in the character. My hypothesis is that this will make parameter estimates of the OU process very similar between a fixed regime model, as in OUwie, and our new fitmultiOU method.

Now let’s set the generating conditions of our discrete character dependent multi-\(\theta\) Ornstein-Uhlenbeck process.

The parameters \(\alpha\) and \(\sigma^2\) will be the same across all regimes (though we still have to specify \(k = 2\) of each for our generator), while \(\theta\) will vary according to the state of our discrete trait.

## set alpha
alpha<-setNames(rep(0.3,k),letters[1:k])
alpha

##   a   b 
## 0.3 0.3

## set sigma-squared
sig2<-setNames(rep(0.1,k),letters[1:k])
sig2

##   a   b 
## 0.1 0.1

# set theta
theta<-setNames(c(-0.5,2),letters[1:k])
theta

##    a    b 
## -0.5  2.0

At this point we can simulate our continuous trait using phytools::multiOU.

## simulate continuous trait
x<-multiOU(sim_tree,alpha,sig2,theta,a0=0)
head(x)

##         t2         t3        t89        t90        t35        t53 
## -0.3416784 -0.6347880 -0.3128115 -0.5001694 -0.7747887 -0.5183583

Our discrete character is already simulated, but we need to pull off the tip values from the sim_tree "simmap" object to use them in our analysis.

## pull off discrete trait
y<-as.factor(getStates(sim_tree,"tips"))
head(y)

##  t2  t3 t89 t90 t35 t53 
##   a   a   a   a   a   a 
## Levels: a b

For completeness, let’s start by fitting our null model using fitmultiOU.

We can then confirm that this fitted model matches (to a reasonable degree – they will only converge exactly as levs goes towards \(\infty\)) what we’d obtain using geiger::fitContinuous and phytools::fitMk under the same model assumptions. I'm going to set levs = 200 for this analysis, but this actually takes a very long time to run (much more than twice as long as levs = 100).

## fit null model
fit_null<-fitmultiOU(phy,x,y,model="ER",levs=200,
  parallel=TRUE,ncores=10,root="mle",trace=1,
  null_model=TRUE)

## iter	theta	alpha	sigsq	q[1]	log(L)
## 0	1.9456	0.0721	1.1329	0.0914	-175.6004 
## 100	0.6244	0.0116	0.0897	0.0271	-98.3007 
## 200	0.6123	0.0053	0.0892	0.0283	-98.2222 
## 271	0.6338	0.0004	0.0860	0.0292	-98.1838 
## Done optimizing.

fit_null

## Object of class "fitmultiOU" based on
##     a discretization with k = 200 levels.
## 
## Fitted multi-theta OU model parameters:
##  levels: [ a, b ]
##   theta: [ 0.6338 ]
##   alpha: 4e-04 
##   sigsq: 0.086 
## 
## Estimated Q matrix:
##             a           b
## a -0.02921177  0.02921177
## b  0.02921177 -0.02921177
## 
## Log-likelihood: -98.1838 
## 
## R thinks optimization may not have converged.

Now, again, let's compare to geiger::fitContinuous and phytools::fitMk.

## fit single regime OU model using fitContinuous
ou_fit<-geiger::fitContinuous(phy,x,model="OU")
ou_fit

## GEIGER-fitted comparative model of continuous data
##  fitted 'OU' model parameters:
## 	alpha = 0.000000
## 	sigsq = 0.085106
## 	z0 = 0.625022
## 
##  model summary:
## 	log-likelihood = -68.250476
## 	AIC = 142.500953
## 	AICc = 142.750953
## 	free parameters = 3
## 
## Convergence diagnostics:
## 	optimization iterations = 100
## 	failed iterations = 0
## 	number of iterations with same best fit = 50
## 	frequency of best fit = 0.500
## 
##  object summary:
## 	'lik' -- likelihood function
## 	'bnd' -- bounds for likelihood search
## 	'res' -- optimization iteration summary
## 	'opt' -- maximum likelihood parameter estimates

## fit Mk model using fitMk
mk_fit<-fitMk(phy,y,model="ER",pi="equal")
mk_fit

## Object of class "fitMk".
## 
## Fitted (or set) value of Q:
##           a         b
## a -0.028064  0.028064
## b  0.028064 -0.028064
## 
## Fitted (or set) value of pi:
##   a   b 
## 0.5 0.5 
## due to treating the root prior as (a) flat.
## 
## Log-likelihood: -29.168076 
## 
## Optimization method used was "nlminb"
## 
## R thinks it has found the ML solution.

You can compare the model parameter estimates, but let’s also assure ourselves that the likelihood seems to be converging on the same value as follows.

## compute null log(L) from fitContinuous & fitMk results
null_logL<-logLik(ou_fit)+logLik(mk_fit)
attr(null_logL,"df")<-4 ## fix d.f.
null_logL

## [1] -97.41855
## attr(,"df")
## [1] 4

## compare to fitmultiOU
logLik(fit_null)

## [1] -98.18378
## attr(,"df")
## [1] 4

This is pretty close. Again, we would expect these two values to get even closer for higher levs, but (as currently implemented) this already takes a really long time to run!
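The reason we can add the two log-likelihoods is that under the null model the discrete & continuous traits evolve independently, so the joint likelihood is simply the product of the two separate likelihoods. The arithmetic is easy to check by hand using the values printed above. (I've copied them in as numbers here, so this chunk doesn't require any of the fitted objects.)

```r
## log-likelihoods copied from the printed output above
logL_ou <- -68.250476    ## geiger::fitContinuous, single-optimum OU
logL_mk <- -29.168076    ## phytools::fitMk, ER model
logL_joint <- -98.18378  ## fitmultiOU, null_model=TRUE, levs=200
## under independence the joint log-likelihood is the sum
logL_sum <- logL_ou + logL_mk
logL_sum
## remaining gap between the two approaches
gap <- logL_sum - logL_joint
gap
```

I'd guess that most of the remaining gap is discretization error, but since R reported that the fitmultiOU optimization may not have converged, some of it could also be due to incomplete optimization.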

OK. Now let’s bring OUwie into the picture. We can start by loading the package, which I recently updated from CRAN.

## load OUwie
library(OUwie)

Now for OUwie we need to put our data in a special data frame format as follows.

## compile our data for OUwie
ouwie.data<-data.frame(
  Genus_species=names(x),
  Reg=y,
  X=x)
head(ouwie.data)

##     Genus_species Reg          X
## t2             t2   a -0.3416784
## t3             t3   a -0.6347880
## t89           t89   a -0.3128115
## t90           t90   a -0.5001694
## t35           t35   a -0.7747887
## t53           t53   a -0.5183583

Why don’t we start by simply re-fitting our null OU model in OUwie?

This should give us a result that quite closely matches what we obtained using geiger::fitContinuous.

I’ll still give it our known discrete character data & history, but I’ll set model = "OU1" to specify that I want a single \(\theta\) model only.

## fit OUwie null model
fitOU.smp<-OUwie(sim_tree,ouwie.data,model="OU1",
  simmap.tree=TRUE,root.station=FALSE)

## Warning: An algorithm was not specified. Defaulting to computing the determinant and inversion of the vcv.

## Initializing... 
## Finished. Begin thorough search... 
## Finished. Summarizing results.

fitOU.smp

## 
## Fit
##        lnL     AIC    AICc      BIC model ntax
##  -68.25048 142.501 142.751 150.3165   OU1  100
## 
## 
## Rates
##        alpha     sigma.sq 
## 1.524044e-08 8.510672e-02 
## 
## Optima
##                  1
## estimate 0.6250224
## se       0.3348563
## 
## 
## Half life (another way of reporting alpha)
##    alpha 
## 45480781 
## 
## Arrived at a reliable solution

Hopefully, we see that this fitted model pretty closely matches what we got using fitContinuous earlier.
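The "half life" that OUwie reports is just \(\log(2)/\alpha\): the expected time for the trait to move halfway from its current value to the optimum. As a small sketch, here it is computed for the fitted single-optimum model and for our generating value of \(\alpha\). I'm assuming a total tree height of 10, which is what the starting value for \(\alpha\) that I compute later in this post implies.

```r
## alpha from the OUwie OU1 output above, and the
## generating value used in our simulation
alpha_hat <- c(OU1_fit = 1.524044e-08, generating = 0.3)
## phylogenetic half-life
half_life <- log(2) / alpha_hat
half_life
## half-life relative to total tree height (assumed to be 10)
half_life / 10
```

A half-life millions of times longer than the tree tells us that the fitted single-optimum OU model is, for all practical purposes, Brownian motion.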

Next, I’m going to go ahead & fit our discrete character dependent multi- \(\theta\) OU model using phytools::fitmultiOU.

This is the model that I’ve been blogging about recently. To remind the reader, it's a joint discrete & continuous trait model, not the fixed regime model of OUwie. Nonetheless, I’m hypothesizing that our continuous trait model parameter estimates should pretty closely match what we’d get from OUwie using the true history (or, for that matter, a stochastic character history), simply because our discrete character changes so infrequently on the tree.

To start with, I'm going to try to get reasonable starting values for my fitmultiOU parameters, as follows.

## identify sensible starting parameter values
init<-setNames(
  c(
    mean(x[y==levels(y)[1]]),
    mean(x[y==levels(y)[2]]),
    log(2)/max(nodeHeights(phy)),
    var(x)/max(nodeHeights(phy)),
    fitMk(phy,y,model="ER")$rates),
  c("theta[a]","theta[b]",
    "alpha","sigsq","q[1]"))
init

##    theta[a]    theta[b]       alpha       sigsq        q[1] 
## -0.27841916  1.61842864  0.06931472  0.12828760  0.02806402

Then I can go ahead and fit the joint model.

## fit discrete trait dependent model
fit_mou<-fitmultiOU(phy,x,y,model="ER",levs=200,
  parallel=TRUE,ncores=10,root="mle",trace=1,
  maxit=2000,init=init)

## iter	the[a]	the[b]	alpha	sigsq	q[1]	log(L)
## 0	-0.2784	1.6184	0.0693	0.1283	0.0281	-92.1417 
## 100	-0.4558	1.8575	0.3260	0.1157	0.0268	-72.4242 
## 200	-0.4520	1.8537	0.3254	0.1143	0.0267	-72.4172 
## 300	-0.4531	1.8535	0.3252	0.1145	0.0269	-72.4170 
## 400	-0.4520	1.8523	0.3246	0.1142	0.0271	-72.4163 
## 404	-0.4519	1.8523	0.3246	0.1141	0.0271	-72.4163 
## Done optimizing.

fit_mou

## Object of class "fitmultiOU" based on
##     a discretization with k = 200 levels.
## 
## Fitted multi-theta OU model parameters:
##  levels: [ a, b ]
##   theta: [ -0.4519, 1.8523 ]
##   alpha: 0.3246 
##   sigsq: 0.1141 
## 
## Estimated Q matrix:
##             a           b
## a -0.02713072  0.02713072
## b  0.02713072 -0.02713072
## 
## Log-likelihood: -72.4163 
## 
## R thinks it has found the ML solution.

Finally, we can fit our fixed regime model using the OUwie package. Here, don’t forget that though I will be using the true discrete character history, this is almost never known in practice! (Indeed, if we genuinely knew the discrete character history, I would always recommend fitting a fixed regime model, not a joint model.)

## fit multi-regime OU model using OUwie
fitOUM.smp<-OUwie(sim_tree,ouwie.data,model="OUM",
  simmap.tree=TRUE,root.station=FALSE)

## Warning: An algorithm was not specified. Defaulting to computing the determinant and inversion of the vcv.

## Initializing... 
## Finished. Begin thorough search... 
## Finished. Summarizing results.

fitOUM.smp

## 
## Fit
##        lnL      AIC     AICc      BIC model ntax
##  -44.47594 96.95187 97.37292 107.3726   OUM  100
## 
## 
## Rates
##                  b         a
## alpha    0.3007500 0.3007500
## sigma.sq 0.1198701 0.1198701
## 
## Optima
##                   b          a
## estimate 1.75307314 -0.5184579
## se       0.09854245  0.1132707
## 
## 
## Half life (another way of reporting alpha)
##        b        a 
## 2.304729 2.304729 
## 
## Arrived at a reliable solution

Cool.

Now, if we look closely at our results, we should see that our parameter estimates do in fact match up fairly well, keeping in mind, of course, that these are not the same models – the OUwie model is based on fixed regimes, while our fitmultiOU function implements a joint discrete trait & continuous character evolutionary model.

I would expect this similarity to hold whenever our discrete character history is pretty unambiguous, usually because the character changes quite infrequently on the tree, but to diminish as the rate or number of changes of our discrete trait increases.
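To make the comparison a bit easier to eyeball, here's a small table that lines up the generating values with the two sets of estimates. All of the numbers are copied by hand from the printed output above, so this chunk runs without the fitted model objects (and est and abs_diff are just names I made up for this illustration).

```r
## estimates copied from the printed output of each fitted model
est <- cbind(
  generating = c(-0.5, 2, 0.3, 0.1),
  fitmultiOU = c(-0.4519, 1.8523, 0.3246, 0.1141),
  OUwie = c(-0.5184579, 1.75307314, 0.30075, 0.1198701))
rownames(est) <- c("theta[a]", "theta[b]", "alpha", "sigsq")
## absolute difference between the two sets of estimates
abs_diff <- abs(est[, "fitmultiOU"] - est[, "OUwie"])
round(cbind(est, abs_diff), 4)
```

This is one simulated dataset, of course, so it illustrates the pattern rather than demonstrating it.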

That's all folks!

Tuesday, July 14, 2026

Visualizing a multi-regime OU process on the tree using phytools

In a recent tweet about phytools’ new discrete character dependent multi-optimum OU model, I included an illustration of this process that was not derived from the original blog post.

More on a discrete character dependent multi-optimum OU model in #phytools: https://t.co/LpLHskJoo6. pic.twitter.com/JrmAlmQpbh
— Liam Revell (@phytools_liam) July 13, 2026

Just in case it might become useful later, I decided to post the code for that cool illustration here. In contrast to my prior use, here I decided to set different values for \(\alpha\) & \(\sigma^2\), as well as for \(\theta\), between the three regimes.

## load packages
library(phytools)

In this first chunk I’m going to generate the data and objects I need for the plot.

## simulate a tree
tree<-pbtree(n=100,scale=10)
## add singleton (unbranching) nodes
tt<-map.to.singleton(make.era.map(tree,
  limits=seq(0,10,length.out=1001)))
## set the transition matrix for the discrete
## trait
q<-0.2
Q<-matrix(c(
  -2*q,q,q,
  q,-2*q,q,
  q,q,-2*q),3,3,
  dimnames=list(letters[1:3],letters[1:3]))
## generate a character history of the discrete
## trait
s.tt<-sim.history(tt,Q,anc="a")

## Done simulation(s).

## set the parameters of the multi-regime OU
## process
theta<-setNames(c(-0.5,0.9,2.2),letters[1:3])
sigsq<-setNames(c(0.12,0.05,0.08),letters[1:3])
alpha<-setNames(c(0.7,0.4,0.8),letters[1:3])
## generate data for tips (and nodes) under the
## multi-regime OU process
x<-multiOU(s.tt,alpha=alpha,sig2=sigsq,
  theta=theta,a0=theta["a"],internal=TRUE)

Now for our plot.

## set colors
cols<-setNames(hcl.colors(n=3),letters[1:3])
## set plot layout
layout(matrix(c(1,2),2,1),heights=c(0.4,0.6))
## plot discrete character history on the tree
plot(s.tt,cols,ftype="off",mar=c(1.1,4.1,2.1,2.1),
  xlim=c(0,11))
mtext("a)",adj=0,line=0)
## set margins for second subplot
par(mar=c(5.1,4.1,2.1,2.1))
## plot continuous character history
phenogram(s.tt,x,ftype="off",
  colors=make.transparent(cols,0.5),
  xlim=c(0,11),cex.axis=0.8,las=1)
mtext("b)",adj=0,line=1.5)
## add the predicted equilibrium distribution
## at the tips
for(i in 1:length(theta)){
  stat_theta<-dnorm(seq(min(x),max(x),
    length.out=200),mean=theta[i],
    sd=sqrt(sigsq[i]/(2*alpha[i])))
  stat.norm<-stat_theta/max(stat_theta)
  polygon(x=c(0,stat.norm,0)+10,
    y=c(min(x),seq(min(x),max(x),
      length.out=200),max(x)),border=FALSE,
    col=make.transparent(cols[i],0.5))
}

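[Figure: a) simulated discrete character history on the tree; b) phenogram of the continuous trait, with the stationary distribution of each regime drawn at the tips]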
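The shaded curves to the right of panel b) come from the stationary distribution of the OU process in each regime, which is normal with mean \(\theta\) and variance \(\sigma^2/(2\alpha)\). If you'd like to convince yourself of that, here's a minimal base R sketch that simulates one long single-regime OU path and compares its standard deviation to the prediction. Note that ou_path is a hypothetical helper that I wrote for this check; it is not a phytools function and it is not how multiOU works internally.

```r
## simulate a single-regime OU path using the exact
## transition distribution over time steps of length dt
ou_path <- function(n, dt, alpha, sigsq, theta, x0 = theta) {
  decay <- exp(-alpha * dt)
  step_sd <- sqrt(sigsq / (2 * alpha) * (1 - decay^2))
  x <- numeric(n + 1)
  x[1] <- x0
  for (i in seq_len(n)) {
    x[i + 1] <- theta + (x[i] - theta) * decay +
      rnorm(1, sd = step_sd)
  }
  x
}
set.seed(99)
## parameter values of regime "a" from the illustration above
path <- ou_path(n = 20000, dt = 0.5, alpha = 0.7,
  sigsq = 0.12, theta = -0.5)
## simulated vs. predicted stationary standard deviation
c(simulated = sd(path), predicted = sqrt(0.12 / (2 * 0.7)))
```

The two values should agree to within sampling error, and the agreement should improve as n is increased.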

Nothing to it!

Monday, July 13, 2026

More on a discrete character dependent multi-optimum OU model in phytools

Good morning blog readers.

Recently I’ve posted about a new discrete character dependent multi- \(\theta\) (i.e., multi-optimum) OU model in phytools (e.g., 1, 2, 3).

Since this is very new it should only be used with the utmost caution. Nonetheless, I thought I’d post up a quick demo of how it works, and how to do a model comparison to a simpler model of joint discrete & continuous trait evolution but without dependence. (We also might consider an alternative null model with hidden characters, but this will have to be covered in a future post!)

This will only work on recent (at the time of writing) versions of phytools, so we can start by loading the package & checking which version we have.

library(phytools)

## Loading required package: ape

## Loading required package: maps

packageVersion("phytools")

## [1] '2.6.3'

To fit this model I’m going to need some data. With none readily at hand, I’m going to use phytools to simulate some.

We can start with a tree:

N<-200 ## number of taxa
phy<-pbtree(n=N,scale=10)
phy

## 
## Phylogenetic tree with 200 tips and 199 internal nodes.
## 
## Tip labels:
##   t6, t7, t44, t57, t136, t177, ...
## 
## Rooted; includes branch length(s).

Next, we want a generating discrete character history for our multi-regime OU process. Note that though we use this regime history for simulation, in an empirical case it would be unknown, so we’ll set it aside when we move on to estimation.

## set the transition matrix of our
## discrete trait
q<-0.2
Q<-matrix(c(
  -2*q,q,q,
  q,-2*q,q,
  q,q,-2*q),3,3,
  dimnames=list(letters[1:3],letters[1:3]))
Q

##      a    b    c
## a -0.4  0.2  0.2
## b  0.2 -0.4  0.2
## c  0.2  0.2 -0.4

k<-nrow(Q) ## trait levels
k

## [1] 3

sim_tree<-sim.history(phy,Q,anc="a")

## Done simulation(s).

sim_tree

## 
## Phylogenetic tree with 200 tips and 199 internal nodes.
## 
## Tip labels:
## 	t6, t7, t44, t57, t136, t177, ...
## 
## The tree includes a mapped, 3-state discrete character
## with states:
## 	a, b, c
## 
## Rooted; includes branch lengths.

Let’s plot our generating tree as follows.

cols<-setNames(hcl.colors(n=3),letters[1:k])
plot(sim_tree,cols,ftype="off",lwd=1,
  direction="upwards")
par(lend=1)
legend("bottomleft",letters[1:3],lwd=3,
  col=hcl.colors(n=k),cex=0.8,bty="n")

[Figure: generating discrete character history (states a, b, and c) mapped on the tree]

Next, we can set the generating conditions for our continuous trait simulation. Our model allows for multiple \(\theta\) by discrete character state, but assumes constant \(\alpha\) and \(\sigma^2\) across the \(k\) levels of our discrete trait, so let’s simulate that.

alpha<-setNames(rep(0.3,k),letters[1:k])
alpha

##   a   b   c 
## 0.3 0.3 0.3

sig2<-setNames(rep(0.1,k),letters[1:k])
sig2

##   a   b   c 
## 0.1 0.1 0.1

theta<-setNames(c(-0.5,1,2),letters[1:k])
theta

##    a    b    c 
## -0.5  1.0  2.0

Now we’re nearly ready to simulate our continuous trait. To do that, I’ll use phytools::multiOU as I have in prior posts.

x<-multiOU(sim_tree,alpha,sig2,theta,a0=0)
head(x)

##        t6        t7       t44       t57      t136      t177 
## 1.9418267 0.5788125 1.2039455 0.7955580 0.6064443 0.9103976

Though we’ve simulated our discrete character history already, for our analysis we’ll use just the tip states, so let’s pull those into a factor vector using phytools::getStates.

y<-as.factor(getStates(sim_tree,"tips"))
head(y)

##   t6   t7  t44  t57 t136 t177 
##    a    b    c    a    c    c 
## Levels: a b c

Awesome. Now let’s first fit our null model using fitmultiOU.

fit_null<-fitmultiOU(phy,x,y,model="ER",levs=100,
  parallel=TRUE,ncores=10,root="mle",trace=1,
  null_model=TRUE)

## iter	theta	alpha	sigsq	q[1]	log(L)
## 0	1.0601	0.2025	0.2682	0.0177	-409.2704 
## 100	0.4702	0.0182	0.0846	0.2622	-287.2005 
## 200	0.5735	0.0150	0.0829	0.2569	-287.0941 
## 279	0.5743	0.0141	0.0829	0.2555	-287.0493 
## Done optimizing.

fit_null

## Object of class "fitmultiOU" based on
##     a discretization with k = 100 levels.
## 
## Fitted multi-theta OU model parameters:
##  levels: [ a, b, c ]
##   theta: [ 0.5743 ]
##   alpha: 0.0141 
##   sigsq: 0.0829 
## 
## Estimated Q matrix:
##            a          b          c
## a -0.5109335  0.2554668  0.2554668
## b  0.2554668 -0.5109335  0.2554668
## c  0.2554668  0.2554668 -0.5109335
## 
## Log-likelihood: -287.0493 
## 
## R thinks it has found the ML solution.

Let’s confirm that our parameter estimates and log-likelihood match what we would’ve obtained using geiger::fitContinuous and phytools::fitMk. This isn’t hard.

ou_fit<-geiger::fitContinuous(phy,x,model="OU")
ou_fit

## GEIGER-fitted comparative model of continuous data
##  fitted 'OU' model parameters:
## 	alpha = 0.012919
## 	sigsq = 0.081833
## 	z0 = 0.655514
## 
##  model summary:
## 	log-likelihood = -89.743851
## 	AIC = 185.487702
## 	AICc = 185.610151
## 	free parameters = 3
## 
## Convergence diagnostics:
## 	optimization iterations = 100
## 	failed iterations = 0
## 	number of iterations with same best fit = 50
## 	frequency of best fit = 0.500
## 
##  object summary:
## 	'lik' -- likelihood function
## 	'bnd' -- bounds for likelihood search
## 	'res' -- optimization iteration summary
## 	'opt' -- maximum likelihood parameter estimates

mk_fit<-fitMk(phy,y,model="ER",pi="equal")
mk_fit

## Object of class "fitMk".
## 
## Fitted (or set) value of Q:
##           a         b         c
## a -0.511178  0.255589  0.255589
## b  0.255589 -0.511178  0.255589
## c  0.255589  0.255589 -0.511178
## 
## Fitted (or set) value of pi:
##        a        b        c 
## 0.333333 0.333333 0.333333 
## due to treating the root prior as (a) flat.
## 
## Log-likelihood: -195.760975 
## 
## Optimization method used was "nlminb"
## 
## R thinks it has found the ML solution.

null_logL<-logLik(ou_fit)+logLik(mk_fit)
null_logL

## [1] -285.5048
## attr(,"df")
## [1] 3

This should be very close to the log-likelihood we obtained for fit_null (-287.0493). In fact, the two values are a bit farther apart than I'm comfortable with, but they would undoubtedly converge if we were to increase `levs`. We should do this with some caution, though, because it's going to increase our run time very substantially.

Finally, we can fit our discrete character dependent multi- \(\theta\) OU model!

fit_mou<-fitmultiOU(phy,x,y,model="ER",levs=20,
  parallel=TRUE,ncores=10,root="mle",trace=1,
  maxit=2000)

## iter	the[a]	the[b]	the[c]	alpha	sigsq	q[1]	log(L)
## 0	-0.3575	0.3680	1.8215	0.2403	0.0115	0.2066	-309.6850 
## 100	-0.3023	1.0148	1.6594	0.2779	0.0615	0.1947	-284.8126 
## 200	-0.2191	1.1288	1.9057	0.2837	0.0611	0.1861	-284.1910 
## 300	-0.2140	1.1372	1.9328	0.2794	0.0554	0.1914	-284.0423 
## 400	-0.2355	1.1808	1.9946	0.2713	0.0499	0.1976	-283.9893 
## 500	-0.2229	1.1901	2.0192	0.2716	0.0514	0.2043	-283.9538 
## 600	-0.2202	1.2210	1.9799	0.2748	0.0519	0.2054	-283.9183 
## 700	-0.2172	1.2083	1.9723	0.2747	0.0528	0.2056	-283.9137 
## 800	-0.2164	1.2088	1.9738	0.2751	0.0528	0.2053	-283.9135 
## 900	-0.2156	1.2094	1.9724	0.2753	0.0527	0.2052	-283.9134 
## 913	-0.2157	1.2097	1.9725	0.2752	0.0527	0.2053	-283.9134 
## Done optimizing.

Just because I know that optimization of this model is difficult, I'm a bit suspicious we may not have converged on the true MLE. Let’s try again, but with “sensible” starting values for all our different model parameters.

init<-setNames(
  c(
    mean(x[y==levels(y)[1]]),
    mean(x[y==levels(y)[2]]),
    mean(x[y==levels(y)[3]]),
    log(2)/max(nodeHeights(phy)),
    var(x)/max(nodeHeights(phy)),
    fitMk(phy,y,model="ER")$rates),
  c("theta[a]","theta[b]","theta[c]",
    "alpha","sigsq","q[1]"))
init

##   theta[a]   theta[b]   theta[c]      alpha      sigsq       q[1] 
## 0.23239962 0.64856597 0.93721136 0.06931472 0.04357107 0.25558924

fit_mou<-fitmultiOU(phy,x,y,model="ER",levs=100,
  parallel=TRUE,ncores=10,root="mle",trace=1,
  maxit=2000,init=init)

## iter	the[a]	the[b]	the[c]	alpha	sigsq	q[1]	log(L)
## 0	0.2324	0.6486	0.9372	0.0693	0.0436	0.2556	-315.4681 
## 100	-0.5480	0.6211	2.3112	0.1711	0.0816	0.1765	-270.5138 
## 200	-0.9475	0.9025	2.0608	0.1907	0.0725	0.2221	-268.0952 
## 300	-0.8971	0.7990	2.0222	0.1960	0.0707	0.2176	-267.9648 
## 400	-0.8941	0.8285	2.0684	0.1935	0.0701	0.2202	-267.9573 
## 500	-0.8831	0.8318	2.0694	0.1936	0.0707	0.2166	-267.9496 
## 600	-0.8828	0.8724	2.0711	0.1903	0.0708	0.2171	-267.9394 
## 700	-0.8728	0.8737	2.0625	0.1894	0.0710	0.2152	-267.9342 
## 800	-0.8513	0.8355	1.9719	0.2003	0.0724	0.2116	-267.9037 
## 900	-0.8589	0.8295	1.9641	0.2014	0.0720	0.2139	-267.9002 
## 1000	-0.8613	0.8330	1.9623	0.2014	0.0719	0.2149	-267.8995 
## 1100	-0.8608	0.8220	1.9628	0.2017	0.0713	0.2154	-267.8972 
## 1200	-0.8600	0.8276	1.9672	0.2008	0.0712	0.2156	-267.8957 
## 1300	-0.8269	1.0429	2.0894	0.1860	0.0700	0.2134	-267.8096 
## 1400	-0.8193	1.0484	2.1319	0.1836	0.0691	0.2161	-267.7998 
## 1500	-0.8143	1.0503	2.1171	0.1844	0.0696	0.2132	-267.7953 
## 1600	-0.7951	1.0509	2.1326	0.1842	0.0690	0.2107	-267.7829 
## 1700	-0.7939	1.0513	2.1340	0.1851	0.0689	0.2095	-267.7812 
## 1723	-0.7939	1.0513	2.1335	0.1852	0.0689	0.2094	-267.7812 
## Done optimizing.

fit_mou

## Object of class "fitmultiOU" based on
##     a discretization with k = 100 levels.
## 
## Fitted multi-theta OU model parameters:
##  levels: [ a, b, c ]
##   theta: [ -0.7939, 1.0513, 2.1335 ]
##   alpha: 0.1852 
##   sigsq: 0.0689 
## 
## Estimated Q matrix:
##            a          b          c
## a -0.4188483  0.2094242  0.2094242
## b  0.2094242 -0.4188483  0.2094242
## c  0.2094242  0.2094242 -0.4188483
## 
## Log-likelihood: -267.7812 
## 
## R thinks it has found the ML solution.

Here it seems that we've definitely done much better.

Remember, the generating values of \(\theta\) were as follows:

theta

##    a    b    c 
## -0.5  1.0  2.0

Our estimates (-0.79, 1.05, and 2.13) are reasonably close to the generating conditions of our simulation!

Naturally, we might be interested to know whether our discrete character dependent model better explains our continuous trait data than the independent null model. This comparison is very easy.

anova(fit_null,fit_mou)

##             log(L) d.f.      AIC       weight
## fit_null -287.0493    4 582.0985 3.166496e-08
## fit_mou  -267.7812    6 547.5624 1.000000e+00

This tells us that basically all of the weight of evidence falls on our discrete character dependent multi-regime OU model compared to the null model.
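For anyone curious where those numbers come from, AIC is \(2p - 2\log(L)\) for a model with \(p\) estimated parameters, and the Akaike weights are proportional to \(\exp(-\Delta_{AIC}/2)\). Here's that arithmetic by hand, using the values from the table above. Since the null model is a special case of the dependent model (all \(\theta\) equal), I've also added a likelihood-ratio test with 2 degrees of freedom. That part is my own addition for illustration, not something anova reports here.

```r
## log-likelihoods & parameter counts from the table above
logL <- c(fit_null = -287.0493, fit_mou = -267.7812)
npar <- c(fit_null = 4, fit_mou = 6)
## AIC
aic <- 2 * npar - 2 * logL
aic
## Akaike weights
delta <- aic - min(aic)
w <- exp(-delta / 2) / sum(exp(-delta / 2))
w
## likelihood-ratio test of the nested null (2 d.f.)
lr <- 2 * unname(logL["fit_mou"] - logL["fit_null"])
p_value <- pchisq(lr, df = 2, lower.tail = FALSE)
p_value
```

Because I've typed in the log-likelihoods rounded to four decimal places, the results should match the anova table only to within rounding.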

That’s all there is to it!

Preliminary, but very cool.

Saturday, July 11, 2026

Null model for discrete character dependent multi-θ Ornstein-Uhlenbeck model in phytools

A few weeks ago I posted about a discrete character dependent multi-\(\theta\) Ornstein-Uhlenbeck model using the discrete diffusion approximation of Boucher & Démery (2016) and our bioRxiv pre-print. (As in my prior post, I recommend that those interested in the technical details of this general approach check out our pre-print.)

Since that time, I’ve been corresponding with a colleague who has been trying to use the prototype fitmultiOU function. Consequently, I decided to write today’s post demonstrating how we fit the “null model” of joint continuous trait OU & discrete character Mk evolution but without discrete character dependence using phytools. I’m going to illustrate (I hope) that we get the same (to a reasonable extent of numerical precision) parameter estimates & likelihood as we would obtain from geiger::fitContinuous under a single \(\theta\) OU model and phytools::fitMk. Note that in a prior GitHub update of phytools I was inadvertently trying to separately estimate \(\theta\) and \(x_0\), the root state – but these are not identifiable under a single regime OU model, so I have set them equal to one another in phytools \(\geq\) 2.6-3.

OK. Let’s get started.

## load phytools & check package version
library(phytools)

## should be phytools >= 2.6-3
packageVersion("phytools")

## [1] '2.6.3'

For this demo I’m going to simulate a pretty small tree. This is because I’m going to set levs = 200 for estimation and this will take a while. In practice, we probably need larger trees to fit a multi-regime OU model.

## simulate tree
N<-60
phy<-pbtree(n=N,scale=10)
phy

## 
## Phylogenetic tree with 60 tips and 59 internal nodes.
## 
## Tip labels:
##   t9, t10, t53, t54, t19, t49, ...
## 
## Rooted; includes branch length(s).

Next, let’s specify a generating transition matrix of our discrete trait, Q.

## set generating Q matrix for discrete character
q<-0.2
Q<-matrix(c(
  -q,q,
  q,-q),2,2,
  dimnames=list(letters[1:2],letters[1:2]))
Q

##      a    b
## a -0.2  0.2
## b  0.2 -0.2

## get number of levels of discrete trait for simulation
k<-nrow(Q)
k

## [1] 2
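Just to build some intuition for what this value of q means: under a symmetric two-state Mk model with rate \(q\), the probability of being in the starting state after time \(t\) is \(\frac{1}{2}+\frac{1}{2}e^{-2qt}\), and the expected number of changes along a path of length \(t\) is \(qt\). The sketch below checks the formula against a matrix exponential computed by eigendecomposition in base R. (I've used the names q_demo, Q_demo, and p_same so as not to overwrite the q and Q objects we need for our simulation.)

```r
## probability of ending in the starting state after time t
## under a symmetric two-state Mk model with rate q
p_same <- function(q, t) 0.5 + 0.5 * exp(-2 * q * t)
## check against exp(Q*t) computed via eigendecomposition
q_demo <- 0.2
Q_demo <- matrix(c(-q_demo, q_demo, q_demo, -q_demo), 2, 2)
e <- eigen(Q_demo, symmetric = TRUE)
P10 <- e$vectors %*% diag(exp(e$values * 10)) %*% t(e$vectors)
c(formula = p_same(q_demo, 10), expm = P10[1, 1])
## expected number of changes from the root to any tip
## on a tree of total height 10
expected_changes <- q_demo * 10
expected_changes
```

After 10 time units the probability of being in the starting state is very close to 0.5, so with this rate the tip states retain very little information about the state at the root.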

We can go ahead & simulate a discrete character history of our trait. Since I’m actually going to simulate under the null, I could’ve also used sim.Mk here.

## simulate TRUE discrete character history
sim_tree<-sim.history(phy,Q,anc="a")

## Done simulation(s).

sim_tree

## 
## Phylogenetic tree with 60 tips and 59 internal nodes.
## 
## Tip labels:
## 	t9, t10, t53, t54, t19, t49, ...
## 
## The tree includes a mapped, 2-state discrete character
## with states:
## 	a, b
## 
## Rooted; includes branch lengths.

## visualize generating discrete character history
cols<-setNames(hcl.colors(n=k),letters[1:k])
plot(sim_tree,cols,ftype="off",lwd=2,
  direction="upwards")
par(lend=1)
legend("bottomleft",letters[1:k],lwd=3,
  col=hcl.colors(n=k),cex=0.8,bty="n")

[Figure: generating discrete character history (states a and b) mapped on the tree]

So far so good.

Next, I’m going to specify my simulation conditions of the continuous trait. Once again, since I’m actually simulating under the null model it isn’t necessary to use phytools::multiOU here, but I will anyway – just with the same values of \(\alpha\), \(\sigma^2\), and \(\theta\) for each of my discrete character levels.

## set simulation conditions for continuous trait
alpha<-setNames(rep(0.3,k),letters[1:k])
alpha

##   a   b 
## 0.3 0.3

sig2<-setNames(rep(0.1,k),letters[1:k])
sig2

##   a   b 
## 0.1 0.1

theta<-setNames(c(-0.5,-0.5),letters[1:k])
theta

##    a    b 
## -0.5 -0.5

(I’m including the next step only for people who might like to adapt this code to simulate different levels of \(\theta\) for the two different discrete character states.)

## get root state from discrete character history
root_state<-getStates(sim_tree,"nodes")[1]
root_state

##  61 
## "a"

## generating continuous character using multiOU
X<-multiOU(sim_tree,alpha,sig2,theta,
  a0=theta[root_state])
head(X)

##         t9        t10        t53        t54        t19        t49 
## -0.5425203 -0.2690402 -0.4526387 -0.2366645 -0.1525741 -0.9399830

We’ve simulated our discrete and continuous traits; however, we still need to pull our discrete trait off the tree using phytools::getStates.

## pull discrete trait off sim_tree using getStates
Y<-as.factor(getStates(sim_tree,"tips"))
head(Y)

##  t9 t10 t53 t54 t19 t49 
##   a   a   a   a   b   b 
## Levels: a b

OK. Now, to start let’s quickly fit our continuous OU model using geiger::fitContinuous, our discrete model using phytools::fitMk, and then add the log-likelihoods.

## get null log(L) using fitContinuous & fitMk
ou_fit<-geiger::fitContinuous(phy,X,model="OU")
ou_fit

## GEIGER-fitted comparative model of continuous data
##  fitted 'OU' model parameters:
## 	alpha = 0.549895
## 	sigsq = 0.152174
## 	z0 = -0.592858
## 
##  model summary:
## 	log-likelihood = -20.294564
## 	AIC = 46.589128
## 	AICc = 47.017699
## 	free parameters = 3
## 
## Convergence diagnostics:
## 	optimization iterations = 100
## 	failed iterations = 0
## 	number of iterations with same best fit = 43
## 	frequency of best fit = 0.430
## 
##  object summary:
## 	'lik' -- likelihood function
## 	'bnd' -- bounds for likelihood search
## 	'res' -- optimization iteration summary
## 	'opt' -- maximum likelihood parameter estimates

mk_fit<-fitMk(phy,Y,model="ER",pi="equal")
mk_fit

## Object of class "fitMk".
## 
## Fitted (or set) value of Q:
##           a         b
## a -0.188436  0.188436
## b  0.188436 -0.188436
## 
## Fitted (or set) value of pi:
##   a   b 
## 0.5 0.5 
## due to treating the root prior as (a) flat.
## 
## Log-likelihood: -34.29465 
## 
## Optimization method used was "nlminb"
## 
## R thinks it has found the ML solution.

(In an earlier version of this post I had set pi="mle", but then realized that pi="equal" matched our joint model, so I re-ran it.)

null_logL<-logLik(ou_fit)+logLik(mk_fit)
null_logL

## [1] -54.58921
## attr(,"df")
## [1] 3

Having done this, I’m ready to fit this same null model using fitmultiOU and null_model=TRUE. This is a joint model, but in which the discrete character has no effect on our continuous character’s evolutionary mode. Warning: this takes a while!!

## now fit the null model using fitmultiOU
fit_null<-fitmultiOU(phy,X,Y,model="ER",levs=200,
  parallel=TRUE,ncores=10,root="mle",trace=1,
  null_model=TRUE)

## iter	theta	alpha	sigsq	q[1]	log(L)
## 0	0.0991	0.0802	0.0040	0.0662	-430.5097 
## 100	-0.6013	0.3368	0.0960	0.1504	-55.8024 
## 200	-0.6002	0.5436	0.1250	0.2330	-55.1476 
## 300	-0.5919	0.5487	0.1494	0.1888	-54.5434 
## 331	-0.5912	0.5499	0.1496	0.1883	-54.5433 
## Done optimizing.

Here’s our fitted joint model.

fit_null

## Object of class "fitmultiOU" based on
##     a discretization with k = 200 levels.
## 
## Fitted multi-theta OU model parameters:
##  levels: [ a, b ]
##   theta: [ -0.5912 ]
##   alpha: 0.5499 
##   sigsq: 0.1496 
## 
## Estimated Q matrix:
##            a          b
## a -0.1883095  0.1883095
## b  0.1883095 -0.1883095
## 
## Log-likelihood: -54.5433 
## 
## R thinks it has found the ML solution.

Let’s do a quick comparison of parameter estimates. (Even though this is a pretty small tree, so we don’t expect our parameter estimates to match the generating values too closely, I’ll throw those in as well.)

## compare parameter estimates
obj<-cbind(
  setNames(c(alpha[1],sig2[1],theta[1],q),
    c("alpha","sigsq","theta","q")),
  unlist(c(ou_fit$opt[c("alpha","sigsq","z0")],
    q=mk_fit$rates)),
  unlist(list(alpha=fit_null$alpha,
    sigsq=fit_null$sigsq,
    z0=fit_null$theta,
    q=fit_null$rates)))
colnames(obj)<-c("generating","estimated","fitmultiOU")
obj

##       generating  estimated fitmultiOU
## alpha        0.3  0.5498946  0.5498956
## sigsq        0.1  0.1521741  0.1495582
## theta       -0.5 -0.5928578 -0.5912444
## q            0.2  0.1884358  0.1883095

Finally, we can compare log-likelihoods. Once again these should be similar & converge in the limit as we increase levs.

## compare log likelihoods
null_logL ## ignore the df

## [1] -54.58921
## attr(,"df")
## [1] 3

logLik(fit_null)

## [1] -54.5433
## attr(,"df")
## [1] 4

This is pretty cool. We expect both the parameter estimates and the log-likelihoods to increase in similarity with levs and some micro-experimentation (so far) confirms this.
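If you're wondering why the match should improve with levs, here's a toy illustration of the general idea in base R. It is emphatically not the code inside fitmultiOU. I'm assuming plain Brownian motion on a single branch, chopping trait space into levs bins of width \(\delta\), letting the trait hop between adjacent bins at rate \(\sigma^2/(2\delta^2)\), and then comparing the resulting approximate density to the exact normal density. The function bm_density_approx and all of its arguments are hypothetical names for this sketch.

```r
## approximate the BM transition density from x = 0 back to
## x = 0 over time t using a discretized trait space
bm_density_approx <- function(levs, sigsq = 0.1, t = 10,
  lims = c(-5, 5)) {
  delta <- diff(lims) / levs      ## bin width
  rate <- sigsq / (2 * delta^2)   ## hop rate between neighbors
  Q <- matrix(0, levs, levs)
  Q[cbind(1:(levs - 1), 2:levs)] <- rate
  Q[cbind(2:levs, 1:(levs - 1))] <- rate
  diag(Q) <- -rowSums(Q)
  ## matrix exponential via eigendecomposition (Q is symmetric)
  e <- eigen(Q, symmetric = TRUE)
  P <- e$vectors %*% diag(exp(e$values * t)) %*% t(e$vectors)
  mid <- (levs + 1) / 2           ## central bin (levs is odd)
  P[mid, mid] / delta             ## probability -> density
}
levs <- c(21, 51, 101, 201)
dens_approx <- sapply(levs, bm_density_approx)
dens_exact <- dnorm(0, mean = 0, sd = sqrt(0.1 * 10))
err <- abs(dens_approx - dens_exact)
cbind(levs, dens_approx, dens_exact, err)
```

I picked odd values of levs just so that zero falls exactly in the middle of the central bin. The error should shrink steadily as levs goes up, but each step also means exponentiating a bigger matrix, which is the same accuracy vs. run time trade-off we face with fitmultiOU.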
