Monday, June 1, 2026

Computing the likelihood of a simple OU model using the discrete approximation

With various co-authors I recently published a bioRxiv pre-print entitled “Unlocking a flexible set of phylogenetic models for discrete and continuous trait evolution using discretized stochastic diffusion”.

In this article, we describe how a powerful idea that was first introduced to phylogenetic comparative methods by Boucher & Démery (2016) – that is, exploiting the equivalence of continuous-space diffusion & discrete-space CTMC on a lattice in the limit as the number of steps in the chain goes to \(\infty\) to compute likelihoods for otherwise intractable phylogenetic models – has a wide variety of applications outside those envisioned in prior studies (Boucher & Démery 2016; Boucher et al. 2018).

All of these applications exploit the equivalence (in the limit as the interval or bin width \(\delta \rightarrow 0\)) of a nearest-neighbor CTMC and continuous stochastic diffusion where \(q_{c} = \sigma^2/(2\delta^2)\) gives the instantaneous transition rate between adjacent bins, in which \(\sigma^2\) is the stochastic diffusion (Brownian) rate and \(\delta\) is the bin width. The cool thing about this is that we compute an approximate probability density of continuous data by first discretizing our data, computing the probability under a nearest-neighbor CTMC with \(q_{c} = \sigma^2/(2\delta^2)\) via standard pruning, and then back-transforming the probability to a density by dividing by \(\delta^N\) for \(N\) tips. In our pre-print, we show how this technique can be used to fit the multi-state threshold model, a new “semi-threshold” model, a discrete-character-dependent continuous trait model, and others. Fundamentally, all rely on the principle that if a continuous vector \(\mathbf{x}\) is discretized as \(\mathbf{x'}\), given bin width \(\delta\), then the probability of \(\mathbf{x'}\) evolving as a nearest neighbor CTMC with \(q_{c} = \sigma^2/(2\delta^2)\) between neighbor, for diffusion rate of \(\mathbf{x}\) \(\sigma^2\) is \(P(\mathbf{x'}) = D(\mathbf{x}) \times \delta^N\), in which \(D(\mathbf{x})\) is the density of \(\mathbf{x}\).

To exemplify exactly this sort of calculation on a tree, let’s start by loading phytools.

library(phytools)

Then we can proceed to simulate a tree and data vector \(\mathbf{x}\) to work with as follows.

N<-12
tree<-pbtree(n=N,tip.label=LETTERS[N:1],scale=1)
x<-fastBM(tree,a=0,sig2=0.5,alpha=1,theta=0)
plotTree(tree,ylim=c(0,1.2*max(nodeHeights(tree))),
  direction="upwards",ftype="off")
tiplabels(round(x,3),srt=90,frame="n",adj=-0.25)

plot of chunk unnamed-chunk-325

To compute the exact probability (density) of these data under (say) a hypothesis of \(x_0 = 0\) and \(\sigma^2 = 0.5\) and evolution via simple Brownian motion, we start by obtaining their covariance structure as follows.

x0<-0
sigsq<-0.5
V<-sigsq*vcv(tree)
print(V,digits=4)
##        L      K      J      I      H      G      F      E      D      C      B      A
## L 0.5000 0.4168 0.4168 0.1581 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
## K 0.4168 0.5000 0.4775 0.1581 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
## J 0.4168 0.4775 0.5000 0.1581 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
## I 0.1581 0.1581 0.1581 0.5000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
## H 0.0000 0.0000 0.0000 0.0000 0.5000 0.4931 0.2216 0.2216 0.2065 0.2065 0.2065 0.2065
## G 0.0000 0.0000 0.0000 0.0000 0.4931 0.5000 0.2216 0.2216 0.2065 0.2065 0.2065 0.2065
## F 0.0000 0.0000 0.0000 0.0000 0.2216 0.2216 0.5000 0.3534 0.2065 0.2065 0.2065 0.2065
## E 0.0000 0.0000 0.0000 0.0000 0.2216 0.2216 0.3534 0.5000 0.2065 0.2065 0.2065 0.2065
## D 0.0000 0.0000 0.0000 0.0000 0.2065 0.2065 0.2065 0.2065 0.5000 0.4876 0.3087 0.3087
## C 0.0000 0.0000 0.0000 0.0000 0.2065 0.2065 0.2065 0.2065 0.4876 0.5000 0.3087 0.3087
## B 0.0000 0.0000 0.0000 0.0000 0.2065 0.2065 0.2065 0.2065 0.3087 0.3087 0.5000 0.4619
## A 0.0000 0.0000 0.0000 0.0000 0.2065 0.2065 0.2065 0.2065 0.3087 0.3087 0.4619 0.5000

Then we can evaluate the density on a log-scale using mvtnorm::dmvnorm as so.

mvtnorm::dmvnorm(x,mean=rep(x0,Ntip(tree)),
  sigma=V,log=TRUE)
## [1] -2.797184

To do the equivalent calculation assuming a nearest-neighbor CTMC, we must first select a number of links in the chain. The higher, the more accurate our approximation. Here, I’ll choose 200.

## number of levels for discretization
n<-200

Next, I set upper & lower bounds well outside of our trait range.

## upper & lower bounds
xlims<-phytools:::expand.range(x)

Then, I find all the bin midpoints given my n and xlims.

## bin midpoints
xi<-seq(xlims[1],xlims[2],length.out=n)

And we can proceed to calculate \(\delta\), which we’ll need later.

## delta
delta<-xi[2]-xi[1]

We needn’t, but I intend to use a phytools internal helper function (phytools:::to_binned) to transform my continuous trait into a binned, discretized trait. For that, we need to calculate the upper & lower bounds of each bin, rather than just their midpoints.

## actual bins
bins<-cbind(xi-delta/2,xi+delta/2)
## bin-discretized data
X<-phytools:::to_binned(x,bins)

Finally, our likelihood function for the discretized diffusion approximation. (Note that this is not super clean. I’m scoping delta, n, and xi from my Global Environment. Still, this works in R.)

## likelihood function
lik.bm<-function(sigma2,x0,X,phy){
  q.c<-sigma2/(2*delta^2)
  Q<-matrix(0,n,n)
  for(i in 2:(n-1)) Q[i,i+1]<-Q[i,i-1]<-q.c
  diag(Q)<--rowSums(Q)
  X0<-rep(0,length(xi))
  X0[which.min(abs(x0-xi))]<-1
  fit<-fitMk(phy,X,fixedQ=Q,pi=X0)
  obj<-logLik(fit)-Ntip(phy)*log(delta)
  class(obj)<-"logLik"
  attr(obj,"df")<-3
  obj
}

If we evaluate it, we should find that it gives us a highly similar value of the density to what we calculated early using dmvnorm (-2.797184, in that case – see above).

lik.bm(sigma2=0.5,x0=0,X=X,phy=tree)
## 'log Lik.' -2.796157 (df=3)

Yup. I’d say that’s close. Fiddle with n and you should find it gets even closer!

Ornstein-Uhlenbeck is a stochastic process model where \(dX_t = \sigma dW_t+\alpha(\theta - X_t)\). The first part of this differential expression (\(\sigma dW_t\)) is just standard Brownian motion, whereas the second part (\(\alpha(\theta - X_t)\)) is deterministic.

To translate this to our nearest-neighbor CTMC, therefore, we will need a different upward & downward rate for our process, depending on which side of \(\theta\) we find ourselves. Specifically, if we’re below \(\theta\), then \(q_+ = \sigma^2/(2\delta^2) + \alpha(\theta-x)/\delta\) and \(q_- = \sigma^2/(2\delta^2)\), whereas if we’re above \(\theta\) then \(q_+ = \sigma^2/(2\delta^2)\) and \(q_- = \sigma^2/(2\delta^2)-\alpha(\theta-x)/\delta\), where \(q_+\) and \(q_-\) are the upwards & downwards rates in our nearest-neighbor CTMC, respectively. If this isn’t intuitive, remember that OU is a rubberband model, and so if we’re below \(\theta\) then our “energy” upwards is just the sum of the stochastic component and the rubber band, while the energy downwards is just the stochastic part of our model; whereas, whenever we’re above \(\theta\) precisely the opposite is true.

(Don’t worry. I’m not the one that figured out these exact maths! It’s in the physical sciences literature already.)

OK, so let’s try to set it up, much as we did before.

lik.ou<-function(theta,alpha,sigma2,X,phy){
  mu<-alpha*(theta-xi)
  q.up<-sigma2/(2*delta^2)+pmax(mu,0)/delta
  q.down<-sigma2/(2*delta^2)+pmax(-mu,0)/delta
  Q<-matrix(0,n,n)
  for(i in 2:(n-1)){
    Q[i,i+1]<-q.up[i]
    Q[i,i-1]<-q.down[i]
  }
  diag(Q)<--rowSums(Q)
  X0<-rep(0,length(xi))
  X0[which.min(abs(theta-xi))]<-1
  fit<-fitMk(phy,X,fixedQ=Q,pi=X0)
  obj<-logLik(fit)-Ntip(phy)*log(delta)
  class(obj)<-"logLik"
  attr(obj,"df")<-3
  obj
}

Now, we happen to know that these data were actually simulated under an OU model with \(\sigma^2 = 0.5\), \(\theta = 0\), and \(\alpha = 1.0\). Let’s plug these values into our CTMC likelihood function and see what we get.

lik.ou(theta=0,alpha=1,sigma2=0.5,X,tree)
## 'log Lik.' -1.48367 (df=3)

OK, so how close is this to the true density? Well, much like BM, OU is a Gaussian process and the tip values will thus be MVN. We just need a way to get the correct covariance matrix for a particular \(\sigma^2\) and \(\alpha\), which is straightforward using phytools::vcvPhylo as follows.

x0<-0
sigsq<-0.5
V<-sigsq*vcvPhylo(tree,model="OU",alpha=1,
  anc.nodes=FALSE)
print(V,digits=4)
##         L       K       J       I       H       G       F       E       D       C       B       A
## L 0.21617 0.14539 0.14539 0.02985 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
## K 0.14539 0.21617 0.19466 0.02985 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
## J 0.14539 0.19466 0.21617 0.02985 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
## I 0.02985 0.02985 0.02985 0.21617 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
## H 0.00000 0.00000 0.00000 0.00000 0.21617 0.20931 0.04827 0.04827 0.04345 0.04345 0.04345 0.04345
## G 0.00000 0.00000 0.00000 0.00000 0.20931 0.21617 0.04827 0.04827 0.04345 0.04345 0.04345 0.04345
## F 0.00000 0.00000 0.00000 0.00000 0.04827 0.04827 0.21617 0.10527 0.04345 0.04345 0.04345 0.04345
## E 0.00000 0.00000 0.00000 0.00000 0.04827 0.04827 0.10527 0.21617 0.04345 0.04345 0.04345 0.04345
## D 0.00000 0.00000 0.00000 0.00000 0.04345 0.04345 0.04345 0.04345 0.21617 0.20408 0.08246 0.08246
## C 0.00000 0.00000 0.00000 0.00000 0.04345 0.04345 0.04345 0.04345 0.20408 0.21617 0.08246 0.08246
## B 0.00000 0.00000 0.00000 0.00000 0.04345 0.04345 0.04345 0.04345 0.08246 0.08246 0.21617 0.18080
## A 0.00000 0.00000 0.00000 0.00000 0.04345 0.04345 0.04345 0.04345 0.08246 0.08246 0.18080 0.21617

Ready?

mvtnorm::dmvnorm(x,mean=rep(x0,Ntip(tree)),
  sigma=V,log=TRUE)
## [1] -1.489203

OK, that looks pretty darn similar!

Now, as a last step, & just for fun, let’s contemplate what it might look like to optimize our OU model using the discretized approximation, rather than just feeding it our generating parameter values.

First, let’s optimize in our original space to maximize the probability of obtaining the data that we have, in fact, in our hands (that is, maximize the likelihood).

## do this on a log scale
likc<-function(par,x,phy){
  -mvtnorm::dmvnorm(x,mean=rep(par[1],Ntip(phy)),
    sigma=exp(par[3])*vcvPhylo(phy,model="OU",
      alpha=exp(par[2]),anc.nodes=FALSE),
    log=TRUE)
}
## run through optim
fit<-optim(
  par=c(0,log(1),log(0.5)),
  fn=likc,
  x=x,phy=tree)
## likelihood
-fit$value
## [1] -1.019458
## parameter estimates
c(fit$par[1],exp(fit$par[2:3]))
## [1] -0.07922464  1.54252848  0.41766567

These should be similar to what we might’ve obtained using geiger::fitContinuous, of course. (Thankfully, they are.)

ou_fit<-geiger::fitContinuous(tree,x,model="OU")
ou_fit
## GEIGER-fitted comparative model of continuous data
##  fitted 'OU' model parameters:
## 	alpha = 1.542565
## 	sigsq = 0.417675
## 	z0 = -0.079225
## 
##  model summary:
## 	log-likelihood = -1.019458
## 	AIC = 8.038916
## 	AICc = 11.038916
## 	free parameters = 3
## 
## Convergence diagnostics:
## 	optimization iterations = 100
## 	failed iterations = 0
## 	number of iterations with same best fit = 60
## 	frequency of best fit = 0.600
## 
##  object summary:
## 	'lik' -- likelihood function
## 	'bnd' -- bounds for likelihood search
## 	'res' -- optimization iteration summary
## 	'opt' -- maximum likelihood parameter estimates

Now, let’s try to do the same thing using the discrete approximation.

likd<-function(par,X,phy){
  -as.numeric(lik.ou(par[1],exp(par[2]),exp(par[3]),
    X,phy))
}
fit<-optim(
  par=c(0,log(1),log(0.5)),
  fn=likd,
  X=X,
  phy=tree,
  method="Nelder-Mead")
## likelihood
-fit$value
## [1] -0.9733699
## parameter estimates
c(fit$par[1],exp(fit$par[2:3]))
## [1] -0.06975488  1.51608457  0.40323807

Yeah. Keeping in mind that this is an approximation that only becomes exact in the limit as \(\delta\) goes to zero, that totally worked.

Tuesday, May 12, 2026

Estimation of shape parameter α in our Γ distributed rate varying discrete character evolution model

A couple of years ago, Luke Harmon and I published a pre-print entitled “A discrete character evolution model for phylogenetic comparative biology with \(\Gamma\)-distributed rate heterogeneity among branches of the tree.”

In this paper we describe a discrete-trait evolution model where the rate of change along each edge is a random variable from a normalized \(\Gamma\) distribution with shape \(\alpha\) and scale \(\beta = 1/\alpha\).

Similar to what’s done for \(\Gamma\) distributed rates among sites in phylogeny estimation, we used a discretized \(\Gamma\) for computation – with an arbitrary number of rate categories as specified by the user.

Well, we sent this paper for review and one of the reviewers (actually, Mike May, now at Utah State) pointed out that there is no reason to use a discretized \(\Gamma\) because Huelsenbeck et al. (2008; Syst. Biol.) had already given the correct analytic formula for the probability under a continuous \(\Gamma\) – and, not only that, but using this exact formula would make calculating the likelihood much faster.

I checked and, of course, he was right!

He was so right that I implemented the update in phytools and I invited Mike to be a co-author on the manuscript… if I ever get around to resubmitting it, that is (which I intend to!).

I implemented this improvement in fitgammaMk a long time ago, but resubmitting has been very slow – largely because I’ve been distracted by other things, but also because I now need to re-do and extend all the simulations since calculations can be done so much faster than before. The manuscript needs a significant re-write to boot, since a lot of time had been dedicated towards describing the discretized \(\Gamma\) likelihood calculations!

Since I’m finally back to working on this, though, I thought it might be fun to do a little bit of exploration with regards to how this method does in estimating \(\alpha\): the \(\Gamma\) shape parameter describing variation in rate among sites. (Remember, just as in nucleotide models, low \(\alpha\) means high rate variation and the converse.)

To start let’s load some packages.

## load packages
library(phytools)
library(foreach)

Now we can set our simulation conditions. I’m going to do a total of 60 distinct simulations for different \(\alpha\) values, and (each time) repeat optimization 4 times to ensure convergence. This is a bit of a “data-hungry” method, so let’s generate phylogenies with 3,000 tips

## set simulation parameters
nsim<-60
niter<-4
ntip<-3000

We can also choose some values of \(\alpha\). I’m going to simulate log-normal values with a (geometric) mean of exp(-1).

## simulate some values of alpha (the shape parameter)
alpha<-exp(rnorm(n=nsim,mean=-1))
alpha
##  [1] 0.45564672 0.59431734 0.40165117 0.57341734 0.25593343 0.41589354 0.15507519 0.60027000 0.25560630 0.10083780
## [11] 0.17451073 0.92454862 0.77884311 0.02994018 0.01758104 0.36797724 0.24807628 0.06424652 0.60570116 0.48236885
## [21] 1.10397966 0.78076073 0.34665796 0.26065208 0.45963081 0.63876819 0.72879909 0.21312437 0.09372070 1.49190203
## [31] 1.45216268 0.57709785 0.31781245 0.41815514 0.03707839 0.09380202 0.30195431 0.39379917 0.40272691 0.50801717
## [41] 0.42020131 0.06861361 0.27846036 0.07789364 0.09257842 0.09469855 0.14644034 0.15461491 1.92831080 0.31503242
## [51] 0.07603569 0.68665905 0.51182152 0.24706044 0.12475582 0.34064239 0.21750431 0.54405206 0.18628952 0.17400053

To keep it simple, I’m going to simulate a three-state character under an "ER" equal-rates model – but, remember, this is equal-rates among different transition types: our rate varies according to edge!

## create transition matrix for simulation
Q<-matrix(c(-2,1,1,
  1,-2,1,
  1,1,-2),3,3,byrow=TRUE,
  dimnames=list(0:2,0:2))
Q
##    0  1  2
## 0 -2  1  1
## 1  1 -2  1
## 2  1  1 -2

Finally, we’re ready for our simulation. Just to speed things up a tiny bit, I’m going to use the foreach package to parallelize across simulations.

## let's make a cluster
ncores<-parallel::detectCores()-4 ## leave some cores free
mc<-parallel::makeCluster(ncores,type="PSOCK")
doParallel::registerDoParallel(cl=mc)

## do simulation
gamma_fits<-foreach(i=1:nsim)%dopar%{
  ## simulate tree
  tree<-phytools::pbtree(n=ntip,scale=1)
  ## simulate rates
  rates<-rgamma(n=nrow(tree$edge),
    alpha[i],alpha[i])
  ## simulate data
  simtree<-tree
  simtree$edge.length<-simtree$edge.length*rates
  x<-phytools::sim.Mk(simtree,Q)
  ## run niter optimization iterations
  fits<-list()
  for(j in 1:niter) fits[[j]]<-phytools::fitgammaMk(tree,
    x,model="ER",pi="fitzjohn",rand_start=TRUE,
    min.alpha=0.01,max.alpha=10)
  logL<-sapply(fits,logLik)
  fits[[which.max(logL)[1]]]
}

## stop cluster
parallel::stopCluster(mc)

Our object gamma_fits contains all the fitted models; however, at the moment, we really only care about estimation of \(\alpha\).

## pull out estimated alphas
est_alphas<-sapply(gamma_fits,function(x) x$alpha)
est_alphas
##  [1] 0.56210855 0.75941318 0.29201812 0.98101008 0.42966179 0.90396444 0.15536041 0.40271139 0.27186033 0.12907243
## [11] 0.22705690 0.36793303 0.87431957 0.03684430 0.01325773 0.36015767 0.29058093 0.05175733 0.53580828 0.62360289
## [21] 5.53593018 1.41130054 0.33375223 0.25869720 1.01964577 0.85157594 4.35533839 0.20044956 0.13797275 3.38182299
## [31] 3.12804668 0.69790623 0.45073080 1.16106427 0.04194153 0.08779118 0.32395813 0.56070330 0.22760055 0.54733300
## [41] 0.35537293 0.06851216 0.31008701 0.04693878 0.07574936 0.12493617 0.13493339 0.17895880 1.89990929 0.38123235
## [51] 0.05254867 0.51621554 0.27063902 0.45031793 0.14496688 0.32589063 0.26682441 0.36876731 0.14909626 0.15659750

Finally, let’s graph our results!

par(mar=c(5.1,4.1,1.1,1.1))
plot(alpha,est_alphas,log="xy",bty="n",las=1,cex.axis=0.8,
  pch=16,col="navy",xlab=expression(alpha),
  ylab=expression(hat(alpha)))
abline(a=0,b=1)

plot of chunk unnamed-chunk-10

Cool.

We can similarly look at the distribution of values for our estimated (mean) transition rate, which, according to our simulation, should be centered on 1.0.

q<-sapply(gamma_fits,function(x) x$rates)
par(mar=c(5.1,4.1,1.1,1.1))
hist(q,breaks=12,main="",cex.axis=0.8,las=1,
  xlab="mean transition rate (q)",ylab="frequency")

plot of chunk unnamed-chunk-11

After the fact, I had the idea of also fitting a standard Mk model to each of my simulations; however, from the look of our code, above, one might guess that this wasn’t possible because we failed to store the trees and data from each simulation. Not so! Actually, our fitted models contain the data and tree as hidden attributes.

Let’s see.

mk_fit<-fitMk(gamma_fits[[1]]$tree,
  gamma_fits[[1]]$data,model="ER",pi="fitzjohn")
mk_fit
## Object of class "fitMk".
## 
## Fitted (or set) value of Q:
##           0         1         2
## 0 -1.554145  0.777073  0.777073
## 1  0.777073 -1.554145  0.777073
## 2  0.777073  0.777073 -1.554145
## 
## Fitted (or set) value of pi:
##        0        1        2 
## 0.006338 0.012031 0.981632 
## due to treating the root prior as (a) nuisance.
## 
## Log-likelihood: -1939.417191 
## 
## Optimization method used was "nlminb"
## 
## R thinks it has found the ML solution.

OK, now let’s use foreach to iterate over all the simulated trees and data in our analysis. Note that I’m going to bother to use multiple optimization iterations this time (though I probably should) just because I’m betting that optimization will not be as sticky a problem.

## let's make a cluster
ncores<-parallel::detectCores()-4 ## leave some cores free
mc<-parallel::makeCluster(ncores,type="PSOCK")
doParallel::registerDoParallel(cl=mc)

## optimize models
mk_fits<-foreach(i=1:nsim)%dopar%{
  phytools::fitMk(gamma_fits[[i]]$tree,
    gamma_fits[[i]]$data,model="ER",pi="fitzjohn",
    rand_start=TRUE)
}

## stop cluster
parallel::stopCluster(mc)

Now, with this fitted models we can begin by pulling out the transition rate. Remember, the generating mean transition rate for each dataset was \(q = 1.0\).

q<-sapply(mk_fits,function(x) x$rates)
par(mar=c(5.1,4.1,1.1,1.1))
hist(q,breaks=12,main="",cex.axis=0.8,las=1,
  xlab="mean transition rate (q)",ylab="frequency")

plot of chunk unnamed-chunk-15

We can also calculate a likelihood-ratio test statistic for each pair of fitted models for each dataset. I’ll also overlay the threshold \(\chi^2\) value for \(\alpha = 0.05\). (Note that this is our nominal type I error rate, \(\alpha\), not our shape parameter \(\alpha\)!)

logL_gamma<-sapply(gamma_fits,logLik)
logL_mk<-sapply(mk_fits,logLik)
lr<--2*(logL_mk-logL_gamma)
lr
##  [1]  7.2785984  3.7836974 12.4452257  2.1957923  6.4793898  2.1198289 24.5586953 12.8897355 11.2646909 30.7540370
## [11]  9.0550321  7.2434808  2.5675858 27.0554270 63.4935291 11.5548561 11.1888620 43.9897480  6.3408953  5.3929059
## [21]  0.1101214  1.9780238 12.1782949 14.6827249  2.2854141  2.3148659  0.1630774 15.9115957 18.5981435  0.3920628
## [31]  0.2987446  3.4728888  9.4109756  2.5150548 25.6555840 35.9834562 10.3802556  4.6250253 14.2707234  5.6280835
## [41] 10.7253016 30.9851992  7.4089753 51.8139460 40.9620438 16.0267153 26.7697882 17.5656220  0.8700424  9.2092783
## [51] 50.0572587  7.2842385 19.5178809  4.1713513 20.3367791  8.4634620 20.1460356  7.4090866 25.9385439 27.4777112
par(mar=c(5.1,4.1,1.1,1.1))
plot(alpha,lr,log="x",bty="n",las=1,
  cex.axis=0.8,pch=16,col="navy",
  xlab=expression(paste("generating ",alpha)),
  ylab="likelihood ratio")
abline(h=qchisq(p=0.95,df=1),lwd=3,col="grey",
  lty="dotted")
text(x=0.02,y=1.5*qchisq(p=0.95,df=1),
  expression(paste(alpha," = 0.05")),pos=4,
  cex=0.7)

plot of chunk unnamed-chunk-17

This shows that for basically all low values of (\(\Gamma\) shape parameter) \(\alpha\) we would reject the homogeneous rate Mk model in favor of our \(\Gamma\) model, while the opposite is true for high \(\alpha\). This, of course, makes sense because low \(\alpha\) corresponds to high rate heterogeneity among edges while high \(\alpha\) corresponds to low heterogeneity among edges. Here, for example, are the \(\Gamma\) distributions for the lowest, the geometric mean, and the highest values of \(\alpha\) used in simulation.

## function for geometric mean
gmean<-function(x,na.rm=FALSE){
  exp(mean(log(x),na.rm=na.rm))
}
cols<-make.transparent(hcl.colors(n=3),0.75)
par(mar=c(5.1,4.1,1.1,1.1))
x<-seq(0.001,5,by=0.001)
lowest<-dgamma(x,min(alpha),min(alpha))
plot(x,lowest,type="l",ylim=c(0,1.25),lwd=3,
  col=cols[1],bty="n",cex.axis=0.8,xlab="rate",
  ylab="density")
highest<-dgamma(x,max(alpha),max(alpha))
lines(x,highest,lwd=3,col=cols[3])
average<-dgamma(x,gmean(alpha),gmean(alpha))
lines(x,average,lwd=3,col=cols[2])
labs<-c(
  bquote(alpha == .(round(min(alpha),2))),
  bquote(alpha == .(round(gmean(alpha),2))),
  bquote(alpha == .(round(max(alpha),2)))
)
legend("topright",legend=labs,lwd=3,col=cols,bty="n",
  cex=0.7)

plot of chunk unnamed-chunk-18

That’s it.

Wednesday, January 28, 2026

Simple new S3 plot method for the "fastAnc" object class

I just committed a new update to phytools that involved adding a new generic plot method for the "fastAnc" object class. This was pretty simple, with the main trick to it being that I modified the "fastAnc" object just by adding the tree and input data from fastAnc as object attributes.

Here’s a simple demo.

## load package
library(phytools)
## check version number
packageVersion("phytools")
## [1] '2.5.4'
## load tree and data
data(mammal.tree)
data(mammal.data)
## extract log body mass
lnBodyMass<-setNames(log(mammal.data$bodyMass),
  rownames(mammal.data))
## run ASR using phytools::fastAnc
mammal.anc<-fastAnc(mammal.tree,lnBodyMass)
mammal.anc
## Ancestral character estimates using fastAnc:
##       50       51       52       53       54       55       56       57       58       59 
## 4.616864 3.928453 3.570720 3.372081 5.008271 5.416959 2.503895 1.783068 2.074242 2.092048 
##       60       61       62       63       64       65       66       67       68       69 
## 2.102201 2.427742 2.879777 2.935790 4.000602 3.660331 4.314389 4.607878 4.760248 4.846334 
##       70       71       72       73       74       75       76       77       78       79 
## 5.299692 5.545168 7.076476 5.516430 5.552013 5.123064 5.352999 5.190895 5.148764 5.146091 
##       80       81       82       83       84       85       86       87       88       89 
## 5.221873 6.105385 4.596915 3.642387 3.618018 3.586540 4.613914 4.591181 4.847749 4.777965 
##       90       91       92       93       94       95       96       97 
## 4.873585 4.882390 5.109305 4.960400 4.878856 4.928119 4.832228 4.246744
plot(mammal.anc,ftype="i",fsize=0.6,offset=0.5,
  title="log(body mass)")

plot of chunk unnamed-chunk-6

Cool.

Let’s simulate a large tree and then try it in a different style.

## simulate tree & data
sim_tree<-pbtree(n=500,scale=100)
sim_data<-fastBM(sim_tree)
## do ASR
sim_asr<-fastAnc(sim_tree,sim_data,CI=TRUE)
## print ASR
print(sim_asr,printlen=4)
## Ancestral character estimates using fastAnc:
##        501       502       503       504     
##  -1.968242 -2.989587 -4.870655 -9.828655 ....
## 
## Lower & upper 95% CIs:
##          lower     upper
## 501   -8.59862  4.662136
## 502  -8.817602  2.838427
## 503 -10.380942  0.639632
## 504 -15.847845 -3.809464
##           ....      ....
## plot results
plot(sim_asr,type="fan",ftype="off",lwd=1,color="grey",
  node.cex=0.8,tip.cex=0.5,ylim=c(-120,100),legend="bottomleft")

plot of chunk unnamed-chunk-8

That’s pretty much the idea.