Tuesday, May 12, 2026

Estimation of the shape parameter α in our Γ-distributed, rate-varying discrete character evolution model

A couple of years ago, Luke Harmon and I published a pre-print entitled “A discrete character evolution model for phylogenetic comparative biology with \(\Gamma\)-distributed rate heterogeneity among branches of the tree.”

In this paper we describe a discrete-trait evolution model where the rate of change along each edge is a random variable from a normalized \(\Gamma\) distribution with shape \(\alpha\) and scale \(\beta = 1/\alpha\).
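Because the scale is \(\beta = 1/\alpha\), the edge rates have a mean of 1.0, so rescaling branch lengths by these rates leaves the expected tree length unchanged. A quick sanity check in R (my own sketch, not part of the model code):

```r
## under the normalized Gamma (shape = alpha, rate = alpha), E[r] = 1,
## so multiplying branch lengths by r doesn't change their expected value
set.seed(1)
r<-rgamma(n=1e6,shape=0.5,rate=0.5)
mean(r)  ## close to 1
```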

Similar to what’s done for \(\Gamma\) distributed rates among sites in phylogeny estimation, we used a discretized \(\Gamma\) for computation – with an arbitrary number of rate categories as specified by the user.
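For reference, the standard discretization (following the among-site approach) splits the \(\Gamma\) into \(k\) equal-probability categories, each represented by, say, its median. A minimal sketch of that idea, not the exact internals we used:

```r
## k equal-probability rate categories from a normalized Gamma(alpha, alpha),
## each represented by its median (means are another common choice)
discrete_gamma<-function(alpha,k){
  qgamma((1:k-0.5)/k,shape=alpha,rate=alpha)
}
discrete_gamma(alpha=0.5,k=4)
```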

Well, we sent this paper for review and one of the reviewers (actually, Mike May, now at Utah State) pointed out that there is no reason to use a discretized \(\Gamma\) because Huelsenbeck et al. (2008; Syst. Biol.) had already given the correct analytic formula for the probability under a continuous \(\Gamma\) – and, not only that, but using this exact formula would make calculating the likelihood much faster.

I checked and, of course, he was right!
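As I understand the trick (this is my paraphrase, not Huelsenbeck et al.'s exact derivation), averaging \(e^{Qrt}\) over \(r \sim \Gamma(\alpha,\alpha)\) amounts to applying the \(\Gamma\) moment generating function to each eigenvalue of \(Q\), giving \(E[e^{Qrt}] = (I - tQ/\alpha)^{-\alpha}\). Here's a numerical sketch of that identity:

```r
## check E[exp(Q*r*t)] = (I - t*Q/alpha)^(-alpha), computed eigenvalue-wise
## (Q's eigenvalues are <= 0, so the Gamma MGF always converges)
Q<-matrix(c(-2,1,1,1,-2,1,1,1,-2),3,3,byrow=TRUE)
a<-0.5; tt<-1
eig<-eigen(Q); U<-eig$vectors; Uinv<-solve(U); lam<-eig$values
P_exact<-U%*%diag((1-lam*tt/a)^(-a))%*%Uinv
## brute-force Monte Carlo average of exp(Q*r*t) over r ~ Gamma(a, a)
set.seed(1)
r<-rgamma(2e4,shape=a,rate=a)
P_mc<-Reduce("+",lapply(r,function(ri)
  U%*%diag(exp(lam*ri*tt))%*%Uinv))/length(r)
max(abs(P_exact-P_mc))  ## small
```

No numerical integration over rate categories needed, which is why the exact calculation is so much faster.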

He was so right that I implemented the update in phytools and I invited Mike to be a co-author on the manuscript… if I ever get around to resubmitting it, that is (which I intend to!).

I implemented this improvement in fitgammaMk a long time ago, but resubmitting has been very slow – largely because I’ve been distracted by other things, but also because I now need to re-do and extend all the simulations, since calculations can be done so much faster than before. The manuscript needs a significant re-write to boot, since a lot of the text was dedicated to describing the discretized \(\Gamma\) likelihood calculations!

Since I’m finally back to working on this, though, I thought it might be fun to do a little bit of exploration with regards to how this method does in estimating \(\alpha\): the \(\Gamma\) shape parameter describing variation in rate among edges of the tree. (Remember, just as in among-site nucleotide models, low \(\alpha\) means high rate variation and vice versa.)
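To make that concrete: the normalized \(\Gamma\) has variance \(1/\alpha\), so its coefficient of variation is \(1/\sqrt{\alpha}\), which shrinks as \(\alpha\) grows. A tiny illustration:

```r
## Var[r] = 1/alpha for the normalized Gamma, so the coefficient of
## variation is 1/sqrt(alpha): small alpha means big rate heterogeneity
a<-c(0.1,1,10)
data.frame(alpha=a,CV=1/sqrt(a))
```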

To start let’s load some packages.

## load packages
library(phytools)
library(foreach)

Now we can set our simulation conditions. I’m going to do a total of 60 distinct simulations for different \(\alpha\) values, and (each time) repeat optimization 4 times to ensure convergence. This is a bit of a “data-hungry” method, so let’s generate phylogenies with 3,000 tips.

## set simulation parameters
nsim<-60
niter<-4
ntip<-3000

We can also choose some values of \(\alpha\). I’m going to simulate log-normal values with a (geometric) mean of exp(-1).

## simulate some values of alpha (the shape parameter)
alpha<-exp(rnorm(n=nsim,mean=-1))
alpha
##  [1] 0.45564672 0.59431734 0.40165117 0.57341734 0.25593343 0.41589354 0.15507519 0.60027000 0.25560630 0.10083780
## [11] 0.17451073 0.92454862 0.77884311 0.02994018 0.01758104 0.36797724 0.24807628 0.06424652 0.60570116 0.48236885
## [21] 1.10397966 0.78076073 0.34665796 0.26065208 0.45963081 0.63876819 0.72879909 0.21312437 0.09372070 1.49190203
## [31] 1.45216268 0.57709785 0.31781245 0.41815514 0.03707839 0.09380202 0.30195431 0.39379917 0.40272691 0.50801717
## [41] 0.42020131 0.06861361 0.27846036 0.07789364 0.09257842 0.09469855 0.14644034 0.15461491 1.92831080 0.31503242
## [51] 0.07603569 0.68665905 0.51182152 0.24706044 0.12475582 0.34064239 0.21750431 0.54405206 0.18628952 0.17400053

To keep it simple, I’m going to simulate a three-state character under an "ER" equal-rates model – but, remember, this is equal rates among the different transition types: the overall rate still varies from edge to edge!

## create transition matrix for simulation
Q<-matrix(c(-2,1,1,
  1,-2,1,
  1,1,-2),3,3,byrow=TRUE,
  dimnames=list(0:2,0:2))
Q
##    0  1  2
## 0 -2  1  1
## 1  1 -2  1
## 2  1  1 -2

Finally, we’re ready for our simulation. Just to speed things up a tiny bit, I’m going to use the foreach package to parallelize across simulations.

## let's make a cluster
ncores<-parallel::detectCores()-4 ## leave some cores free
mc<-parallel::makeCluster(ncores,type="PSOCK")
doParallel::registerDoParallel(cl=mc)

## do simulation
gamma_fits<-foreach(i=1:nsim)%dopar%{
  ## simulate tree
  tree<-phytools::pbtree(n=ntip,scale=1)
  ## simulate rates
  rates<-rgamma(n=nrow(tree$edge),
    alpha[i],alpha[i])
  ## simulate data
  simtree<-tree
  simtree$edge.length<-simtree$edge.length*rates
  x<-phytools::sim.Mk(simtree,Q)
  ## run niter optimization iterations
  fits<-list()
  for(j in 1:niter) fits[[j]]<-phytools::fitgammaMk(tree,
    x,model="ER",pi="fitzjohn",rand_start=TRUE,
    min.alpha=0.01,max.alpha=10)
  logL<-sapply(fits,logLik)
  fits[[which.max(logL)[1]]]
}

## stop cluster
parallel::stopCluster(mc)

Our object gamma_fits contains all the fitted models; however, at the moment, we really only care about estimation of \(\alpha\).

## pull out estimated alphas
est_alphas<-sapply(gamma_fits,function(x) x$alpha)
est_alphas
##  [1] 0.56210855 0.75941318 0.29201812 0.98101008 0.42966179 0.90396444 0.15536041 0.40271139 0.27186033 0.12907243
## [11] 0.22705690 0.36793303 0.87431957 0.03684430 0.01325773 0.36015767 0.29058093 0.05175733 0.53580828 0.62360289
## [21] 5.53593018 1.41130054 0.33375223 0.25869720 1.01964577 0.85157594 4.35533839 0.20044956 0.13797275 3.38182299
## [31] 3.12804668 0.69790623 0.45073080 1.16106427 0.04194153 0.08779118 0.32395813 0.56070330 0.22760055 0.54733300
## [41] 0.35537293 0.06851216 0.31008701 0.04693878 0.07574936 0.12493617 0.13493339 0.17895880 1.89990929 0.38123235
## [51] 0.05254867 0.51621554 0.27063902 0.45031793 0.14496688 0.32589063 0.26682441 0.36876731 0.14909626 0.15659750

Finally, let’s graph our results!

par(mar=c(5.1,4.1,1.1,1.1))
plot(alpha,est_alphas,log="xy",bty="n",las=1,cex.axis=0.8,
  pch=16,col="navy",xlab=expression(alpha),
  ylab=expression(hat(alpha)))
abline(a=0,b=1)

(Plot: estimated \(\hat{\alpha}\) vs. generating \(\alpha\) on log-log axes, with a 1:1 line.)

Cool.

We can similarly look at the distribution of values for our estimated (mean) transition rate, which, according to our simulation, should be centered on 1.0.

q<-sapply(gamma_fits,function(x) x$rates)
par(mar=c(5.1,4.1,1.1,1.1))
hist(q,breaks=12,main="",cex.axis=0.8,las=1,
  xlab="mean transition rate (q)",ylab="frequency")

(Plot: histogram of estimated mean transition rates \(q\) from the \(\Gamma\) model fits.)

After the fact, I had the idea of also fitting a standard Mk model to each of my simulations; however, from the look of our code, above, one might guess that this wasn’t possible because we failed to store the trees and data from each simulation. Not so! Actually, our fitted models contain the data and tree as hidden attributes.

Let’s see.

mk_fit<-fitMk(gamma_fits[[1]]$tree,
  gamma_fits[[1]]$data,model="ER",pi="fitzjohn")
mk_fit
## Object of class "fitMk".
## 
## Fitted (or set) value of Q:
##           0         1         2
## 0 -1.554145  0.777073  0.777073
## 1  0.777073 -1.554145  0.777073
## 2  0.777073  0.777073 -1.554145
## 
## Fitted (or set) value of pi:
##        0        1        2 
## 0.006338 0.012031 0.981632 
## due to treating the root prior as (a) nuisance.
## 
## Log-likelihood: -1939.417191 
## 
## Optimization method used was "nlminb"
## 
## R thinks it has found the ML solution.

OK, now let’s use foreach to iterate over all the simulated trees and data in our analysis. Note that I’m not going to bother with multiple optimization iterations this time (though I probably should), just because I’m betting that optimization will not be as sticky a problem.

## let's make a cluster
ncores<-parallel::detectCores()-4 ## leave some cores free
mc<-parallel::makeCluster(ncores,type="PSOCK")
doParallel::registerDoParallel(cl=mc)

## optimize models
mk_fits<-foreach(i=1:nsim)%dopar%{
  phytools::fitMk(gamma_fits[[i]]$tree,
    gamma_fits[[i]]$data,model="ER",pi="fitzjohn",
    rand_start=TRUE)
}

## stop cluster
parallel::stopCluster(mc)

Now, with these fitted models, we can begin by pulling out the estimated transition rates. Remember, the generating mean transition rate for each dataset was \(q = 1.0\).

q<-sapply(mk_fits,function(x) x$rates)
par(mar=c(5.1,4.1,1.1,1.1))
hist(q,breaks=12,main="",cex.axis=0.8,las=1,
  xlab="mean transition rate (q)",ylab="frequency")

(Plot: histogram of estimated transition rates \(q\) from the standard Mk fits.)

We can also calculate a likelihood-ratio test statistic comparing the two fitted models for each dataset. I’ll also overlay the threshold \(\chi^2\) value for \(\alpha = 0.05\). (Note that this is our nominal type I error rate, \(\alpha\), not our shape parameter \(\alpha\)!)

logL_gamma<-sapply(gamma_fits,logLik)
logL_mk<-sapply(mk_fits,logLik)
lr<--2*(logL_mk-logL_gamma)
lr
##  [1]  7.2785984  3.7836974 12.4452257  2.1957923  6.4793898  2.1198289 24.5586953 12.8897355 11.2646909 30.7540370
## [11]  9.0550321  7.2434808  2.5675858 27.0554270 63.4935291 11.5548561 11.1888620 43.9897480  6.3408953  5.3929059
## [21]  0.1101214  1.9780238 12.1782949 14.6827249  2.2854141  2.3148659  0.1630774 15.9115957 18.5981435  0.3920628
## [31]  0.2987446  3.4728888  9.4109756  2.5150548 25.6555840 35.9834562 10.3802556  4.6250253 14.2707234  5.6280835
## [41] 10.7253016 30.9851992  7.4089753 51.8139460 40.9620438 16.0267153 26.7697882 17.5656220  0.8700424  9.2092783
## [51] 50.0572587  7.2842385 19.5178809  4.1713513 20.3367791  8.4634620 20.1460356  7.4090866 25.9385439 27.4777112
par(mar=c(5.1,4.1,1.1,1.1))
plot(alpha,lr,log="x",bty="n",las=1,
  cex.axis=0.8,pch=16,col="navy",
  xlab=expression(paste("generating ",alpha)),
  ylab="likelihood ratio")
abline(h=qchisq(p=0.95,df=1),lwd=3,col="grey",
  lty="dotted")
text(x=0.02,y=1.5*qchisq(p=0.95,df=1),
  expression(paste(alpha," = 0.05")),pos=4,
  cex=0.7)

(Plot: likelihood ratio vs. generating \(\alpha\), with the \(\chi^2\) threshold for \(\alpha = 0.05\) overlaid.)

This shows that for basically all low values of (\(\Gamma\) shape parameter) \(\alpha\) we would reject the homogeneous rate Mk model in favor of our \(\Gamma\) model, while the opposite is true for high \(\alpha\). This, of course, makes sense because low \(\alpha\) corresponds to high rate heterogeneity among edges while high \(\alpha\) corresponds to low heterogeneity among edges. Here, for example, are the \(\Gamma\) distributions for the lowest, the geometric mean, and the highest values of \(\alpha\) used in simulation.

## function for geometric mean
gmean<-function(x,na.rm=FALSE){
  exp(mean(log(x),na.rm=na.rm))
}
cols<-make.transparent(hcl.colors(n=3),0.75)
par(mar=c(5.1,4.1,1.1,1.1))
x<-seq(0.001,5,by=0.001)
lowest<-dgamma(x,min(alpha),min(alpha))
plot(x,lowest,type="l",ylim=c(0,1.25),lwd=3,
  col=cols[1],bty="n",cex.axis=0.8,xlab="rate",
  ylab="density")
highest<-dgamma(x,max(alpha),max(alpha))
lines(x,highest,lwd=3,col=cols[3])
average<-dgamma(x,gmean(alpha),gmean(alpha))
lines(x,average,lwd=3,col=cols[2])
labs<-c(
  bquote(alpha == .(round(min(alpha),2))),
  bquote(alpha == .(round(gmean(alpha),2))),
  bquote(alpha == .(round(max(alpha),2)))
)
legend("topright",legend=labs,lwd=3,col=cols,bty="n",
  cex=0.7)

(Plot: \(\Gamma\) densities for the lowest, geometric mean, and highest simulated values of \(\alpha\).)

That’s it.