Thursday, May 22, 2025

A semi-threshold trait evolution model in phytools

A few weeks ago on this blog I posted about a new “semi-threshold” model for quantitative trait evolution based on the discretized diffusion approximation of Boucher & Démery (2016).

As a reminder that repeats a little bit of what I already described in that prior post, the threshold model is a discrete trait evolution model in which the observed character is discretely-valued – but is underlain by an unseen continuous trait called “liability.” Every time this liability crosses some fixed threshold value, our discrete character changes state.

A semi-threshold model, by contrast, imagines that on some part of its range liability (the continuous trait) can be observed as a measurable quantitative character. On other parts of the range, liability is not observable but nonetheless continuous to evolve.

An example of a continuous trait that might evolve in this way could include, for instance, “proportion of animal matter in the diet” of a carnivorous animal. This trait can reach 100%, but the underlying basis of carnivory continuous to evolve away from this “threshold” in a manner that can’t be observed by measuring diet. The more evolution that transpires past the threshold, the harder it is to get back! (One might even imagine a parallel phenomenon at 0%.)

Here’s a visualization of evolution under this semi-threshold process using a new graphical simulator adapted from a different recent post on this blog. The lower panel shows evolution of the liability, while the continuous character is projected onto the tree. Note that when the threshold is reached liability continuous to evolve but the observed trait stops – that is, unless, liability happens to evolve back towards the threshold & cross it!

library(phytools)
packageVersion("phytools")
## [1] '2.4.16'
tree<-pbtree(n=200,scale=1)
z<-sim.absorbing(tree,sig2=2.2,bounds=c(-1,1),
  semithreshold=TRUE)

plot of chunk unnamed-chunk-60

Data that arise under this semi-threshold process are decidedly non-normal – tending to accumulate on the upper & lower threshold.

Let’s see.

par(mar=c(5.1,4.1,1.1,1.1))
hist(z,breaks=20,main="",las=1,cex.axis=0.8,
  xlab="phenotypic trait (z)")

plot of chunk unnamed-chunk-61

Yup.

Now let’s fit our semi-threshold model, which can now be done using the phytools function fitsemiThresh.

stmodel<-fitsemiThresh(tree,z,threshold=c(-1,1),
  levs=300)
stmodel
## Object of class "fitsemiThresh" based on
##    a discretization with k = 300 levels on the interval [ -5.036777 , 5.036777 ].
## 
## Set or estimated thresholds: [ -1 , 1 ]
## 
## Fitted model parameters:
##   sigsq: 2.276992 
##      x0: -0.15 
## 
## Log-likelihood: 229.855466 
## 
## R thinks it has found the ML solution.

Our fitted \(\sigma^2\) is pretty close to the generating value of \(\sigma^2 = 2.2\), so that’s neat.

We can compare this to a standard Brownian model for this trait, although I can already tell you how that’s going to go.

First, I’ll do it exactly using geiger::fitContinuous.

geiger::fitContinuous(tree,z)
## GEIGER-fitted comparative model of continuous data
##  fitted 'BM' model parameters:
## 	sigsq = 0.767761
## 	z0 = -0.064268
## 
##  model summary:
## 	log-likelihood = -99.067442
## 	AIC = 202.134883
## 	AICc = 202.195797
## 	free parameters = 2
## 
## Convergence diagnostics:
## 	optimization iterations = 100
## 	failed iterations = 0
## 	number of iterations with same best fit = 100
## 	frequency of best fit = 1.000
## 
##  object summary:
## 	'lik' -- likelihood function
## 	'bnd' -- bounds for likelihood search
## 	'res' -- optimization iteration summary
## 	'opt' -- maximum likelihood parameter estimates

We’ll also get the same parameter estimates and log-likelihood from fitsemThresh but setting the thresholds to threshold = c(-Inf,Inf) – inasmuch as they’ll converge in the limit as our bin witdth \(\delta \rightarrow 0\), of course!

bmmodel<-fitsemiThresh(tree,z,threshold=c(-Inf,Inf),
  levs=300)
bmmodel
## Object of class "fitsemiThresh" based on
##    a discretization with k = 300 levels on the interval [ -4.596527 , 4.596527 ].
## 
## Set or estimated thresholds: [ -4.596527 , 4.596527 ]
## 
## Fitted model parameters:
##   sigsq: 0.758371 
##      x0: -0.076609 
## 
## Log-likelihood: -99.48216 
## 
## R thinks it has found the ML solution.

That’s all I’m gonna say about this for now, folks! Good night.