Wednesday, June 4, 2025

Why parameter estimates in fitPagel can hit upper bounds, even if they don't in fitMk for each trait modeled separately....

In response to a user issue, I just wanted to quickly & simply demonstrate an interesting case in which parameter estimates under a Pagel ’94 (Pagel 1994) correlated binary character evolution model can sometimes hit the bounds even when the rates of evolution for each individual character are nowhere near the upper boundary value for optimization.

To do this, I’m going to simulated two discrete traits, x and y, where one is the perfect mirror image of the other. (They don’t need to be perfectly correlated for this to be an issue, but our two traits must be correlated.)

library(phytools)
## generating conditions
N<-200
t<-1.0
q<-0.1
## simulate a tree
tree<-pbtree(n=N,scale=t)
## our Q matrix to simulate x
Q<-q*matrix(c(-1,1,1,-1),2,2,
  dimnames=list(0:1,0:1))
## simulate x
x<-sim.Mk(tree,Q)
## create y as its mirror image
y<-setNames(as.factor(c(1:0)[as.numeric(x)]),
  names(x))

To see that we’ve accomplished what we set out to do we can create a simple visualization of our two traits as follows.

par(mfrow=c(1,2))
## plot tree 1
plotTree(tree,ftype="off",lwd=1)
## get last plotted "phylo" coordinates
pp<-get("last_plot.phylo",envir=.PlotPhyloEnv)
## set colors
cols<-hcl.colors(n=2)[as.numeric(x[tree$tip.label])]
## plot polygons at the tips
for(i in 1:N){
  polygon(pp$xx[i]+t*c(0,0.05,0.05,0),
    pp$yy[i]+c(-0.5,-0.5,0.5,0.5),
    col=cols[i],border="transparent")
}
## repeat for tree 2, but plotted backwards
plotTree(tree,ftype="off",lwd=1,xlim=pp$x.lim[2:1])
pp<-get("last_plot.phylo",envir=.PlotPhyloEnv)
cols<-hcl.colors(n=2)[as.numeric(y[tree$tip.label])]
for(i in 1:N){
  polygon(pp$xx[i]+t*c(0,0.05,0.05,0),
    pp$yy[i]+c(-0.5,-0.5,0.5,0.5),
    col=cols[i],border="transparent")
}

plot of chunk unnamed-chunk-19

(This code words if our tree is already in “cladewise” order and if our trait vectors x and y are factors. Otherwise it should be adapted with caution!)

Now let’s fit standard Mk models to x and y using fitMk as follows.

fit_x<-fitMk(tree,x,pi="fitzjohn")
fit_x
## Object of class "fitMk".
## 
## Fitted (or set) value of Q:
##           0         1
## 0 -0.080597  0.080597
## 1  0.080597 -0.080597
## 
## Fitted (or set) value of pi:
##        0        1 
## 0.901183 0.098817 
## due to treating the root prior as (a) nuisance.
## 
## Log-likelihood: -23.150054 
## 
## Optimization method used was "nlminb"
## 
## R thinks it has found the ML solution.
fit_y<-fitMk(tree,y,pi="fitzjohn")
fit_y
## Object of class "fitMk".
## 
## Fitted (or set) value of Q:
##           0         1
## 0 -0.080597  0.080597
## 1  0.080597 -0.080597
## 
## Fitted (or set) value of pi:
##        0        1 
## 0.098817 0.901183 
## due to treating the root prior as (a) nuisance.
## 
## Log-likelihood: -23.150054 
## 
## Optimization method used was "nlminb"
## 
## R thinks it has found the ML solution.

Nothing seems remiss so far. Let’s try to fit our Pagel ’94 model using phytools::fitPagel.

fit_xy<-fitPagel(tree,x,y,pi="fitzjohn")
fit_xy
## 
## Pagel's binary character correlation test:
## 
## Assumes "ARD" substitution model for both characters
## 
## Independent model rate matrix:
##           0|0       0|1       1|0       1|1
## 0|0 -0.151827  0.045559  0.106268  0.000000
## 0|1  0.106268 -0.212536  0.000000  0.106268
## 1|0  0.045558  0.000000 -0.091117  0.045559
## 1|1  0.000000  0.045558  0.106268 -0.151826
## 
## Dependent (x & y) model rate matrix:
##             0|0        0|1        1|0       1|1
## 0|0 -200.000000 100.000000 100.000000   0.00000
## 0|1    0.134109  -0.134109   0.000000   0.00000
## 1|0    0.090466   0.000000  -0.090466   0.00000
## 1|1    0.000000   0.000000  80.227651 -80.22765
## 
## Model fit:
##             log-likelihood      AIC
## independent      -45.66622 99.33243
## dependent        -20.15903 56.31807
## 
## Hypothesis test result:
##   likelihood-ratio:  51.0144 
##   p-value:  2.21685e-10 
## 
## Model fitting method used was fitMk 
## 
## R thinks both model optimizations converged.

Here R think that optimization has converged, but the perfectly round values for the transitions 0|0 \(\rightarrow\) 0|1 and 0|0 \(\rightarrow\) 1|0 should give us pause. Indeed, the default value of max.q (the maximum value an element of Q can assume in optimization) is \(100 \times\) the total tree height – which in our case we set to 1.0!

Let’s try again, but adjusting max.q.

fit_xy<-fitPagel(tree,x,y,pi="fitzjohn",max.q=1e4,
  logscale=TRUE)
fit_xy
## 
## Pagel's binary character correlation test:
## 
## Assumes "ARD" substitution model for both characters
## 
## Independent model rate matrix:
##           0|0       0|1       1|0       1|1
## 0|0 -0.151826  0.045559  0.106268  0.000000
## 0|1  0.106268 -0.212536  0.000000  0.106268
## 1|0  0.045559  0.000000 -0.091117  0.045559
## 1|1  0.000000  0.045559  0.106268 -0.151826
## 
## Dependent (x & y) model rate matrix:
##               0|0          0|1         1|0           1|1
## 0|0 -19786.264614 10000.000000 9786.264614      0.000000
## 0|1      0.070508    -0.141017    0.000000      0.070508
## 1|0      0.046192     0.000000   -0.092383      0.046192
## 1|1      0.000000 10000.000000 9786.840407 -19786.840407
## 
## Model fit:
##             log-likelihood      AIC
## independent      -45.66622 99.33243
## dependent        -20.00553 56.01107
## 
## Hypothesis test result:
##   likelihood-ratio:  51.3214 
##   p-value:  1.9124e-10 
## 
## Model fitting method used was fitMk 
## 
## R thinks both model optimizations converged.

We’re still hitting the upper bound – we could set max.q higher, but our results would stay (qualitatively) the same.

What’s going on here?

Well, in effect, we’ve manufactured data in which the character levels 0|0 and 1|1 are so unstable (indeed, they’re never observed among our tip taxa) that any high transition rate values away from these states into the stable combinations of 1|0 or 0|1 increase the probability of observing the data that we have.

Obviously, this example is artificial – but the same problem is likely to afflict real datasets for which x and y are not necessarily the same trait, but whose evolutionary history is highly correlated.

A take home message for me (as a developer) is that I need to make fitPagel more effective at diagnosing & reporting cases in which an optimization bound has been reached! Look for that in a future phytools update.

Thursday, May 22, 2025

A semi-threshold trait evolution model in phytools

A few weeks ago on this blog I posted about a new “semi-threshold” model for quantitative trait evolution based on the discretized diffusion approximation of Boucher & Démery (2016).

As a reminder that repeats a little bit of what I already described in that prior post, the threshold model is a discrete trait evolution model in which the observed character is discretely-valued – but is underlain by an unseen continuous trait called “liability.” Every time this liability crosses some fixed threshold value, our discrete character changes state.

A semi-threshold model, by contrast, imagines that on some part of its range liability (the continuous trait) can be observed as a measurable quantitative character. On other parts of the range, liability is not observable but nonetheless continuous to evolve.

An example of a continuous trait that might evolve in this way could include, for instance, “proportion of animal matter in the diet” of a carnivorous animal. This trait can reach 100%, but the underlying basis of carnivory continuous to evolve away from this “threshold” in a manner that can’t be observed by measuring diet. The more evolution that transpires past the threshold, the harder it is to get back! (One might even imagine a parallel phenomenon at 0%.)

Here’s a visualization of evolution under this semi-threshold process using a new graphical simulator adapted from a different recent post on this blog. The lower panel shows evolution of the liability, while the continuous character is projected onto the tree. Note that when the threshold is reached liability continuous to evolve but the observed trait stops – that is, unless, liability happens to evolve back towards the threshold & cross it!

library(phytools)
packageVersion("phytools")
## [1] '2.4.16'
tree<-pbtree(n=200,scale=1)
z<-sim.absorbing(tree,sig2=2.2,bounds=c(-1,1),
  semithreshold=TRUE)

plot of chunk unnamed-chunk-60

Data that arise under this semi-threshold process are decidedly non-normal – tending to accumulate on the upper & lower threshold.

Let’s see.

par(mar=c(5.1,4.1,1.1,1.1))
hist(z,breaks=20,main="",las=1,cex.axis=0.8,
  xlab="phenotypic trait (z)")

plot of chunk unnamed-chunk-61

Yup.

Now let’s fit our semi-threshold model, which can now be done using the phytools function fitsemiThresh.

stmodel<-fitsemiThresh(tree,z,threshold=c(-1,1),
  levs=300)
stmodel
## Object of class "fitsemiThresh" based on
##    a discretization with k = 300 levels on the interval [ -5.036777 , 5.036777 ].
## 
## Set or estimated thresholds: [ -1 , 1 ]
## 
## Fitted model parameters:
##   sigsq: 2.276992 
##      x0: -0.15 
## 
## Log-likelihood: 229.855466 
## 
## R thinks it has found the ML solution.

Our fitted \(\sigma^2\) is pretty close to the generating value of \(\sigma^2 = 2.2\), so that’s neat.

We can compare this to a standard Brownian model for this trait, although I can already tell you how that’s going to go.

First, I’ll do it exactly using geiger::fitContinuous.

geiger::fitContinuous(tree,z)
## GEIGER-fitted comparative model of continuous data
##  fitted 'BM' model parameters:
## 	sigsq = 0.767761
## 	z0 = -0.064268
## 
##  model summary:
## 	log-likelihood = -99.067442
## 	AIC = 202.134883
## 	AICc = 202.195797
## 	free parameters = 2
## 
## Convergence diagnostics:
## 	optimization iterations = 100
## 	failed iterations = 0
## 	number of iterations with same best fit = 100
## 	frequency of best fit = 1.000
## 
##  object summary:
## 	'lik' -- likelihood function
## 	'bnd' -- bounds for likelihood search
## 	'res' -- optimization iteration summary
## 	'opt' -- maximum likelihood parameter estimates

We’ll also get the same parameter estimates and log-likelihood from fitsemThresh but setting the thresholds to threshold = c(-Inf,Inf) – inasmuch as they’ll converge in the limit as our bin witdth \(\delta \rightarrow 0\), of course!

bmmodel<-fitsemiThresh(tree,z,threshold=c(-Inf,Inf),
  levs=300)
bmmodel
## Object of class "fitsemiThresh" based on
##    a discretization with k = 300 levels on the interval [ -4.596527 , 4.596527 ].
## 
## Set or estimated thresholds: [ -4.596527 , 4.596527 ]
## 
## Fitted model parameters:
##   sigsq: 0.758371 
##      x0: -0.076609 
## 
## Log-likelihood: -99.48216 
## 
## R thinks it has found the ML solution.

That’s all I’m gonna say about this for now, folks! Good night.