Thursday, July 17, 2025

A Pagel '94 type correlation model with one binary trait and one continuous character for phytools

Just yesterday I posted code illustrating how to fit a Pagel (1994) type correlational trait evolution model, but for one binary and a second continuous trait using the discretized diffusion approximation of Boucher & Démery (2016).

The basic idea is that the “forward” (i.e., \(0 \rightarrow 1\)) and backward (i.e., \(1 \rightarrow 0\)) transition rates for our discrete character are determined as some function of our continuous trait. I arbitrarily focused on a sigmoid curve described by four parameters: the lower and upper asymptotes; the inflection point; and the “steepness” of the curve.

Just like the Pagel (1994) model, this needn’t correspond to a correlation and will only do so (in a meaningful way) if certain character combinations tend to disproportionately accumulate under the evolutionary process: e.g., large things in state a and small things in state b, or similar. This model also might be a good explanation for data that arise with a high back-and-forth transition rate for the discrete trait when the continuous character value is low and the converse.

I have now implemented this method in a new phytools function, fitcontMk, which can be obtained by updating phytools from GitHub. For the full code of this function and its S3 methods (which should be considered very \(\beta\)) you can see GitHub.

library(phytools)
packageVersion("phytools")
## [1] '2.5.0'

To illustrate use of this new method, we (of course) must have data that arose under our hypothesized process. In my earlier post I demonstrated this simulation, so I won’t reiterate that explanation here.

## number of taxa in my tree
N<-500
tree<-pbtree(n=N,scale=10)
## number of steps in simulation
nn<-400
## this creates a tree with many singleton
## nodes
h<-max(nodeHeights(tree))
tt<-map.to.singleton(
  make.era.map(tree,
    limits=seq(0,h-h/nn,length.out=nn))
)
## simulate Brownian evolution
y<-fastBM(tt,internal=TRUE,sig2=0.25)
## re-center to zero
y<-y-mean(y)
## compute the trait mean on each edge
edge_y<-rowMeans(
  matrix(y[tt$edge],nrow(tt$edge),
    ncol(tt$edge)))
## here's our sigmoid functions
q01<-2+(0.01-2)/(1+exp(-5*(edge_y-0.1)))
q10<-0.01+(2-0.01)/(1+exp(-5*(edge_y+0.1)))
## visualize them
par(mfrow=c(2,1),mar=c(5.1,4.1,1.1,1.1),
  lend=1)
ii<-order(edge_y)
forward<-q01[ii]
plot(edge_y[ii],forward,type="l",
  xlab="continuous trait",
  ylab="transition rate (a to b)",
  las=1,bty="n",cex.axis=0.6,
  ylim=c(0,max(c(q01,q10))))
box(col="grey")
grid()
backward<-q10[ii]
plot(edge_y[ii],backward,type="l",
  xlab="continuous trait",
  ylab="transition rate (b to a)",
  las=1,bty="n",cex.axis=0.6,
  ylim=c(0,max(c(q01,q10))))
box(col="grey")
grid()

plot of chunk unnamed-chunk-8

Now, let’s simulate data for our discrete trait given the scenario illustrated above:

x<-vector(length=Ntip(tt)+Nnode(tt))
x[Ntip(tt)+1]<-sample(1:2,1)
for(i in 1:nrow(tt$edge)){
  Q<-matrix(
    c(-q01[i],q01[i],q10[i],-q10[i]),2,2,
    byrow=TRUE)
  P<-expm::expm(Q*tt$edge.length[i])
  x[tt$edge[i,2]]<-sample(1:2,1,
    prob=P[x[tt$edge[i,1]],])
}
COLS<-hcl.colors(n=100)
cols<-COLS[round((edge_y-min(edge_y))/
    diff(range(edge_y))*99)+1]
par(mfrow=c(1,2),mar=c(1.1,1.1,1.1,1.1))
plot(tt,edge.color=cols,edge.width=1,
  show.tip.label=FALSE,y.lim=c(-0.1*N,N))
add.color.bar(0.75*max(nodeHeights(tree)),
  cols=COLS,title="continuous trait",
  lims=round(range(y),2),digits=2,
  prompt=FALSE,outline=FALSE,
  x=0.125*max(nodeHeights(tree)),y=-0.1*N,
  subtitle="")
COLS<-hcl.colors(n=2)
cols=COLS[x[tt$edge[,2]]]
plot(tt,edge.color=cols,edge.width=1,
  show.tip.label=FALSE,direction="leftwards",
  y.lim=c(-0.1*N,N))
add.simmap.legend(leg=letters[1:2],
  colors=setNames(COLS,letters[1:2]),prompt=FALSE,
  vertical=FALSE,x=0.5*max(nodeHeights(tree)),
  y=-0.1*N)

plot of chunk unnamed-chunk-9

This simulation condition should’ve produced a fairly strong associating between low values of the continuous character and level b of the discrete trait, and high values of the continuous character with level a of our discrete trait.

Now let’s peel out the tip values from our simulation.

## continuous character
y<-y[1:Ntip(tree)]
## discrete trait
x<-setNames(letters[1:2][x[1:Ntip(tt)]],
  tt$tip.label)
rm(list=setdiff(ls(),c("tree","x","y")))

Awesome.

Finally, we can fit our model using fitcontMk. For users testing this, I recommend you turn on the argument trace=2 so that you can get updates of the optimization as it runs. (Otherwise just be prepared to stare at a blank screen for several hours!)

fitted_model<-fitcontMk(tree,x,y,parallel=TRUE,
  ncores=10,levs=100,maxit=5000)
fitted_model
## Object of class "fitcontMk" based on
##     a discretization with k = 100 levels.
## 
## Fitted sigmoidal continuous-trait dependent
##     M2 model parameters:
## 
##  states: [ a, b ]
##   sigsq:  0.2694 
##     q01: [ 1.1872, 0.0192 ]
##     q10: [ 0, 2.2724 ]
##       B: [ 20.0787, 4.7859 ]
##       M: [ 0.5481, -0.068 ]
## 
## log(L): -629.3899 (df = 9)
## 
## R thinks it has found the ML solution.

Remember, our generating values for simulation were \(\sigma^2 = 0.25\), \(q_{0,1} = [2,0.01]\), \(q_{1,0} = [0.01,2]\), \(B_{0,1} = B_{1,0} = 5.0\), \(M_{0,1} = 0.1\), and \(M_{1,0} = -0.1\). Did we come close?

Let’s plot and compare.

par(lend=1)
obj<-plot(fitted_model)
xx<-seq(min(y),max(y),length.out=100)
q01<-2+(0.01-2)/(1+exp(-5*(xx-0.1)))
q10<-0.01+(2-0.01)/(1+exp(-5*(xx+0.1)))
par(mfg=obj$mfg[[1]])
par(usr=obj$usr[[1]])
lines(xx,q01,col=make.transparent("blue",0.5),lwd=5)
legend("topleft",legend=c("estimated","generating"),
  lwd=c(1,5),col=c("black",make.transparent("blue",0.5)))
par(mfg=obj$mfg[[2]])
par(usr=obj$usr[[2]])
lines(xx,q10,col=make.transparent("blue",0.5),lwd=5)
legend("topleft",legend=c("estimated","generating"),
  lwd=c(1,5),col=c("black",make.transparent("blue",0.5)))

plot of chunk unnamed-chunk-1

Could be worse!

No comments:

Post a Comment

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.