Sunday, July 13, 2025

A Pagel '94 type correlation test with one binary trait and a second continuous character via the discretized diffusion approximation

A colleague recently showed me a project in which she’s interested in testing for a relationship between two traits – but one is binary & the other is continuously-valued. She had discretized the continuous character into “low” and “high” values of the trait and was using Pagel’s (1994) method.

It occurred to me, of course, that a directly analagous test could be done without discretizing the continuous character -- via the discretized diffusion approximation of Boucher & Démery (2016). (For more on this, I recommend reading my blog – e.g., 1, 2, 3, 4, and others.)

My idea in this case would be that the rates of change between levels in our discrete characte, \(x\), (i.e., from 0 to 1 and the reverse) would depend directly via some functional form (e.g., a sigmoidal function) on our continuous character, \(y\).

I have not yet implemented this in an explicit phytools function, but I can already do it.

To see how, let’s begin by figuring out how to simulate data under our model.

This is hard enough. What I ended up doing was discretizing time into many bins on my tree, simulating Brownian motion trait evolution for \(y\), and then using different values of my transition matrix Q that depended on the mean edge condition of \(y\) (but, remember, we've finely split up our tree into many, many non-dividing edges).

Here’s a quick demo of my simulation.

## load phytools
library(phytools)
## number of taxa in my tree
N<-200
tree<-pbtree(n=N,scale=100)
## number of steps in simulation
nn<-400
## this creates a tree with many singleton
## nodes
tt<-map.to.singleton(
  make.era.map(tree,
  limits=seq(0,100-100/nn,length.out=nn))
)
## simulate Brownian evolution
y<-fastBM(tt,internal=TRUE,sig2=0.1)
## re-center to zero
y<-y-mean(y)
## compute the trait mean on each edge
edge_y<-rowMeans(
  matrix(y[tt$edge],nrow(tt$edge),
  ncol(tt$edge)))

Now I’m going to generate data for \(x\), our binary trait, in which the forward, \(0 \rightarrow 1\), and backward, \(1 \rightarrow 0\), transition rates vary as a function of \(y\) according to potentially different sigmoidal functional forms of the type:

$$ q(y) = q_{\mathrm{low}} + \frac{q_{\mathrm{high}} - q_{\mathrm{low}}}{1 + e^{-B(y - M)}} $$

So, in other words, for each direction of evolution, from 0 to 1 or the reverse, we've parameterized the substitution process with a total of four parameters: two giving the lower and upper asymptotes; one giving the inflection point (M); and the last giving the steepness of the sigmoidal part of the function (B).

## this is our functional form
## q.f<-q0.f+(q1.f-q0.f)/(1+exp(-B.f*(xx-M.f)))
## q.b<-q0.b+(q1.b-q0.b)/(1+exp(-B.b*(xx-M.b)))
q01<-0.01+(0.1-0.01)/(1+exp(-2*edge_y))
q10<-0.2+(0.02-0.2)/(1+exp(-3*edge_y))

Let’s visualize our two generating functional forms.

par(mfrow=c(2,1),mar=c(5.1,4.1,1.1,1.1))
ii<-order(edge_y)
forward<-q01[ii]
plot(edge_y[ii],forward,type="l",
  xlab="continuous trait",
  ylab="transition rate (0 to 1)",
  las=1,bty="n",cex.axis=0.6,ylim=c(0,0.2))
box(col="grey")
grid()
backward<-q10[ii]
plot(edge_y[ii],backward,type="l",
  xlab="continuous trait",
  ylab="transition rate (1 to 0)",
  las=1,bty="n",cex.axis=0.6,ylim=c(0,0.2))
box(col="grey")
grid()

plot of chunk unnamed-chunk-5

Next, we can simulate under our model -- once again, having discretized time. After I simulate, I'll also plot the simulation on my tree for the continuous character (on the left) and the discrete trait. Remember, according to my generating process the value of our continuous trait should influence the transition rate of the discrete character in a way that I predict will result in an accumulation of larger things with state 1 and smaller things with state 0.

x<-vector(length=Ntip(tt)+Nnode(tt))
x[Ntip(tt)+1]<-sample(1:2,1)
for(i in 1:nrow(tt$edge)){
  Q<-matrix(
    c(-q01[i],q01[i],q10[i],-q10[i]),2,2,
    byrow=TRUE)
  P<-expm::expm(Q*tt$edge.length[i])
  x[tt$edge[i,2]]<-sample(1:2,1,
    prob=P[x[tt$edge[i,1]],])
}
COLS<-hcl.colors(n=100)
cols<-COLS[round((edge_y-min(edge_y))/
    diff(range(edge_y))*99)+1]
par(mfrow=c(1,2),mar=c(1.1,1.1,1.1,1.1))
plot(tt,edge.color=cols,edge.width=3,
  show.tip.label=FALSE,y.lim=c(-0.1*N,N))
add.color.bar(0.75*max(nodeHeights(tree)),
  cols=COLS,title="continuous trait",
  lims=round(range(y),2),digits=2,
  prompt=FALSE,outline=FALSE,
  x=0.125*max(nodeHeights(tree)),y=-0.1*N,
  subtitle="")
COLS<-hcl.colors(n=2)
cols=COLS[x[tt$edge[,2]]]
plot(tt,edge.color=cols,edge.width=3,
  show.tip.label=FALSE,direction="leftwards",
  y.lim=c(-0.1*N,N))
add.simmap.legend(leg=0:1,
  colors=setNames(COLS,0:1),prompt=FALSE,
  vertical=FALSE,x=0.5*max(nodeHeights(tree)),
  y=-0.1*N)

plot of chunk unnamed-chunk-6

As predicted, we clearly see that things with smaller values on our continuous trait axis tend to more often have condition 0 of the discrete trait, and things with larger values more often find themselves with condition 1 of the discrete trait.

Next, I’ll pull out the discrete and continuous characters from my simulation as follows.

## continuous character
y<-y[1:Ntip(tree)]
## discrete trait
x<-setNames(x[1:Ntip(tt)],tt$tip.label)

Perfect. Now we’re ready to undertake our analysis.

To begin, I’m going discretize our continuous trait space of y and then create a large matrix of 0s and 1s based on our tip data and discretization.

## discretize y
levs<-100
y<-y[tree$tip.label]
lims<-phytools:::expand.range(y,p=1.5)
dd<-diff(lims)
delta<-dd/levs
tol<-1e-8*delta
bins<-cbind(seq(from=lims[1]-tol,by=(dd+2*tol)/levs,
  length.out=levs),seq(to=lims[2]+tol,by=(dd+2*tol)/levs,
    length.out=levs))
Y<-phytools:::to_binned(y,bins)

Next, I’m going to convert x to a binary matrix.

if(is.matrix(x)){
  X<-x[tree$tip.label,]
  m<-ncol(X)
  states<-colnames(X)
} else {
  X<-to.matrix(x,sort(unique(x)))
  X<-X[tree$tip.label,]
  m<-ncol(X)
  states<-colnames(X)
}

Finally, I’m going to combine discretized y and matrix x.

nn<-phytools:::x_by_y(colnames(X),colnames(Y))
XX<-matrix(0,nrow=nrow(X),ncol=ncol(X)*ncol(Y),
  dimnames=list(rownames(X),nn))
for(i in 1:ncol(X)){
  for(j in 1:nrow(Y)){
    XX[j,1:levs+(i-1)*levs]<-X[j,i]*Y[j,]
  }
}

As a last step, reorder my tree for post-order traversal.

pw<-reorder(tree,"postorder")

Now I’m ready for my likelihood function based on the discretized diffusion approximation.

lfunc<-function(theta,...){
  if(hasArg(plot)) plot<-list(...)$plot
  else plot<-FALSE
  if(hasArg(trace)) trace<-list(...)$trace
  else trace<-0
  if(hasArg(parallel)) parallel<-list(...)$parallel
  else parallel<-TRUE
  sig2<-theta[1]
  q0.f<-theta[2]
  q1.f<-theta[3]
  B.f<-theta[4]
  M.f<-theta[5]
  q0.b<-theta[6]
  q1.b<-theta[7]
  B.b<-theta[8]
  M.b<-theta[9]
  MODEL<-INDEX<-matrix(0,nrow=ncol(XX),ncol=ncol(XX),
    dimnames=list(nn,nn))
  for(i in 1:(levs-1)){
    for(j in 0:(ncol(X)-1)){
      MODEL[i+j*levs,i+1+j*levs]<-
        MODEL[i+1+j*levs,i+j*levs]<-sig2/(2*delta^2)
      INDEX[i+j*levs,i+1+j*levs]<-
        INDEX[i+1+j*levs,i+j*levs]<-1
    }
  }
  xx<-rowMeans(bins)
  q.f<-q0.f+(q1.f-q0.f)/(1+exp(-B.f*(xx-M.f)))
  q.b<-q0.b+(q1.b-q0.b)/(1+exp(-B.b*(xx-M.b)))
  for(i in 1:levs){
    INDEX[i,i+levs]<-i+1
    INDEX[i+levs,i]<-i+levs+1
  }
  diag(MODEL)<--rowSums(MODEL)
  if(parallel)
    lik<-phytools:::parallel_pruning(
      q=c(sig2/(2*delta^2),q.f,q.b),
      tree=pw,x=XX,model=INDEX,
      pi="mle")-Ntip(pw)*log(delta)
  else 
    lik<-phytools:::pruning(
      q=c(sig2/(2*delta^2),q.f,q.b),
      tree=pw,x=XX,model=INDEX,
      pi="mle")-Ntip(pw)*log(delta)
  if(plot){
    par(mfrow=c(2,1),mar=c(5.1,4.1,1.1,1.1))
    forward<-q.f
    plot(rowMeans(bins),forward,type="l",
      xlab="continuous trait",
      ylab="transition rate (0 to 1)",
      las=1,bty="n",cex.axis=0.6)
    box(col="grey")
    grid()
    backward<-q.b
    plot(rowMeans(bins),backward,type="l",
      xlab="continuous trait",
      ylab="transition rate (1 to 0)",
      las=1,bty="n",cex.axis=0.6)
    box(col="grey")
    grid()
  }
  if(trace>0){
    cat(paste("sig2 ","q0.f ","q1.f ","B.f  ","M.f  ",
        "q0.b ","q1.b ","B.b  ","M.b  ","\n",sep="\t"))
    cat(paste(round(theta,3),collapse="\t"))
    cat(paste("\nlog(L) = ",round(lik,4),"\n\n"))
  }
  return(-lik)
}

As a gut-check, let’s compare the total likelihood when our x and y traits evolve independently – both via geiger::fitContinuous and fitMk, and under our discretized diffusion approximation.

print(logLik(geiger::fitContinuous(tree,y))+
    logLik(fitMk(tree,x,model="ARD",pi="mle")))
## [1] -479.9862
## attr(,"df")
## [1] 2
mk_fit<-fitMk(tree,x,model="ARD",pi="mle")
theta<-c(
  geiger::fitContinuous(tree,y)$opt$sigsq,
  mk_fit$rates[2],mk_fit$rates[1],0,0,
  mk_fit$rates[2],mk_fit$rates[1],0,0)
-lfunc(theta,parallel=FALSE)
## [1] -482.5151

(These get much closer for higher values of levs, and should eventually converge.)

Now, even though I’m pretty confident we’ve got the likelihood function correct, one serious difficulty that I’ve run into is in optimization. To try and deal with that I’m going to make an effort to put our likelihood search in the vicinity of the optimum. Of course, since we don’t know what that is, this is kind of hard to guarantee.

What I’ve done is basically fit standard Mk models for x after subsampling to only the lower and higher values of y, respectively, to get better guesses of the upper and lower asymptotes of my sigmoidal function, if it exists. Get it?

## starting value of sig2
sig2<-geiger::fitContinuous(tree,y)$opt$sigsq

## subsample x for only the lowest values of y
nm<-names(which(y<=mean(y)))
low<-fitMk(keep.tip(tree,nm),x[nm],model="ARD",
  pi="mle")
## repeat for highest values of y
nm<-names(which(y>=mean(y)))
high<-fitMk(keep.tip(tree,nm),x[nm],model="ARD",
  pi="mle")

## set q0 and q1
q0.f<-low$rates[2]
q1.f<-high$rates[2]
q0.b<-low$rates[1]
q1.b<-high$rates[1]

## set B and M
B.f<-diff(range(y))/10
M.f<-mean(y)
B.b<-diff(range(y))/10
M.b<-mean(y)

## starting parameter values for optimization
theta<-c(sig2,
  q0.f,q1.f,B.f,M.f,
  q0.b,q1.b,B.b,M.b)

Now let’s cross our fingers and try to optimize this model!

## set up to parallelize pruning
ncores<-10
mc<-parallel::makeCluster(ncores,type="PSOCK")
doParallel::registerDoParallel(cl=mc)

## tolerance
tol<-1e-8

## run optimizaton
fit<-optim(theta,lfunc,method="L-BFGS-B",
  lower=c(0,0,0,-Inf,-Inf,0,0,-Inf,-Inf)+tol,
  upper=Inf,maxit=2000)

## stop cluster
parallel::stopCluster(cl=mc)

(This took a long time, for what it's worth!)

Let’s take a look at the fitted model:

fit
## $par
## [1]  0.103066529  0.002911027  0.139441577  2.099766477  0.637681306  0.142375923
## [7]  0.023854313  1.908336771 -0.685350760
## 
## $value
## [1] 413.5706
## 
## $counts
## function gradient 
##       82       82 
## 
## $convergence
## [1] 0
## 
## $message
## [1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"

Finally, why don’t we plot the model we optimized and compare it to the generating conditions?

par(mfrow=c(2,1),mar=c(5.1,4.1,1.1,1.1))
forward<-fit$par[2]+(fit$par[3]-fit$par[2])/
  (1+exp(-fit$par[4]*(rowMeans(bins)-fit$par[5])))
plot(rowMeans(bins),forward,type="l",
  xlab="continuous trait",
  ylab="transition rate (0 to 1)",
  las=1,bty="n",cex.axis=0.6,
  ylim=c(0,0.4),col=palette()[2])
lines(edge_y[ii],q01[ii],col=palette()[4])
legend("topleft",c("estimated","generating"),
  col=palette()[c(2,4)],lwd=2,bty="n",
  cex=0.8)
box(col="grey")
grid()
backward<-fit$par[6]+(fit$par[7]-fit$par[6])/
  (1+exp(-fit$par[8]*(rowMeans(bins)-fit$par[9])))
plot(rowMeans(bins),backward,type="l",
  xlab="continuous trait",
  ylab="transition rate (1 to 0)",
  las=1,bty="n",cex.axis=0.6,
  ylim=c(0,0.4),col=palette()[2])
box(col="grey")
lines(edge_y[ii],q10[ii],col=palette()[4])
legend("topleft",c("estimated","generating"),
  col=palette()[c(2,4)],lwd=2,bty="n",
  cex=0.8)
grid()

plot of chunk unnamed-chunk-18

Very cool. This is quite a close match, in my opinion.

That’s it for now!

No comments:

Post a Comment

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.