Friday, June 7, 2024

Fitting a discrete state-dependent continuous trait evolution model using the discrete approximation of Boucher & Démery (2016)

Recently, I’ve been somewhat obsessed with an underappreciated article published by Boucher & Démery in 2016.

I’ve described this article as “the most important phylogenetic comparative methods paper you’ve probably never read,” and I’m scheduled to give a talk of the same name at this summer’s Joint Congress on Evolutionary Biology in Montreal, Canada.

The Boucher & Démery article is about an evolutionary model (bounded Brownian motion) – but in my opinion what’s really cool about it is the authors’ identification of the equivalency (in the limit as the step size approaches zero) of Brownian motion and an ordered, symmetric, continuous time Markov chain (Mk) model.

Their revolutionary (my words!) assertion is that we should be able to exploit this equivalency to estimate the probability density of continuous data (and thus the likelihood of our model) from a discretization of the state space under circumstances (i.e., models) in which an analytic probability density function in the original continuous space is unavailable, Brownian motion with reflective bounds being just such a case. Indeed, Boucher et al. (2018) follow up on their 2016 study to illustrate this for a range of other models.

I’ve been referring to Boucher & Démery’s innovation as “the discrete approximation”; however, it’s worth noting that it is an approximation only in the sense that it converges on the exact answer as our discretization bin widths shrink towards zero! In other words, this isn’t a hack to get around our inability to calculate the likelihood – it’s a computational route to the likelihood that becomes exact in the limit of arbitrarily narrow bins.
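
As I understand their result, the key relationship is a simple one: if we chop the trait axis into bins of width \(\Delta x\) and allow transitions only between adjacent bins at a symmetric rate \(q\), then (as \(\Delta x \rightarrow 0\)) the process converges on Brownian motion with rate \(\sigma^2 = 2 q \Delta x^2\); equivalently, a Brownian rate of \(\sigma^2\) corresponds to a bin-to-bin transition rate of \(q = \sigma^2 / (2 \Delta x^2)\). We’ll lean on this relationship repeatedly below to move back and forth between the continuous and discretized scales.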

Ever since I finally appreciated the significance of this equivalency, I’ve been working on other applications.

One of these that I posted last month was for a multi-state threshold model.

Another that I’ve been thinking about for a while but that I finally began working on today is joint discrete and continuous character estimation under a state-dependent, rate-heterogeneous trait evolution model à la O’Meara et al. (2006). A key difference is that because we’re going to estimate the joint evolution of our discrete and continuous traits, there’s no need to “map” hypotheses of our discrete character’s history onto the tree, e.g., using stochastic mapping.

For this analysis, our model matrix is going to be \(k \cdot n \times k \cdot n\) in size (for \(k\) levels of our discrete trait and \(n\) bins for our continuous character).

This model matrix has symmetric sub- and super-diagonals within each of the two diagonal \(n \times n\) submatrices (one per level of the discrete trait), while the off-diagonal \(n \times n\) submatrices are sparse and diagonal, containing only the transition rates between levels of the discrete character (so the discrete state can change without a simultaneous jump in the continuous trait).
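
Schematically, for a binary discrete character, the full rate matrix of the joint process looks something like the following (my own sketch of the structure just described, ignoring the diagonal entries, which are simply set so that each row sums to zero):

\[
Q = \begin{pmatrix} Q_{BM}(\sigma_0^2) & q_{01} I \\ q_{10} I & Q_{BM}(\sigma_1^2) \end{pmatrix}
\]

in which \(Q_{BM}(\sigma_i^2)\) is the tridiagonal matrix of symmetric transitions between adjacent bins at the rate implied by \(\sigma_i^2\), \(I\) is the \(n \times n\) identity matrix, and \(q_{01}\) and \(q_{10}\) are the transition rates between levels of the discrete character (which I’ll constrain to be equal below).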

I haven’t yet written this up into a phytools function, but I already understand how to script it – so I’m going to show that first for a binary trait that influences the rate of evolution of a continuous character. Note that this model easily generalizes to an arbitrary number of levels of a discrete character, as well as any transition model (for that character) between levels!

Let’s gin up exactly this evolutionary scenario via simulation.

library(phytools)
## simulate a tree
tree<-pbtree(n=120,scale=10)
## discrete character matrix
Q<-0.2*matrix(c(-1,1,1,-1),2,2,dimnames=list(0:1,0:1))
Q
##      0    1
## 0 -0.2  0.2
## 1  0.2 -0.2
## simulate discrete character history on tree
tt<-sim.history(tree,Q)
## Done simulation(s).
x<-as.factor(getStates(tt,"tips"))
## state-dependent rates of Brownian evolution (sigma^2)
sig2<-setNames(c(1,20),0:1)
## simulate continuous trait evolution with state-dependent rates
y<-sim.rates(tt,sig2)
## plot the simulated discrete history and a traitgram of y
layout(matrix(c(1,2),1,2),widths=c(0.4,0.6))
cols<-setNames(c("blue","red"),0:1)
plot(tt,cols,ftype="off",mar=c(4.1,1.1,1.1,1.1))
par(mar=c(4.1,5.1,1.1,1.1))
phenogram(tt,y,colors=make.transparent(cols,0.5),
  ftype="off",las=1)

(Figure: the simulated discrete character history on the tree, alongside a traitgram of the continuous trait colored by discrete state.)

Now let’s go ahead and build & fit our model using the discrete approximation of Boucher & Démery.

To do this, I’m going to use some internal phytools functions that I coded for bounded_bm.

All of the following are preliminaries to that.

## set number of levels for discretization
levs<-100
## set range of continuous trait (far outside observed data)
lims<-phytools:::expand.range(y)
dd<-diff(lims)
tol<-1e-8*dd/levs
## calculate bins
bins<-cbind(seq(from=lims[1]-tol,by=(dd+2*tol)/levs,
  length.out=levs),seq(to=lims[2]+tol,by=(dd+2*tol)/levs,
    length.out=levs))
## bin discrete data
X<-phytools:::to_binned(y,bins)
## identify all combinations of discrete character
## and binned continuous trait
nn<-c(paste(levels(x)[1],colnames(X),sep=","),
  paste(levels(x)[2],colnames(X),sep=","))
## create a new, larger N x n*k data matrix
eeX<-matrix(0,nrow=nrow(X),ncol=2*levs,
  dimnames=list(rownames(X),nn))
for(i in 1:nrow(X))
  eeX[i,1:levs+(as.numeric(x)[i]-1)*levs]<-X[i,]
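
Before moving on, here’s a quick (optional) sanity check of my own on that expansion: every tip’s weight in X should end up in the block of eeX corresponding to its observed discrete state, so the row totals should be preserved.

## optional sanity check: row totals of eeX should match those of X
all(abs(rowSums(eeX)-rowSums(X))<1e-12)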

Now, let’s create our model design matrix.

Remember, it should be of dimensions \(k \cdot n \times k \cdot n\) for \(k\) levels of our discrete character and \(n\) bins (levs in the code above).

model<-matrix(0,ncol=ncol(eeX),nrow=ncol(eeX),
  dimnames=list(nn,nn))
for(i in 1:(levs-1)){
  model[i,i+1]<-1
  model[i+1,i]<-1
  model[i+levs,i+1+levs]<-2
  model[i+1+levs,i+levs]<-2
  model[i,i+levs]<-3
  model[i+levs,i]<-3
}

Just to visualize the structure of this model, let’s graph it. For this, you’ll need the simple plot.matrix package from CRAN.

library(plot.matrix)
par(fg="transparent")
plot(model,las=2,cex.axis=0.4,
  col=c("#F0EAD6","blue","red","black"),
  breaks=seq(0,4,by=1),main="structure of model")

(Figure: “structure of model” – the design matrix of the state-dependent model.)

par(fg="black")

Just to get a sense of what’s going on, in our design matrix the blue cells are transitions in the (discretized) continuous space when the discrete character is in condition "0"; red cells are the same, but when the discrete character is in condition "1"; finally, the black cells show transitions for the discrete trait, which I assume to be symmetric. (That’s what we simulated.)

Now let’s optimize our model.

I promise that it’ll be useful to start from reasonable parameter values! (Here, the two continuous-trait starting values come from converting a quick PIC-based estimate of \(\sigma^2\) into bin-to-bin transition rates, via \(q = \sigma^2/(2 \Delta x^2)\) with \(\Delta x = \) dd/levs; the starting value for the discrete character comes from fitting a standard Mk model to that trait on its own; and all three get jittered by a random multiplier.)

q.init<-runif(n=3,0,2)*c(
  rep((1/2)*mean(pic(y,tree)^2)*(levs/dd)^2,2),
  fitMk(tree,x)$rates)
fit<-fitMk(tree,eeX,model=model,
  lik.func="parallel",
  expm.method="R_Eigen",pi="mle",
  rand_start=TRUE,
  logscale=TRUE,
  q.init=q.init)

Having fit our model, we’re still not done.

We can borrow the equation from my prior post on this subject to transform our transition rates in the discrete space to (in this case) our two different, state-dependent Brownian \(\sigma^2\) values.
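
That equation, in the notation of this post (with bin width \(\Delta x = \) dd/levs), is just \(\sigma^2 = 2 \hat{q} \, (\mathrm{dd}/\mathrm{levs})^2\), where \(\hat{q}\) is the fitted rate of transition between adjacent bins within each level of the discrete character.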

Remember, the generating conditions of our simulation were \(\sigma^2 = [1, 20]\) for the two different discrete levels (0 and 1) of our trait.

sig2<-2*fit$rates[1:2]*(dd/levs)^2
sig2
## [1]  1.386611 20.980551

Again, we can also get our jointly estimated transition rate between states.

In theory, I believe that this estimate should be more accurate than the one we might obtain by fitting the Mk model to our discrete character data alone. This is because, if the rate of evolution of the continuous trait varies as a function of our discrete character, then the continuous trait should also carry information about the evolutionary process of our discrete character.

In this case, recall, our generating transition rate (\(q\)) was equal to \(q = 0.2\).

q<-fit$rates[3]
q
## [1] 0.2069314

Finally, we can obtain our total likelihood in much the same way as we did before: subtracting Ntip(tree)*log(dd/levs) simply converts the bin probabilities at the tips into probability densities (by dividing each by the bin width). The result is the joint log-likelihood of our continuous trait (as a probability density) and our discrete character.

lik<-logLik(fit)-Ntip(tree)*log(dd/levs)
lik
## [1] -398.2394
## attr(,"df")
## [1] 3

(Ignore the degrees of freedom here – that’s not correct. Our fitted model actually has a total of four parameters, including the ancestral state of our continuous trait.)
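
If we want the printed attribute to match, we could simply overwrite it (a purely cosmetic fix):

## purely cosmetic: set the df attribute to the true number of parameters
attr(lik,"df")<-4
lik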

We can contrast this fitted model with one in which the evolution of our continuous trait is independent of the state of the discrete character.

Doing so using the same method would look as follows.

model<-matrix(0,ncol=ncol(eeX),nrow=ncol(eeX),
  dimnames=list(nn,nn))
for(i in 1:(levs-1)){
  model[i,i+1]<-1
  model[i+1,i]<-1
  model[i+levs,i+1+levs]<-1
  model[i+1+levs,i+levs]<-1
  model[i,i+levs]<-2
  model[i+levs,i]<-2
}
library(plot.matrix)
par(fg="transparent")
plot(model,las=2,cex.axis=0.4,
  col=c("#F0EAD6","purple","black"),
  breaks=seq(0,3,by=1),main="structure of model")

(Figure: “structure of model” – the design matrix of the state-independent (null) model.)

par(fg="black")

Let’s fit it.

q.init<-runif(n=2,0,2)*c(
  (1/2)*mean(pic(y,tree)^2)*(levs/dd)^2,
  fitMk(tree,x)$rates)
fit_null<-fitMk(tree,eeX,model=model,
  lik.func="parallel",
  expm.method="R_Eigen",pi="mle",
  rand_start=TRUE,
  logscale=TRUE,q.init=q.init)

Once again, we can pull out our single value of \(\sigma^2\), our single transition rate between discrete states, and our log-likelihood as follows.

sig2<-2*fit_null$rates[1]*(dd/levs)^2
sig2
## [1] 9.829692
q<-fit_null$rates[2]
q
## [1] 0.1768809
lik<-logLik(fit_null)-Ntip(tree)*log(dd/levs)
lik
## [1] -419.1329
## attr(,"df")
## [1] 2

(Here, our fitted model actually has three parameters: \(x_{0}\), \(q\), and our single value of \(\sigma^2\).)
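
Since the state-independent model is nested within the state-dependent model (differing by a single parameter), we could compare the two with a likelihood-ratio test. Here’s a quick sketch; lik_sd and lik_null are just names I’m making up so that we don’t need to worry about having overwritten lik above.

## recompute the corrected log-likelihoods of the two fitted models
lik_sd<-logLik(fit)-Ntip(tree)*log(dd/levs)
lik_null<-logLik(fit_null)-Ntip(tree)*log(dd/levs)
## likelihood-ratio test statistic and P-value (df = 4 - 3 = 1)
LR<-2*(as.numeric(lik_sd)-as.numeric(lik_null))
LR
pchisq(LR,df=1,lower.tail=FALSE)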

Uncoincidentally, these estimated values of \(\sigma^2\) and the discrete character transition rate (\(q\)) should closely resemble independent estimates that we might obtain using (for instance) geiger::fitContinuous and phytools::fitMk.

Let’s check to see if that’s true!

geiger::fitContinuous(tree,y)$opt$sigsq
## [1] 9.913359
phytools::fitMk(tree,x)$rates
## [1] 0.1829259

(If they’re a little bit different, it’s probably due to the coarseness of our discretization with levs = 100. Try increasing levs and see what happens! Just be forewarned – it will be slow.)
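
If you’d like to try that, something like the following rough helper should make the experiment easy to run. (fit_sd_bm is just a name I’m making up here; it repackages the steps above into a function of levs and relies on the same internal phytools functions, which could change without warning.)

## rough helper to refit the state-dependent model for a given number
## of bins (a repackaging of the steps above; not a phytools function)
fit_sd_bm<-function(tree,x,y,levs=100){
  lims<-phytools:::expand.range(y)
  dd<-diff(lims)
  tol<-1e-8*dd/levs
  bins<-cbind(seq(from=lims[1]-tol,by=(dd+2*tol)/levs,length.out=levs),
    seq(to=lims[2]+tol,by=(dd+2*tol)/levs,length.out=levs))
  X<-phytools:::to_binned(y,bins)
  nn<-c(paste(levels(x)[1],colnames(X),sep=","),
    paste(levels(x)[2],colnames(X),sep=","))
  eeX<-matrix(0,nrow=nrow(X),ncol=2*levs,dimnames=list(rownames(X),nn))
  for(i in 1:nrow(X)) eeX[i,1:levs+(as.numeric(x)[i]-1)*levs]<-X[i,]
  model<-matrix(0,ncol=2*levs,nrow=2*levs,dimnames=list(nn,nn))
  for(i in 1:(levs-1)){
    model[i,i+1]<-model[i+1,i]<-1
    model[i+levs,i+1+levs]<-model[i+1+levs,i+levs]<-2
    model[i,i+levs]<-model[i+levs,i]<-3
  }
  q.init<-runif(n=3,0,2)*c(rep((1/2)*mean(pic(y,tree)^2)*(levs/dd)^2,2),
    fitMk(tree,x)$rates)
  fit<-fitMk(tree,eeX,model=model,lik.func="parallel",
    expm.method="R_Eigen",pi="mle",rand_start=TRUE,logscale=TRUE,
    q.init=q.init)
  list(sig2=2*fit$rates[1:2]*(dd/levs)^2,q=fit$rates[3],
    logLik=logLik(fit)-Ntip(tree)*log(dd/levs))
}
## e.g., fit_sd_bm(tree,x,y,levs=200) ## slow!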

Likewise, the log-likelihood of the independent model is just the logged product of the separate likelihoods – which is equal to the sum of the logs, of course.

logLik(fitMk(tree,x))+logLik(geiger::fitContinuous(tree,y))
## [1] -418.6463
## attr(,"df")
## [1] 1

Finally, in a WhatsApp chat about this today, a friend & colleague (Luke Mahler) asked a question that made me realize that we could also fit a hidden-rates version of this model in which changes in the rate occur, but they’re not associated with any particular observed character trait.

To fit this model by adapting our scripts from above, we can start first by expanding our data matrix as follows. (I’m going to call my two different hidden discrete character levels 0 and 0*.)

eeX<-cbind(X,X)
colnames(eeX)<-paste(colnames(eeX),
  c(rep("0",levs),rep("0*",levs)),sep=",")

Now let’s proceed to our design matrix.

model<-matrix(0,ncol=ncol(eeX),nrow=ncol(eeX),
  dimnames=list(colnames(eeX),colnames(eeX)))
for(i in 1:(levs-1)){
  model[i,i+1]<-1
  model[i+1,i]<-1
  model[i+levs,i+1+levs]<-2
  model[i+1+levs,i+levs]<-2
  model[i,i+levs]<-3
  model[i+levs,i]<-3
}

Finally, we’ll fit our model. (I’m just making up a starting value for transitions in our hidden character here, since it hasn’t been observed!)

q.init<-runif(n=3,0,2)*c(
  rep((1/2)*mean(pic(y,tree)^2)*(levs/dd)^2,2),
  1)
fit_hidden<-fitMk(tree,eeX,model=model,
  lik.func="parallel",
  expm.method="R_Eigen",pi="mle",
  rand_start=TRUE,
  logscale=TRUE,
  q.init=q.init)

Now we can get our rates, \(\sigma^2\).

Of course, we should keep in mind that the order of these rates is arbitrary. (In our earlier analysis we expected to see them in the same order as our discrete trait, "0" and "1".)

sig2<-2*fit_hidden$rates[1:2]*(dd/levs)^2
sig2
## [1]  1.090839 22.618986

I believe that we can even get an estimate of the transition rate for our unobserved discrete character! (I guess in theory this might be the rate of transition for the invisible character influencing our continuous character’s evolution.)

q<-fit_hidden$rates[3]
q
## [1] 0.1997306

Lastly, we can compute a log-likelihood.

lik<-logLik(fit_hidden)-Ntip(tree)*log(dd/levs)
lik
## [1] -337.4346
## attr(,"df")
## [1] 3

(This model has four estimated parameters, including the transition rate for the hidden trait.)

Now, this log-likelihood is not comparable to the log-likelihoods we calculated earlier since it doesn’t account for the probability of our discrete trait!

It is, on the other hand, comparable to a log-likelihood that we might obtain from (say) geiger::fitContinuous.

logLik(geiger::fitContinuous(tree,y))
## [1] -351.7688
## attr(,"df")
## [1] 2
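
We could even make that comparison formal with a likelihood-ratio test (with the usual caveats about tests involving hidden states; the hidden-rates model has four parameters to fitContinuous’s two, so two degrees of freedom). Something along these lines should work:

## rough comparison of the hidden-rates model to single-rate Brownian
## motion via a likelihood-ratio test (df = 4 - 2 = 2)
LR<-2*(as.numeric(lik)-as.numeric(logLik(geiger::fitContinuous(tree,y))))
LR
pchisq(LR,df=2,lower.tail=FALSE)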

That’s all I have for now, folks – but there will be more to come… I promise!
