Recently, I’ve been somewhat obsessed with an underappreciated article published by Boucher & Démery in 2016.
I’ve described this article as “the most important phylogenetic comparative methods paper you’ve probably never read,” and I’m scheduled to give a talk of the same name at this summer’s Joint Congress on Evolutionary Biology in Montreal, Canada.
The Boucher & Démery article is about an evolutionary model (bounded Brownian motion) – but in my opinion what’s really cool about it is the authors’ identification of the equivalency (in the limit as the step size approaches zero) of Brownian motion and an ordered, symmetric, continuous time Markov chain (Mk) model.
Their revolutionary (my words!) assertion is that we should be able to exploit this equivalency to estimate the probability density of continuous data (and thus the likelihood of our model) from a discretization of the state space under circumstances (i.e., models) in which an analytic probability density function in the original continuous space is unavailable, Brownian motion with reflective bounds being just such a case. Indeed, Boucher et al. (2018) follow up on their 2016 study to illustrate this for a range of other models.
I’ve been referring to Boucher & Démery’s innovation as “the discrete approximation;” however, it’s worth noting that this is only an approximation in the sense that it becomes exact in the limit as our bin widths for discretization shrink to zero! In other words, this isn’t a hack to get around the problem of not being able to calculate the likelihood – it’s a computational solution to calculate the likelihood exactly in the limit as our bin widths shrink.
Ever since I finally appreciated the significance of this equivalency, I’ve been working on other applications.
One of these that I posted last month was for a multi-state threshold model.
Another that I’ve been thinking about for a while but that I finally began working on today is joint discrete and continuous character estimation under state-dependent rate heterogeneous trait evolution model á la O’Meara et al. (2006). A key difference is that because we’re going to estimate the joint evolution of our discrete and continuous traits, there’s no need to “map” hypotheses of our discrete character’s history onto the tree, e.g., using stochastic mapping.
For this analysis, our model matrix is going to be \(k \cdot n \times k \cdot n\) in size (for \(k\) levels of our discrete trait and \(n\) bins for our continuous character).
This model matrix would have symmetric sub- and super-diagonals for each of the two diagonal \(n \times n\) submatrices, and then the other non-diagonal \(n \times n\) submatrices would each be sparse and contain only a superdiagonal or subdiagonal with the transition rates between discrete character levels.
I haven’t yet written this up into a phytools function, but I already understand how to script it – so I’m going to show that first for a binary trait that influences the rate of evolution of a continuous character. Note that this model easily generalizes to an arbitrary number of levels of a discrete character, as well as any transition model (for that character) between levels!
Let’s gin up exactly this evolutionary scenario via simulation.
library(phytools)
## simulate a tree
tree<-pbtree(n=120,scale=10)
## discrete character matrix
Q<-0.2*matrix(c(-1,1,1,-1),2,2,dimnames=list(0:1,0:1))
Q
## 0 1
## 0 -0.2 0.2
## 1 0.2 -0.2
## simulate discrete character history on tree
tt<-sim.history(tree,Q)
## Done simulation(s).
x<-as.factor(getStates(tt,"tips"))
sig2<-setNames(c(1,20),0:1)
y<-sim.rates(tt,sig2)
layout(matrix(c(1,2),1,2),widths=c(0.4,0.6))
cols<-setNames(c("blue","red"),0:1)
plot(tt,cols,ftype="off",mar=c(4.1,1.1,1.1,1.1))
par(mar=c(4.1,5.1,1.1,1.1))
phenogram(tt,y,colors=make.transparent(cols,0.5),
ftype="off",las=1)
Now let’s go ahead and build & fit our model using the discrete approximation of Boucher & Démery.
To do this, I’m going to use some internal phytools functions that I coded for bounded_bm
.
All of the following are preliminaries to that.
## set number of levels for discretization
levs<-100
## set range of continuous trait (far outside observed data)
lims<-phytools:::expand.range(y)
dd<-diff(lims)
tol<-1e-8*dd/levs
## calculate bins
bins<-cbind(seq(from=lims[1]-tol,by=(dd+2*tol)/levs,
length.out=levs),seq(to=lims[2]+tol,by=(dd+2*tol)/levs,
length.out=levs))
## bin discrete data
X<-phytools:::to_binned(y,bins)
## identify all combinations of discrete character
## and binned continuous trait
nn<-c(paste(levels(x)[1],colnames(X),sep=","),
paste(levels(x)[2],colnames(X),sep=","))
## create a new, larger N x n*k data matrix
eeX<-matrix(0,nrow=nrow(X),ncol=2*levs,
dimnames=list(rownames(X),nn))
for(i in 1:nrow(X))
eeX[i,1:levs+(as.numeric(x)[i]-1)*levs]<-X[i,]
Now, let’s create our model design matrix.
Remember, it should be of dimensions \(k \cdot n \times k \cdot n\) for \(k\) levels of our discrete character and \(n\) bins (levs
in the code above).
model<-matrix(0,ncol=ncol(eeX),nrow=ncol(eeX),
dimnames=list(nn,nn))
for(i in 1:(levs-1)){
model[i,i+1]<-1
model[i+1,i]<-1
model[i+levs,i+1+levs]<-2
model[i+1+levs,i+levs]<-2
model[i,i+levs]<-3
model[i+levs,i]<-3
}
Just to visually see the structure of this model, let’s graph it. For this, you need the simple CRAN plot.matrix package.
library(plot.matrix)
par(fg="transparent")
plot(model,las=2,cex.axis=0.4,
col=c("#F0EAD6","blue","red","black"),
breaks=seq(0,4,by=1),main="structure of model")
par(fg="black")
Just to get a sense of what’s going on, in our design matrix the blue cells are transitions in the (discretized) continuous space when the discrete character is in condition "0"
; red cells are the same, but when the discrete character is in condition "1"
; finally, the black cells show transitions for the discrete trait, which I assume to be symmetric. (That’s what we simulated.)
Now let’s optimize our model.
I promise that it’ll be useful to start from reasonable parameter values! (I think where those come from is pretty self-explanatory.)
q.init<-runif(n=3,0,2)*c(
rep((1/2)*mean(pic(y,tree)^2)*(levs/dd)^2,2),
fitMk(tree,x)$rates)
fit<-fitMk(tree,eeX,model=model,
lik.func="parallel",
expm.method="R_Eigen",pi="mle",
rand_start=TRUE,
logscale=TRUE,
q.init=q.init)
Having fit our model, we’re still not done.
We can borrow the equation from my prior post on this subject to transform our transition rates in the discrete space to (in this case) our two different, state-dependent Brownian \(\sigma^2\) values.
Remember, the generating conditions of our simulation were \(\sigma^2 = [1, 20]\) for the two different discrete levels (0 and 1) of our trait.
sig2<-2*fit$rates[1:2]*(dd/levs)^2
sig2
## [1] 1.386611 20.980551
Again, we can also get our jointly estimated transition rate between states.
In theory, I believe that this estimate should be more accurate that the one we might obtain by fitting the Mk model to our discrete character data alone. This is because (if the rate of evolution in the continuous trait varies as a function of our discrete character, then) our continuous trait should have information about the evolutionary process for our discrete character as well.
In this case, recall, our generating transition rate (\(q\)) was equal to \(q = 0.2\).
q<-fit$rates[3]
q
## [1] 0.2069314
Finally, we can obtain our total likelihood in much the same way as we did before. In this case, it is the log probability density of our continuous trait and of our discrete character.
lik<-logLik(fit)-Ntip(tree)*log(dd/levs)
lik
## [1] -398.2394
## attr(,"df")
## [1] 3
(Ignore the degrees of freedom here – that’s not correct. Our fitted model actually has a total of four parameters, including the ancestral state of our continuous trait.)
We can contrast this fitted model with a discrete character state independent continuous trait evolution model.
Doing so using the same method would look as follows.
model<-matrix(0,ncol=ncol(eeX),nrow=ncol(eeX),
dimnames=list(nn,nn))
for(i in 1:(levs-1)){
model[i,i+1]<-1
model[i+1,i]<-1
model[i+levs,i+1+levs]<-1
model[i+1+levs,i+levs]<-1
model[i,i+levs]<-2
model[i+levs,i]<-2
}
library(plot.matrix)
par(fg="transparent")
plot(model,las=2,cex.axis=0.4,
col=c("#F0EAD6","purple","black"),
breaks=seq(0,3,by=1),main="structure of model")
par(fg="black")
Let’s fit it.
q.init<-runif(n=2,0,2)*c(
(1/2)*mean(pic(y,tree)^2)*(levs/dd)^2,
fitMk(tree,x)$rates)
fit_null<-fitMk(tree,eeX,model=model,
lik.func="parallel",
expm.method="R_Eigen",pi="mle",
rand_start=TRUE,
logscale=TRUE,q.init=q.init)
Once again, we can pull out our single value of \(\sigma^2\), our single transition rate between discrete states, and our log-likelihood as follows.
sig2<-2*fit_null$rates[1]*(dd/levs)^2
sig2
## [1] 9.829692
q<-fit_null$rates[2]
q
## [1] 0.1768809
lik<-logLik(fit_null)-Ntip(tree)*log(dd/levs)
lik
## [1] -419.1329
## attr(,"df")
## [1] 2
(Here, our fitted model actually has three parameters: \(x_{0}\), \(q\), and our single value of \(\sigma^2\).)
Uncoincidentally, these estimated values of \(\sigma^2\) and the discrete character transition rate (\(q\)) should closely resemble independent estimates that we might obtain using (for instance) geiger::fitContinuous
and phytools::fitMk
.
Let’s check to see if that’s true!
geiger::fitContinuous(tree,y)$opt$sigsq
## [1] 9.913359
phytools::fitMk(tree,x)$rates
## [1] 0.1829259
(If they’re a little bit different, it’s probably due to the coarseness of our discretization with levs = 100
. Try increasing levs
and see what happens! Just be forewarned – it will be slow.)
Likewise, our likelihood of the independent model is just the logged product of the separate likelihoods – which is equal to the sum of the logs, of course.
logLik(fitMk(tree,x))+logLik(geiger::fitContinuous(tree,y))
## [1] -418.6463
## attr(,"df")
## [1] 1
Finally, in a WhatsApp chat about this today, a friend & colleague (Luke Mahler) asked a question that made me realize that we could also fit a hidden-rates version of this model in which changes in the rate occur, but they’re not associated with any particular observed character trait.
To fit this model by adapting our scripts from above, we can start first by expanding our data matrix as follows. (I’m going to call my two different hidden discrete character levels 0
and 0*
.)
eeX<-cbind(X,X)
colnames(eeX)<-paste(colnames(eeX),
c(rep("0",levs),rep("0*",levs)),sep=",")
Now let’s proceed to our design matrix.
model<-matrix(0,ncol=ncol(eeX),nrow=ncol(eeX),
dimnames=list(colnames(eeX),colnames(eeX)))
for(i in 1:(levs-1)){
model[i,i+1]<-1
model[i+1,i]<-1
model[i+levs,i+1+levs]<-2
model[i+1+levs,i+levs]<-2
model[i,i+levs]<-3
model[i+levs,i]<-3
}
Finally, we’ll fit our model. (I’m just making up a starting value for transitions in our hidden character here, since it hasn’t been observed!)
q.init<-runif(n=3,0,2)*c(
rep((1/2)*mean(pic(y,tree)^2)*(levs/dd)^2,2),
1)
fit_hidden<-fitMk(tree,eeX,model=model,
lik.func="parallel",
expm.method="R_Eigen",pi="mle",
rand_start=TRUE,
logscale=TRUE,
q.init=q.init)
Now we can get our rates, \(\sigma^2\).
Of course, we should keep in mind that the order of these rates is arbitrary. (In our earlier analysis we expected to see them in the same order as our discrete trait, "0"
and "1"
.)
sig2<-2*fit_hidden$rates[1:2]*(dd/levs)^2
sig2
## [1] 1.090839 22.618986
I believe that we can even get an estimate of the transition rate for our unobserved discrete character! (I guess in theory this might be the rate of transition for the invisible character influencing our continuous character’s evolution.)
q<-fit_hidden$rates[3]
q
## [1] 0.1997306
Lastly, we can compute a log-likelihood.
lik<-logLik(fit_hidden)-Ntip(tree)*log(dd/levs)
lik
## [1] -337.4346
## attr(,"df")
## [1] 3
(This model has four estimated parameters, including the transition rate for the hidden trait.)
Now, this log-likelihood is not comparable to the log-likelihoods we calculated earlier since it doesn’t account for the probability of our discrete trait!
It is, on the other hand, comparable to a log-likelihood that we might obtain from (say) geiger::fitContinuous
.
logLik(geiger::fitContinuous(tree,y))
## [1] -351.7688
## attr(,"df")
## [1] 2
That’s all I have for now, folks – but there will be more to come… I promise!
I was trying to do this with a discrete character set including three ordered characters and a continuous character set with 120 levels. I keep getting NaN's for rates, with incredibly low likelihoods. I'm getting a warning message that NaNs are produced in log(comp[1:M + N]). I assume this means 'comp' is picking up NaNs or negative numbers somewhere. I've spent a couple hours trying to march through the code, but the ultimate issue seems to be buried in one of the optimization functions. Do you have any suggestions about what I could try to do to troubleshoot?
ReplyDeleteJust to be clear, are you using the code from this blog post -- or the implementation of this function in phytools: fitmultiBM?
DeleteWith fitmultiBM I get this error:
DeleteError in eigen(x, symmetric = isSym) : infinite or missing values in 'x'
which seems like it might be from a similar issue, but I haven't spent any time in the code for that function. I then tried recreating your analysis from first principles (I'm reasonably familiar with these functions and thought I could do it) and when that didn't work I tried following your instructions in this post step by step with my data (the example works just fine for me).
There are no missing values in either the continuous or discrete data, all of the tips match the tree. They work just fine as univariate analyses with fastAnc/boundedBM and fitMk.
If you haven't encountered this error no worries! But I figure you'd have encountered a lot of errors while testing code and might have a better sense of where I should start troubleshooting. Either way, thank you for your time!
If you're trying to get it to work without success, please feel free to send me an email. I'd like to help if I can!
Delete