Recently, I’ve been somewhat *obsessed* with an underappreciated article published by Boucher & DÃ©mery in 2016.

I’ve described this article as “the most important phylogenetic comparative methods paper you’ve probably never read,” and I’m scheduled to give a talk of the same name at this summer’s Joint Congress on Evolutionary Biology in Montreal, Canada.

The Boucher & DÃ©mery article is about an evolutionary model (bounded Brownian motion) – but in my opinion what’s *really cool* about it is the authors’ identification of the equivalency (in the limit as the step size approaches zero) of Brownian motion and an ordered, symmetric, continuous time Markov chain (M*k*) model.

Their *revolutionary* (my words!) assertion is that we should be able to exploit this equivalency to estimate the probability density of continuous data (and thus the likelihood of our model) from a discretization of the state space under circumstances (i.e., models) in which an analytic probability density function in the original continuous space is unavailable, Brownian motion with reflective bounds being just such a case. Indeed, Boucher et al. (2018) follow up on their 2016 study to illustrate this for a range of other models.

I’ve been referring to Boucher & DÃ©mery’s innovation as “the discrete approximation;” however, it’s worth noting that this is *only an approximation* in the sense that it becomes exact in the limit as our bin widths for discretization shrink to zero! In other words, this isn’t a hack to get around the problem of not being able to calculate the likelihood – it’s a computational solution to calculate the likelihood exactly in the limit as our bin widths shrink.

Ever since I *finally* appreciated the significance of this equivalency, I’ve been working on other applications.

One of these that I posted last month was for a multi-state threshold model.

Another that I’ve been thinking about for a while but that I finally began working on today is joint discrete and continuous character estimation under state-dependent rate heterogeneous trait evolution model Ã¡ la O’Meara et al. (2006). A key difference is that because we’re going to estimate the joint evolution of our discrete and continuous traits, there’s no need to “map” hypotheses of our discrete character’s history onto the tree, e.g., using stochastic mapping.

For this analysis, our model matrix is going to be \(k \cdot n \times k \cdot n\) in size (for \(k\) levels of our discrete trait and \(n\) bins for our continuous character).

This model matrix would have symmetric sub- and super-diagonals for each of the two diagonal \(n \times n\) submatrices, and then the other non-diagonal \(n \times n\) submatrices would each be sparse and contain only a superdiagonal or subdiagonal with the transition rates between discrete character levels.

I haven’t yet written this up into a *phytools* function, but I already understand how to script it – so I’m going to show that first for a binary trait that influences the rate of evolution of a continuous character. Note that this model *easily* generalizes to an arbitrary number of levels of a discrete character, as well as any transition model (for that character) between levels!

Let’s gin up exactly this evolutionary scenario via simulation.

```
library(phytools)
```

```
## simulate a tree
tree<-pbtree(n=120,scale=10)
## discrete character matrix
Q<-0.2*matrix(c(-1,1,1,-1),2,2,dimnames=list(0:1,0:1))
Q
```

```
## 0 1
## 0 -0.2 0.2
## 1 0.2 -0.2
```

```
## simulate discrete character history on tree
tt<-sim.history(tree,Q)
```

```
## Done simulation(s).
```

```
x<-as.factor(getStates(tt,"tips"))
sig2<-setNames(c(1,20),0:1)
y<-sim.rates(tt,sig2)
```

```
layout(matrix(c(1,2),1,2),widths=c(0.4,0.6))
cols<-setNames(c("blue","red"),0:1)
plot(tt,cols,ftype="off",mar=c(4.1,1.1,1.1,1.1))
par(mar=c(4.1,5.1,1.1,1.1))
phenogram(tt,y,colors=make.transparent(cols,0.5),
ftype="off",las=1)
```

Now let’s go ahead and build & fit our model using the discrete approximation of Boucher & DÃ©mery.

To do this, I’m going to use some internal *phytools* functions that I coded for `bounded_bm`

.

All of the following are preliminaries to that.

```
## set number of levels for discretization
levs<-100
## set range of continuous trait (far outside observed data)
lims<-phytools:::expand.range(y)
dd<-diff(lims)
tol<-1e-8*dd/levs
## calculate bins
bins<-cbind(seq(from=lims[1]-tol,by=(dd+2*tol)/levs,
length.out=levs),seq(to=lims[2]+tol,by=(dd+2*tol)/levs,
length.out=levs))
## bin discrete data
X<-phytools:::to_binned(y,bins)
## identify all combinations of discrete character
## and binned continuous trait
nn<-c(paste(levels(x)[1],colnames(X),sep=","),
paste(levels(x)[2],colnames(X),sep=","))
## create a new, larger N x n*k data matrix
eeX<-matrix(0,nrow=nrow(X),ncol=2*levs,
dimnames=list(rownames(X),nn))
for(i in 1:nrow(X))
eeX[i,1:levs+(as.numeric(x)[i]-1)*levs]<-X[i,]
```

Now, let’s create our model design matrix.

Remember, it should be of dimensions \(k \cdot n \times k \cdot n\) for \(k\) levels of our discrete character and \(n\) bins (`levs`

in the code above).

```
model<-matrix(0,ncol=ncol(eeX),nrow=ncol(eeX),
dimnames=list(nn,nn))
for(i in 1:(levs-1)){
model[i,i+1]<-1
model[i+1,i]<-1
model[i+levs,i+1+levs]<-2
model[i+1+levs,i+levs]<-2
model[i,i+levs]<-3
model[i+levs,i]<-3
}
```

Just to visually see the structure of this model, let’s graph it. For this, you need the simple CRAN *plot.matrix* package.

```
library(plot.matrix)
par(fg="transparent")
plot(model,las=2,cex.axis=0.4,
col=c("#F0EAD6","blue","red","black"),
breaks=seq(0,4,by=1),main="structure of model")
```

```
par(fg="black")
```

Just to get a sense of what’s going on, in our design matrix the blue cells are transitions in the (discretized) continuous space when the discrete character is in condition `"0"`

; red cells are the same, but when the discrete character is in condition `"1"`

; finally, the black cells show transitions for the discrete trait, which I assume to be symmetric. (That’s what we simulated.)

Now let’s optimize our model.

I promise that it’ll be useful to start from reasonable parameter values! (I think where those come from is pretty self-explanatory.)

```
q.init<-runif(n=3,0,2)*c(
rep((1/2)*mean(pic(y,tree)^2)*(levs/dd)^2,2),
fitMk(tree,x)$rates)
fit<-fitMk(tree,eeX,model=model,
lik.func="parallel",
expm.method="R_Eigen",pi="mle",
rand_start=TRUE,
logscale=TRUE,
q.init=q.init)
```

Having fit our model, we’re still not done.

We can borrow the equation from my prior post on this subject to transform our transition rates in the discrete space to (in this case) our two different, state-dependent Brownian \(\sigma^2\) values.

Remember, the generating conditions of our simulation were \(\sigma^2 = [1, 20]\) for the two different discrete levels (0 and 1) of our trait.

```
sig2<-2*fit$rates[1:2]*(dd/levs)^2
sig2
```

```
## [1] 1.386611 20.980551
```

Again, we can also get our jointly estimated transition rate between states.

In theory, I believe that this estimate should be more accurate that the one we might obtain by fitting the M*k* model to our discrete character data alone. This is because (if the rate of evolution in the continuous trait varies as a function of our discrete character, then) our continuous trait should have information about the evolutionary process for our discrete character as well.

In this case, recall, our generating transition rate (\(q\)) was equal to \(q = 0.2\).

```
q<-fit$rates[3]
q
```

```
## [1] 0.2069314
```

Finally, we can obtain our total likelihood in much the same way as we did before. In this case, it is the log probability density of our continuous trait *and* of our discrete character.

```
lik<-logLik(fit)-Ntip(tree)*log(dd/levs)
lik
```

```
## [1] -398.2394
## attr(,"df")
## [1] 3
```

(Ignore the degrees of freedom here – that’s not correct. Our fitted model actually has a total of *four* parameters, including the ancestral state of our continuous trait.)

We can contrast this fitted model with a discrete character state *independent* continuous trait evolution model.

Doing so using the same method would look as follows.

```
model<-matrix(0,ncol=ncol(eeX),nrow=ncol(eeX),
dimnames=list(nn,nn))
for(i in 1:(levs-1)){
model[i,i+1]<-1
model[i+1,i]<-1
model[i+levs,i+1+levs]<-1
model[i+1+levs,i+levs]<-1
model[i,i+levs]<-2
model[i+levs,i]<-2
}
library(plot.matrix)
par(fg="transparent")
plot(model,las=2,cex.axis=0.4,
col=c("#F0EAD6","purple","black"),
breaks=seq(0,3,by=1),main="structure of model")
```

```
par(fg="black")
```

Let’s fit it.

```
q.init<-runif(n=2,0,2)*c(
(1/2)*mean(pic(y,tree)^2)*(levs/dd)^2,
fitMk(tree,x)$rates)
fit_null<-fitMk(tree,eeX,model=model,
lik.func="parallel",
expm.method="R_Eigen",pi="mle",
rand_start=TRUE,
logscale=TRUE,q.init=q.init)
```

Once again, we can pull out our *single* value of \(\sigma^2\), our single transition rate between discrete states, and our log-likelihood as follows.

```
sig2<-2*fit_null$rates[1]*(dd/levs)^2
sig2
```

```
## [1] 9.829692
```

```
q<-fit_null$rates[2]
q
```

```
## [1] 0.1768809
```

```
lik<-logLik(fit_null)-Ntip(tree)*log(dd/levs)
lik
```

```
## [1] -419.1329
## attr(,"df")
## [1] 2
```

(Here, our fitted model actually has three parameters: \(x_{0}\), \(q\), and our single value of \(\sigma^2\).)

Uncoincidentally, these estimated values of \(\sigma^2\) and the discrete character transition rate (\(q\)) should closely resemble independent estimates that we might obtain using (for instance) `geiger::fitContinuous`

and `phytools::fitMk`

.

Let’s check to see if that’s true!

```
geiger::fitContinuous(tree,y)$opt$sigsq
```

```
## [1] 9.913359
```

```
phytools::fitMk(tree,x)$rates
```

```
## [1] 0.1829259
```

(If they’re a little bit different, it’s probably due to the coarseness of our discretization with `levs = 100`

. Try increasing `levs`

and see what happens! Just be forewarned – it will be slow.)

Likewise, our *likelihood* of the independent model is just the logged product of the separate likelihoods – which is equal to the sum of the logs, of course.

```
logLik(fitMk(tree,x))+logLik(geiger::fitContinuous(tree,y))
```

```
## [1] -418.6463
## attr(,"df")
## [1] 1
```

Finally, in a WhatsApp chat about this today, a friend & colleague (Luke Mahler) asked a question that made me realize that we could *also* fit a hidden-rates version of this model in which changes in the rate occur, but they’re not associated with any particular observed character trait.

To fit this model by adapting our scripts from above, we can start first by expanding our data matrix as follows. (I’m going to call my two different hidden discrete character levels `0`

and `0*`

.)

```
eeX<-cbind(X,X)
colnames(eeX)<-paste(colnames(eeX),
c(rep("0",levs),rep("0*",levs)),sep=",")
```

Now let’s proceed to our design matrix.

```
model<-matrix(0,ncol=ncol(eeX),nrow=ncol(eeX),
dimnames=list(colnames(eeX),colnames(eeX)))
for(i in 1:(levs-1)){
model[i,i+1]<-1
model[i+1,i]<-1
model[i+levs,i+1+levs]<-2
model[i+1+levs,i+levs]<-2
model[i,i+levs]<-3
model[i+levs,i]<-3
}
```

Finally, we’ll fit our model. (I’m just making up a starting value for transitions in our hidden character here, since it hasn’t been observed!)

```
q.init<-runif(n=3,0,2)*c(
rep((1/2)*mean(pic(y,tree)^2)*(levs/dd)^2,2),
1)
fit_hidden<-fitMk(tree,eeX,model=model,
lik.func="parallel",
expm.method="R_Eigen",pi="mle",
rand_start=TRUE,
logscale=TRUE,
q.init=q.init)
```

Now we can get our rates, \(\sigma^2\).

Of course, we should keep in mind that the *order* of these rates is arbitrary. (In our earlier analysis we expected to see them in the same order as our discrete trait, `"0"`

and `"1"`

.)

```
sig2<-2*fit_hidden$rates[1:2]*(dd/levs)^2
sig2
```

```
## [1] 1.090839 22.618986
```

I believe that we can even get an estimate of the transition rate for our unobserved discrete character! (I guess in theory this might be the rate of transition for the invisible character influencing our continuous character’s evolution.)

```
q<-fit_hidden$rates[3]
q
```

```
## [1] 0.1997306
```

Lastly, we can compute a log-likelihood.

```
lik<-logLik(fit_hidden)-Ntip(tree)*log(dd/levs)
lik
```

```
## [1] -337.4346
## attr(,"df")
## [1] 3
```

(This model has *four* estimated parameters, including the transition rate for the hidden trait.)

Now, this log-likelihood is *not* comparable to the log-likelihoods we calculated earlier since it doesn’t account for the probability of our discrete trait!

It *is*, on the other hand, comparable to a log-likelihood that we might obtain from (say) `geiger::fitContinuous`

.

```
logLik(geiger::fitContinuous(tree,y))
```

```
## [1] -351.7688
## attr(,"df")
## [1] 2
```

That’s all I have for now, folks – but there *will* be more to come… I promise!

## No comments:

## Post a Comment

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.