Wednesday, June 12, 2024

Fitting a multi-rate Brownian motion model with hidden states using the discrete approximation

Over recent days (1, 2, 3) I’ve been posting about fitting a discrete character dependent multi-rate Brownian continuous trait evolution model using the discrete approximation of Boucher & Démery (2016).

For those unfamiliar with the Boucher & Démery approximation, the best posts from this blog to introduce it are probably here and here.

In brief, Boucher & D̩mery (2016), in an article ostensibly about fitting a model of bounded Brownian evolution, identified a clear and incredibly useful equivalency between an ordered & symmetric Mk (i.e., continuous time Markov chain) stochastic process & Brownian motion Рas the number of states in the chain goes \(\infty\).

Vitally, they recognized how this equivalency could be exploited to fit certain continuous trait models (bounded Brownian evolution in their study, but the application is much, much broader) for which analytic densities are unavailable. I refer to this as ‘the discrete approximation,’ but as I’ve noted in previous posts, this is only an “approximation” in the sense that it becomes exact in the limit as the bin width of our continuous trait discretization goes to zero.

Previously on this blog I’ve shown how this approximation can also be used for other continuous trait evolution models without an analytic density, including a multi-state threshold model and, more recently, a discrete character dependent multi-rate Brownian model (1, 2).

In one of the latter two posts I also pointed that this approach could be used to fit a type of hidden Markov model called a hidden-rate or hidden-state model (Beaulieu et al. 2013). Under this model, we assume that there is a discrete character evolving via an Mk process whose state influences the rate of evolution of our measured continuous trait. The only problem is, we haven’t observed the discrete trait – it’s “hidden” from our view!

In this post, I’m going to demonstrate fitting this category of trait evolution model for an unobserved character with 1, 2, and 3 hidden levels. Before, I do, however, let’s simulate some rate-heterogeneous continuous trait data for me to use in the demo.

library(phytools)
packageVersion("phytools")
## [1] '2.2.14'

To follow along, it’s important that you have this version of phytools or higher.

My simulated tree is a pure-birth (Yule) phylogeny from phytools::pbtree with 250 tips and a total depth of 2.0.

tree<-pbtree(n=250,scale=2)

Now I’ll generate a stochastic discrete character history of rate heterogeneity using phytools::sim.history under an equal-rates model with a transition rate between states equal to \(q = 0.8\), as follows.

q<-0.8
Q<-q*matrix(c(
  -1,1,
  1,-1),2,2,
  dimnames=list(c("a","b"),
    c("a","b")))
Q
##      a    b
## a -0.8  0.8
## b  0.8 -0.8
tt<-sim.history(tree,Q,anc="a")
## Done simulation(s).
cols<-setNames(hcl.colors(n=2),c("a","b"))
plot(tt,cols,ftype="off",type="arc",
  arc_height=0.2,lwd=3)
legend("bottomright",c("a","b"),lwd=3,
  col=cols,bty="n",horiz=TRUE)

plot of chunk unnamed-chunk-7

Remember that although we’ve generated a discrete character that we can observe, when we go to fit our model we’ll be pretending that this trait is unknown and has not been observed at all!

Got it?

Next let’s simulate our continuous data evolving with rates \(\sigma_{a}^2 = 1.0\), and \(\sigma_{b}^2 = 10\) using phytools::sim.rates as follows.

sig2<-setNames(c(1,10),c("a","b"))
sig2
##  a  b 
##  1 10
x<-sim.rates(tt,sig2)

I’m ready to fit my hidden-state models. For the first of these, I’ll assume that our hidden character has only one state. This, in fact, is just a rate homgeneous Brownian model. The reason I’m fitting it here is merely to show that it gives an equivalent likelihood and set of parameter estimates to geiger::fitContinuous.

Entirely for computational convenience, I’m setting levs = 100 (100 levels of discretization). This is quite bit low, IMO; however, for higher numbers of hidden states calculation of the likelihood already is very slow.

I also set the optional argument plot_model=TRUE. This merely has the effect of graphing the internal structure of our underlying discretized model. (I built this for debugging, but is also useful for understanding the conceptual underpinnings of the function.)

hrm1_fit<-fitmultiBM(tree,x,plot_model=TRUE,
  parallel=TRUE,levs=100)

plot of chunk unnamed-chunk-10

Let’s inspect the fitted model object.

hrm1_fit
## Object of class "fitmultiBM" based on
##     a discretization with k = 100 levels.
## 
## Fitted multi-rate BM model parameters:
##  levels: [ 0 ]
##   sigsq: [ 5.3532 ]
##      x0: 0.5345 
## 
## Estimated Q matrix:
##   0
## 0 0
## 
## Log-likelihood: -399.7652 
## 
## R thinks it has found the ML solution.

As mentioned above, the parameter estimates and log-likelihood should be equivalent (in the limit as levs goes to \(\infty\)) to a standard BM model fit using, say, geiger::fitContinuous. Let’s check to see if that’s true.

geiger::fitContinuous(tree,x)
## GEIGER-fitted comparative model of continuous data
##  fitted 'BM' model parameters:
## 	sigsq = 5.650227
## 	z0 = 0.534961
## 
##  model summary:
## 	log-likelihood = -400.865688
## 	AIC = 805.731376
## 	AICc = 805.779959
## 	free parameters = 2
## 
## Convergence diagnostics:
## 	optimization iterations = 100
## 	failed iterations = 0
## 	number of iterations with same best fit = 100
## 	frequency of best fit = 1.000
## 
##  object summary:
## 	'lik' -- likelihood function
## 	'bnd' -- bounds for likelihood search
## 	'res' -- optimization iteration summary
## 	'opt' -- maximum likelihood parameter estimates

These are pretty close. Remember, they’ll get closer if we increased levs (try it!), but watch out – computation gets even slower!

Now let’s proceed to our actual hidden-state models. To do this, we just need to increase the fitmultiBM function argument ncat as follows.

hrm2_fit<-fitmultiBM(tree,x,ncat=2,
  plot_model=TRUE,parallel=TRUE,levs=100)

plot of chunk unnamed-chunk-13

If you’re following along at home, you probably already see a pretty substantive impact on computation time. This is because even for levs = 100, by increasing ncat we’ve effectively doubled the dimensionality of our matrix!

hrm2_fit
## Object of class "fitmultiBM" based on
##     a discretization with k = 100 levels.
## 
## Fitted multi-rate BM model parameters:
##  levels: [ 0, 0* ]
##   sigsq: [ 1.4066, 9.8594 ]
##      x0: 0.8137 
## 
## Estimated Q matrix:
##            0        0*
## 0  -0.715646  0.715646
## 0*  0.715646 -0.715646
## 
## Log-likelihood: -382.9489 
## 
## R thinks it has found the ML solution.

Remember, our true generating process (since we happen to know it) had three rates, so we might expect this most recent model to fit the best.

Now let’s go to ncat=3 and fit a model with three hidden states.

hrm3_fit<-fitmultiBM(tree,x,ncat=3,
  plot_model=TRUE,parallel=TRUE,levs=100)

plot of chunk unnamed-chunk-15

hrm3_fit
## Object of class "fitmultiBM" based on
##     a discretization with k = 100 levels.
## 
## Fitted multi-rate BM model parameters:
##  levels: [ 0, 0*, 0** ]
##   sigsq: [ 10.6141, 1.3035, 7.7717 ]
##      x0: 0.8137 
## 
## Estimated Q matrix:
##              0         0*        0**
## 0   -0.7554945  0.3777472  0.3777472
## 0*   0.3777472 -0.7554945  0.3777472
## 0**  0.3777472  0.3777472 -0.7554945
## 
## Log-likelihood: -382.3172 
## 
## R thinks it has found the ML solution.

(We could go on to ncat = 4, ncat = 5, and so on – though our computations will slow down in turn!)

We should always see that our log-likelihood has increased as we move from one model to the next. This makes sense as each more complex model has all of the simpler ones as special cases. However, to compare among models we should take their parameter complexity into account.

We can do that with AIC.

anova(hrm1_fit,hrm2_fit,hrm3_fit)
##             log(L) d.f.      AIC       weight
## hrm1_fit -399.7652    2 803.5303 2.172707e-07
## hrm2_fit -382.9489    4 773.8977 5.910595e-01
## hrm3_fit -382.3172    5 774.6344 4.089403e-01

Hopefully, this comparison reveals that (after accounting for parameterization) the best-supported model is a hidden-rate model with two rate levels for our continuous trait.

Let’s take a final look at that model.

hrm2_fit
## Object of class "fitmultiBM" based on
##     a discretization with k = 100 levels.
## 
## Fitted multi-rate BM model parameters:
##  levels: [ 0, 0* ]
##   sigsq: [ 1.4066, 9.8594 ]
##      x0: 0.8137 
## 
## Estimated Q matrix:
##            0        0*
## 0  -0.715646  0.715646
## 0*  0.715646 -0.715646
## 
## Log-likelihood: -382.9489 
## 
## R thinks it has found the ML solution.

Remember, our generating rates were \(\sigma^2 = [1.0, 10]\) (though we should not necessarily expect to see them in that order in our fitted model parameter values) and our generating transition rate among states (discrete states that we didn’t even observe, don’t forget) was \(q = 0.8\).

How close did we come?

No comments:

Post a Comment

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.