Thursday, June 13, 2024

Fitting state-dependent & hidden-state models of multi-rate continuous character evolution using the discrete approximation in phytools

phytools has a new version on CRAN (phytools 2.3-0). The previous CRAN version (2.1-1) was published in January, and I’ve been doing a lot since then, so this version has many new features and methods. Hopefully, I’ll get around to surveying some of these in the coming days.

The function I’m most excited about right now, however, is unambiguously fitmultiBM for fitting a discrete character dependent multi-rate Brownian motion model to continuous trait data using the discrete approximation of Boucher & Démery (2016). I have already written about this method here, here, here, and here.

Today, before submitting the new phytools CRAN version, I first pushed a number of new updates to this function – to improve optimization (a bit – it’s still slow), to add ancestral state estimation under the model, and to create a visualization of the model structure.

I’m going to explore ancestral state estimation under this model in another post.

Here, I want to show a workflow of model comparison including visualization of the structure of the fitted model. Hopefully this will help users better understand how this method works.

library(phytools)
packageVersion("phytools")
## [1] '2.3.0'

To start with, I’m going to simulate a phylogeny and a total of four different characters.

My first character, y1, is a two-state discrete character with levels 1 and 2, that evolved on the tree under a continuous-time Markov chain (Mk) model.

My second character, x1, is a continuous trait that evolved via Brownian motion on the tree, but with a rate, \(\sigma^2\), that depends on the level of y1 such that \(\sigma_{1}^2 = 1.0\) and \(\sigma_{2}^2 = 20\).

My third character, y2, is a second two-state discrete character that evolved via the same process as y1, but with levels A and B.

Finally, my fourth trait, x2, is a continuous trait that evolved via Brownian motion with a rate that depended on y2 such that \(\sigma_{A}^2 = 20\) and \(\sigma_{B}^2 = 1.0\).

## simulate our tree
phy<-pbtree(n=100,scale=1)
phy
## 
## Phylogenetic tree with 100 tips and 99 internal nodes.
## 
## Tip labels:
##   t5, t59, t76, t77, t74, t75, ...
## 
## Rooted; includes branch lengths.
## our transition matrix for both y1 and y2
Q<-matrix(c(-1,1,1,-1),2,2,
  dimnames=list(1:2,1:2))
Q
##    1  2
## 1 -1  1
## 2  1 -1
## discrete character history for y1
t1<-sim.history(phy,Q)
## Done simulation(s).
## simulate y1 and x1
y1<-factor(getStates(t1,"tips"))
sig2_1<-setNames(c(1,20),levels(y1))
x1<-sim.rates(t1,sig2_1)
## update Q
colnames(Q)<-rownames(Q)<-c("A","B")
Q
##    A  B
## A -1  1
## B  1 -1
## discrete character history for y2
t2<-sim.history(phy,Q)
## Done simulation(s).
## simulate y2 and x2
y2<-factor(getStates(t2,"tips"))
sig2_2<-setNames(c(20,1),levels(y2))
x2<-sim.rates(t2,sig2_2)

To start off, I’ll focus on x1 and y1 and imagine that we had an a priori hypothesis that the state of y1 influenced the rate of x1.

We can fit that model as follows.

sd_fit1<-fitmultiBM(phy,x1,y1,levs=100,
  parallel=TRUE,plot_model=TRUE)

[Figure: graphical structure of the fitted state-dependent model]

The way to understand the graph of our fitted model is that every square is a levs \(\times\) levs submatrix. Submatrices on the diagonal describe continuous trait evolution within a regime, whereas off-diagonal submatrices describe transitions of the discrete trait between regimes. Each color corresponds to a different parameter to be estimated in the model.
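To build some intuition for that block structure, here is a simplified sketch (my own illustration, not phytools’ internal code) of how such a generator could be assembled following the Boucher & Démery-style discretization: each regime gets a levs \(\times\) levs tridiagonal block approximating Brownian motion on a grid of trait bins, and the off-diagonal blocks are \(q\) times the identity, because a change in the discrete trait leaves the continuous trait value unchanged. The grid size, bin width, and rates below are arbitrary values chosen for illustration.

```r
## simplified sketch of the block structure (NOT phytools' internal code)
levs <- 5                 # tiny grid for illustration (we used levs = 100 above)
h <- 1                    # bin width (assumed; real bounds come from the data)
sig2 <- c(1, 20)          # per-regime BM rates
q <- 1                    # discrete-trait transition rate

## tridiagonal generator approximating BM with rate s2 on levs trait bins:
## moves to adjacent bins at rate s2 / (2 h^2)
bm_block <- function(s2, levs, h) {
  r <- s2 / (2 * h^2)
  B <- matrix(0, levs, levs)
  for (i in 1:levs) {
    if (i > 1)    B[i, i - 1] <- r
    if (i < levs) B[i, i + 1] <- r
  }
  diag(B) <- -rowSums(B)  # rows of a generator sum to zero
  B
}

## assemble the (2 * levs) x (2 * levs) generator: BM blocks on the
## diagonal, q * identity off the diagonal (state changes keep the trait value)
G <- rbind(
  cbind(bm_block(sig2[1], levs, h), q * diag(levs)),
  cbind(q * diag(levs), bm_block(sig2[2], levs, h))
)
diag(G) <- diag(G) - q    # restore zero row sums after adding the q * I blocks
range(rowSums(G))         # every row of a valid generator sums to 0
```

This is only meant to show why each square in the plotted model is a levs \(\times\) levs submatrix; the actual discretization in fitmultiBM handles the trait bounds and bin widths internally.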

sd_fit1
## Object of class "fitmultiBM" based on
##     a discretization with k = 100 levels.
## 
## Fitted multi-rate BM model parameters:
##  levels: [ 1, 2 ]
##   sigsq: [ 0.9299, 16.3279 ]
##      x0: -0.3036 
## 
## Estimated Q matrix:
##          1        2
## 1 -1.57855  1.57855
## 2  1.57855 -1.57855
## 
## Log-likelihood: -206.8113 
## 
## R thinks it has found the ML solution.

We might recall that our generating rates were \(\sigma^2 = [1, 20]\) and think that we might be onto something here!

On the other hand, we haven’t asked how well a constant rate model explains our data. Let’s fit that model and compare.

cr_null1<-fitmultiBM(phy,x1,y1,levs=100,
  parallel=TRUE,plot_model=TRUE,null_model=TRUE)

[Figure: graphical structure of the fitted constant-rate null model]

We can see that this is a null model because the rate of continuous trait evolution is the same within each of our two discrete character levels, as indicated by the colors in the matrix!

cr_null1
## Object of class "fitmultiBM" based on
##     a discretization with k = 100 levels.
## 
## Fitted multi-rate BM model parameters:
##  levels: [ 1, 2 ]
##   sigsq: [ 8.0242, 8.0242 ]
##      x0: 0.5535 
## 
## Estimated Q matrix:
##          1        2
## 1 -1.26584  1.26584
## 2  1.26584 -1.26584
## 
## Log-likelihood: -238.1674 
## 
## R thinks it has found the ML solution.

It’s not hard to see that, compared to our state-dependent model, a constant-rate process does not explain our data very well!
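In fact, since the constant-rate model is nested within the state-dependent model (it simply constrains \(\sigma_1^2 = \sigma_2^2\)), we can put a number on that impression with a quick likelihood-ratio test using the log-likelihoods printed above:

```r
## likelihood-ratio test of the nested comparison, using the
## log-likelihoods printed above (cr_null1 constrains sig2_1 = sig2_2)
logL_null <- -238.1674   # cr_null1, 3 parameters
logL_sd   <- -206.8113   # sd_fit1, 4 parameters
LR <- 2 * (logL_sd - logL_null)
LR
## the models differ by one parameter, so compare to chi-squared with df = 1
pchisq(LR, df = 1, lower.tail = FALSE)
```

The resulting P-value is vanishingly small, so we’d reject the constant-rate model decisively.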

There’s another “null” model, however, that might be worth considering – one in which our evolutionary rate varies according to some discrete trait, just not the one we measured! This is a type of hidden Markov model, and we can fit it using fitmultiBM too! Let’s try it.

hrm_null1<-fitmultiBM(phy,x1,y1,levs=100,
  parallel=TRUE,plot_model=TRUE,null_model=TRUE,
  ncat=2)

[Figure: graphical structure of the fitted hidden-rates null model]

Once again, the reason I call this a “null” (with air-quotes) model is that the rate of continuous trait evolution, \(\sigma^2\), is equal between the two levels of our observed discrete trait (1 and 2), and differs only as a function of the hidden character (whose two levels are indicated by the presence or absence of the *).

hrm_null1
## Object of class "fitmultiBM" based on
##     a discretization with k = 100 levels.
## 
## Fitted multi-rate BM model parameters:
##  levels: [ 1, 1*, 2, 2* ]
##   sigsq: [ 16.3279, 1.1006, 16.3279, 1.1006 ]
##      x0: -0.0179 
## 
## Estimated Q matrix:
##            1        1*         2        2*
## 1  -2.477284  1.211444  1.265840  0.000000
## 1*  1.211444 -2.477284  0.000000  1.265840
## 2   1.265840  0.000000 -2.477284  1.211444
## 2*  0.000000  1.265840  1.211444 -2.477284
## 
## Log-likelihood: -219.8385 
## 
## R thinks it has found the ML solution.

As far as I can tell, this hidden-rates model does not explain our data nearly as well as our original state-dependent model: its log-likelihood is about 13 units lower, despite having one more parameter.

One last model we can consider is one in which both our observed discrete character and a hidden character affect the rate of evolution of our continuous trait. This model has the most parameters and will take the most time to fit, so let’s not delay!

sd_fit1.hrm<-fitmultiBM(phy,x1,y1,levs=100,
  parallel=TRUE,plot_model=TRUE,ncat=2)

[Figure: graphical structure of the fitted state-dependent hidden-rates model]

sd_fit1.hrm
## Object of class "fitmultiBM" based on
##     a discretization with k = 100 levels.
## 
## Fitted multi-rate BM model parameters:
##  levels: [ 1, 1*, 2, 2* ]
##   sigsq: [ 1.0501, 0.9254, 1.47, 16.3279 ]
##      x0: -0.3036 
## 
## Estimated Q matrix:
##              1          1*           2          2*
## 1  -1.62375085  0.04234617  1.58140468  0.00000000
## 1*  0.04234617 -1.62375085  0.00000000  1.58140468
## 2   1.58140468  0.00000000 -1.62375085  0.04234617
## 2*  0.00000000  1.58140468  0.04234617 -1.62375085
## 
## Log-likelihood: -206.7758 
## 
## R thinks it has found the ML solution.

This most-complex model has a higher log-likelihood than our original state-dependent model, but just barely. That makes sense, because in our generating process x1 depended only on y1 and not on any unobserved factor.

Let’s pit all four models against each other.

anova(cr_null1,sd_fit1,hrm_null1,sd_fit1.hrm)
##                log(L) d.f.      AIC   weight
## cr_null1    -238.1674    3 482.3348 0.000000
## sd_fit1     -206.8113    4 421.6226 0.950944
## hrm_null1   -219.8385    5 449.6770 0.000001
## sd_fit1.hrm -206.7758    7 427.5516 0.049056

This affirms our impression that sd_fit1, our original state-dependent model, is the best-supported model for these data.
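For readers curious where the numbers in that table come from, the AIC scores and Akaike weights can be reproduced by hand from the printed log-likelihoods and parameter counts:

```r
## reproduce the AIC scores and Akaike weights from the anova() table above
logL <- c(cr_null1 = -238.1674, sd_fit1 = -206.8113,
  hrm_null1 = -219.8385, sd_fit1.hrm = -206.7758)
k <- c(3, 4, 5, 7)                 # number of parameters in each model
aic <- -2 * logL + 2 * k           # AIC = -2 log(L) + 2 k
delta <- aic - min(aic)            # AIC difference from the best model
w <- exp(-0.5 * delta) / sum(exp(-0.5 * delta))  # Akaike weights
round(cbind(AIC = aic, weight = w), 4)
```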

How about x2 and y2?

Actually, the reason I simulated these data as well was so I could demonstrate what happens when our continuous trait has evolved in a discrete state-dependent fashion, just not in response to the trait we hypothesized.

To see that, let’s fit a series of models in which the rate of evolution of x2 depends not on y2 but on y1, then on a hidden character, then on both y1 and a hidden character, and compare the results. This time I’m going to start with our “null” models. (We won’t plot these models.)

## constant rate null model
cr_null2<-fitmultiBM(phy,x2,y1,levs=100,
  parallel=TRUE,null_model=TRUE)
cr_null2
## Object of class "fitmultiBM" based on
##     a discretization with k = 100 levels.
## 
## Fitted multi-rate BM model parameters:
##  levels: [ 1, 2 ]
##   sigsq: [ 11.2695, 11.2695 ]
##      x0: -0.1364 
## 
## Estimated Q matrix:
##           1         2
## 1 -1.265837  1.265837
## 2  1.265837 -1.265837
## 
## Log-likelihood: -254.5506 
## 
## R thinks it has found the ML solution.
## hidden-state dependent null
hrm_null2<-fitmultiBM(phy,x2,y1,levs=100,
  parallel=TRUE,null_model=TRUE,ncat=2)
hrm_null2
## Object of class "fitmultiBM" based on
##     a discretization with k = 100 levels.
## 
## Fitted multi-rate BM model parameters:
##  levels: [ 1, 1*, 2, 2* ]
##   sigsq: [ 16.6849, 0.9114, 16.6849, 0.9114 ]
##      x0: 0.7301 
## 
## Estimated Q matrix:
##             1         1*          2         2*
## 1  -1.9760868  0.7102485  1.2658383  0.0000000
## 1*  0.7102485 -1.9760868  0.0000000  1.2658383
## 2   1.2658383  0.0000000 -1.9760868  0.7102485
## 2*  0.0000000  1.2658383  0.7102485 -1.9760868
## 
## Log-likelihood: -227.4381 
## 
## R thinks it has found the ML solution.
## state-dependent model
sd_fit2<-fitmultiBM(phy,x2,y1,levs=100,
  parallel=TRUE)
sd_fit2
## Object of class "fitmultiBM" based on
##     a discretization with k = 100 levels.
## 
## Fitted multi-rate BM model parameters:
##  levels: [ 1, 2 ]
##   sigsq: [ 16.6849, 1.0642 ]
##      x0: 0.7301 
## 
## Estimated Q matrix:
##           1         2
## 1 -1.826309  1.826309
## 2  1.826309 -1.826309
## 
## Log-likelihood: -247.6183 
## 
## R thinks it has found the ML solution.
## both hidden-state and character-dependent
sd_fit2.hrm<-fitmultiBM(phy,x2,y1,levs=100,
  parallel=TRUE,ncat=2)
sd_fit2.hrm
## Object of class "fitmultiBM" based on
##     a discretization with k = 100 levels.
## 
## Fitted multi-rate BM model parameters:
##  levels: [ 1, 1*, 2, 2* ]
##   sigsq: [ 16.6849, 0.9176, 13.6104, 0.8812 ]
##      x0: 0.7301 
## 
## Estimated Q matrix:
##            1        1*         2        2*
## 1  -1.973112  0.709367  1.263745  0.000000
## 1*  0.709367 -1.973112  0.000000  1.263745
## 2   1.263745  0.000000 -1.973112  0.709367
## 2*  0.000000  1.263745  0.709367 -1.973112
## 
## Log-likelihood: -227.3413 
## 
## R thinks it has found the ML solution.

Let’s compare this set of models.

anova(cr_null2,sd_fit2,hrm_null2,sd_fit2.hrm)
##                log(L) d.f.      AIC  weight
## cr_null2    -254.5506    3 515.1012 0.00000
## sd_fit2     -247.6183    4 503.2366 0.00000
## hrm_null2   -227.4381    5 464.8762 0.87025
## sd_fit2.hrm -227.3413    7 468.6826 0.12975

This time around we should see that our hidden-rates “null” model, in which x2 evolves in a state-dependent, variable-rate fashion but not in response to the observed discrete trait (y1), is the best-supported model in the set. That makes sense.

Let’s take a closer look at this model.

hrm_null2
## Object of class "fitmultiBM" based on
##     a discretization with k = 100 levels.
## 
## Fitted multi-rate BM model parameters:
##  levels: [ 1, 1*, 2, 2* ]
##   sigsq: [ 16.6849, 0.9114, 16.6849, 0.9114 ]
##      x0: 0.7301 
## 
## Estimated Q matrix:
##             1         1*          2         2*
## 1  -1.9760868  0.7102485  1.2658383  0.0000000
## 1*  0.7102485 -1.9760868  0.0000000  1.2658383
## 2   1.2658383  0.0000000 -1.9760868  0.7102485
## 2*  0.0000000  1.2658383  0.7102485 -1.9760868
## 
## Log-likelihood: -227.4381 
## 
## R thinks it has found the ML solution.

Remember that our generating values of \(\sigma^2\) were \([1,20]\) (though they won’t necessarily appear in that order, of course), and the transition rate between character levels for the (unobserved) discrete trait was \(q = 1.0\).

How did we do?
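Pretty well, it turns out. A quick side-by-side of the printed estimates against the generating values (matching the slow regime to the slow rate and the fast regime to the fast rate, since the hidden-state labels are arbitrary):

```r
## hidden-state estimates (printed above) vs. the generating values
est <- c(sig2_slow = 0.9114, sig2_fast = 16.6849, q_hidden = 0.7102485)
gen <- c(sig2_slow = 1, sig2_fast = 20, q_hidden = 1)
round(cbind(generating = gen, estimated = est), 3)
```

The slow rate and the transition rate are recovered quite closely, while the fast rate is somewhat underestimated – not bad for a hidden character we never observed!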