Sunday, August 18, 2024

Fitting a multi-rate discrete state dependent continuous character evolution model for ordered discrete character evolution using fitmultiBM

A phytools user recently commented the following on my blog:

“I was trying to [fit a multi-rate discrete character dependent continuous trait evolution model] with a discrete character set including three ordered characters and a continuous character set with 120 levels. I keep getting NaN‘s for rates, with incredibly low likelihoods. I’m getting a warning message that NaNs are produced in log(comp[1:M + N]). I assume this means comp is picking up NaNs or negative numbers somewhere. I’ve spent a couple hours trying to march through the code, but the ultimate issue seems to be buried in one of the optimization functions. Do you have any suggestions about what I could try to do to troubleshoot?”

Unfortunately, I don’t know if this will solve the user’s problem (log(comp[1:M + N]) is usually non-fatal); however, I thought I’d make a very quick demo of fitting a multi-rate discrete character dependent continuous trait evolution model using the discrete approximation of Boucher & Démery (2016), as documented previously on this blog here, here, here, here, here, here, and here – but specifically for an ordered discrete trait.

Let’s begin by simulating a discrete & continuous trait evolving by our hypothesized process.

That’s pretty easy using phytools.

## load packages
library(phytools)

I’ll begin by simulating a stochastic, pure-birth tree.

## simulate tree
tree<-pbtree(n=100,scale=1)

Then, I set my transition matrix, Q, for the discrete character in our model.

## simulate ordered discrete trait history
Q<-matrix(c(-1,1,0,1,-2,1,0,1,-1),3,3,
  dimnames=list(letters[1:3],letters[1:3]))
Q
##    a  b  c
## a -1  1  0
## b  1 -2  1
## c  0  1 -1

Next, I simulate a discrete character history on this tree using phytools::sim.history.

tt<-sim.history(tree,Q)
## Done simulation(s).
cols<-setNames(hcl.colors(n=3,palette="plasma",
  rev=FALSE),letters[1:3])
plot(tt,cols,direction="upwards",ftype="off")

plot of chunk unnamed-chunk-17

Then, I can proceed to simulate a multi-rate Brownian process on this stochastic history in which the rate of evolution of our continuous trait, \(\sigma^2\), depends on the state of our discrete trait. Here, I’ll simulate \(\sigma^2 = [0.5, 5, 20]\) for our three rates.

## simulate multi-rate Brownian evolution
sig2<-setNames(c(0.5,5,20),letters[1:3])
sig2
##    a    b    c 
##  0.5  5.0 20.0
x<-sim.rates(tt,sig2,internal=TRUE)

(I left the internal node values in the simulation so that it would be easier to visualize our traits evolution – but, of course, we’ll remove them for our analysis!)

This is evolution of the continuous character (on the right) given our discrete character history (on the left).

par(mfrow=c(1,2))
plot(tt,cols,ftype="off",mar=c(5.1,1.1,1.1,1.1))
phenogram(tt,x,colors=cols,ftype="off",las=1,
  cex.axis=0.8,cex.lab=0.9)
legend("bottomleft",
  c(expression(paste(sigma^2," = 0.5")),
    expression(paste(sigma^2," = 5.0")),
    expression(paste(sigma^2," = 20"))),
  lwd=2,col=cols,bty="n",cex=0.8)

plot of chunk unnamed-chunk-20

Next, I’ll pull my tip discrete trait off the tips of the tree tt, and the tip values of the continuous character out of x.

## pull traits out for analysis
x<-x[tree$tip.label]
y<-as.factor(getStates(tt,"tips"))

Now we’re ready to set up our model. The first step, for a custom transition model of the discrete character, is to create a design matrix for the evolutionary process. This is much the same as what we’d do for phytools::fitMk or geiger::fitDiscrete. To start, I’m going to build one describing an ordered evolutionary process for the trait, but with a constant, equal transition rate between all levels of the character.

MODEL<-matrix(c(0,1,0,1,0,1,0,1,0),3,3)
dimnames(MODEL)<-dimnames(Q)
MODEL
##   a b c
## a 0 1 0
## b 1 0 1
## c 0 1 0

Let’s fit our multi-rate model using fitmultiBM.

fit_er<-fitmultiBM(tree,x,y,model=MODEL,levs=100,
  parallel=TRUE,ncores=10)
## This is the design of your custom discrete-trait model:
##   a b c
## a 0 1 0
## b 1 0 1
## c 0 1 0
fit_er
## Object of class "fitmultiBM" based on
##     a discretization with k = 100 levels.
## 
## Fitted multi-rate BM model parameters:
##  levels: [ a, b, c ]
##   sigsq: [ 0.461, 4.9598, 24.1272 ]
##      x0: 0.1808 
## 
## Estimated Q matrix:
##           a         b         c
## a -1.383048  1.383048  0.000000
## b  1.383048 -2.766096  1.383048
## c  0.000000  1.383048 -1.383048
## 
## Log-likelihood: -183.2667 
## 
## R thinks it has found the ML solution.

Inspection of the fitted model shows really how close we came to the generating parameter values, which is pretty cool. Remember, we don’t know the real history of our discrete trait on the tree.

This was actually the generating scenario for our data – the rates of transition between all levels of the discrete character were, in fact, equal. Let’s also try fitting two other evolutionary scenarios for our discrete trait: an ordered symmetric model, and an ordered, all-rates-different model.

This is how that looks.

## set up symmetric ordered model
MODEL<-matrix(c(0,1,0,1,0,2,0,2,0),3,3)
dimnames(MODEL)<-dimnames(Q)
MODEL
##   a b c
## a 0 1 0
## b 1 0 2
## c 0 2 0
## fit symmetric ordered model
fit_sym<-fitmultiBM(tree,x,y,model=MODEL,levs=100,
  parallel=TRUE,ncores=10)
## This is the design of your custom discrete-trait model:
##   a b c
## a 0 1 0
## b 1 0 2
## c 0 2 0
fit_sym
## Object of class "fitmultiBM" based on
##     a discretization with k = 100 levels.
## 
## Fitted multi-rate BM model parameters:
##  levels: [ a, b, c ]
##   sigsq: [ 0.4445, 4.9576, 24.0432 ]
##      x0: 0.1808 
## 
## Estimated Q matrix:
##          a         b         c
## a -1.60397  1.603970  0.000000
## b  1.60397 -2.520176  0.916206
## c  0.00000  0.916206 -0.916206
## 
## Log-likelihood: -182.7138 
## 
## R thinks it has found the ML solution.
## set up all-rates-different ordered model
MODEL<-matrix(c(0,1,0,2,0,3,0,4,0),3,3)
dimnames(MODEL)<-dimnames(Q)
MODEL
##   a b c
## a 0 2 0
## b 1 0 4
## c 0 3 0
## fit all-rates-different ordered model
fit_ard<-fitmultiBM(tree,x,y,model=MODEL,levs=100,
  parallel=TRUE,ncores=10)
## This is the design of your custom discrete-trait model:
##   a b c
## a 0 2 0
## b 1 0 4
## c 0 3 0
fit_ard
## Object of class "fitmultiBM" based on
##     a discretization with k = 100 levels.
## 
## Fitted multi-rate BM model parameters:
##  levels: [ a, b, c ]
##   sigsq: [ 0.4428, 5.1839, 24.3979 ]
##      x0: 0.1808 
## 
## Estimated Q matrix:
##           a          b          c
## a -1.567843  1.5678435  0.0000000
## b  1.834458 -2.8536055  1.0191477
## c  0.000000  0.3534697 -0.3534697
## 
## Log-likelihood: -182.2811 
## 
## R thinks it has found the ML solution.

Lastly, let’s compare the three using an anova generic method call, as follows.

anova(fit_er,fit_sym,fit_ard)
##            log(L) d.f.      AIC     weight
## fit_er  -183.2667    5 376.5334 0.56405896
## fit_sym -182.7138    6 377.4276 0.36069428
## fit_ard -182.2811    8 380.5621 0.07524676

That’s all for now, folks!

No comments:

Post a Comment

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.