Wednesday, July 2, 2025

More about a discrete trait dependent multi-trend trended Brownian motion model in phytools

Yesterday on this blog I posted about a new discrete character dependent multi-trend (and multi-rate) trended Brownian motion evolution model. (As a sidenote, I realized this evening that my simulator function of yesterday contained a small error – but that has been fixed for the current post.)

This is a model in which both the variance of our stochastic diffusion process, \(\sigma^2\), and it’s mean, \(\mu\), are allowed to vary on the tree as a function of a discrete character, itself evolving via a continuous time Markov chain.

Just to be clear what this means, \(\mu = 0\) corresponds to Brownian motion where there’s no tendency for the trait to evolve towards larger or smaller values over time, though the variance between diverging lineages tends to increase linearly. By contrast, \(\mu \neq 0\) means that our stochastic process will trend towards larger or smaller values of the trait, depending on the sign of \(\mu\).

To fit this model, I’m using the discretized diffusion approximation of Boucher & Démery (2016), as described in numerous prior posts to this blog.

The function is now in phytools on GitHub and can be obtained by updating phytools (e.g.) using the remotes package.

I thought it might be interesting to see if we could distinguish between two alternative models for trait evolution, when the data were generated with both discrete character dependent variation in \(\sigma^2\) and \(\mu\). In this case, I’ll set \(\sigma^2 = [0.5, 2.0]\) and \(\mu = [1.0, -0.5]\) for the two different levels of my discrete trait. To do the simulation, I’ll use sim.multiTrend which I posted along with the model-fitting function yesterday, and is now in the phytools development version available from GitHub.

## load packages
library(phytools)
packageVersion("phytools")
## [1] '2.4.21'
## simulate a tree
tree<-rtree(n=200)

Just for visualization purposes only, I’m going to add a bunch of unbranching nodes to my tree. This just facilitates plotting of our trended stochastic diffusion process so that we get a better sense of what that looks like!

simt<-map.to.singleton(make.era.map(tree,
  seq(0,max(nodeHeights(tree)),length.out=201)[1:200]))

Next, I’m going to simulate a discrete character history on the tree. This history will be used for simulation, but when we fit our model all that goes in are the observed states of our discrete and continuous trait, and not the full, generating trait history.

## Q matrix for simulating discrete trait
Q<-matrix(c(-0.5,0.5,0.5,-0.5),2,2,
  dimnames=list(letters[1:2],letters[1:2]))
## simulated discrete trait history
simt<-sim.history(simt,Q)
## Done simulation(s).

Now we’re ready to proceed and simulate our continuous character. Let’s do it!

par(mfrow=c(1,2),las=1)
cols<-setNames(palette()[c(2,4)],letters[1:2])
plot(simt,colors=cols,ftype="off",
  mar=c(5.1,4.1,1.1,1.1))
axis(1,at=seq(0,10,by=2))
title(xlab="time")
sig2<-setNames(c(0.5,2),letters[1:2])
mu<-setNames(c(1.0,-0.5),letters[1:2])
x<-sim.multiTrend(simt,sig2=sig2,mu=mu,plot=TRUE,
  col=palette()[c(2,4)])
legend("topleft",
  c(expression(paste(sigma^2," = 0.5;  ",
      mu," = 1.0")),
    expression(paste(sigma^2," = 2.0;  ",
      mu," = -0.5"))),
  col=palette()[c(2,4)],lwd=2,cex=0.8,
  bty="n")

plot of chunk unnamed-chunk-13

The only step that remains before we proceed and fit our model is pulling the tip states off our simulated discrete trait history as follows.

y<-as.factor(getStates(simt,"tips"))

Let’s start by fitting a multi \(\sigma^2\) (i.e., rate-varying) model using fitmultiBM.

fit_bm<-fitmultiBM(tree,x,y,parallel=TRUE,levs=100,
  plot_model=TRUE)

plot of chunk unnamed-chunk-15

fit_bm
## Object of class "fitmultiBM" based on
##     a discretization with k = 100 levels.
## 
## Fitted multi-rate BM model parameters:
##  levels: [ a, b ]
##   sigsq: [ 0.9326, 2.3395 ]
##      x0: 0.5712 
## 
## Estimated Q matrix:
##            a          b
## a -0.5479776  0.5479776
## b  0.5479776 -0.5479776
## 
## Log-likelihood: -474.242 
## 
## R thinks it has found the ML solution.

We can now compare that to a multi-rate trended model, in which both \(\sigma^2\) and \(\mu\) can vary as a function of our discrete trait. Note that this model is presently very slow to fit to data!

fit_trend<-fitmultiTrend(tree,x,y,parallel=TRUE,
  levs=100,plot_model=TRUE)

plot of chunk unnamed-chunk-17

fit_trend
## Object of class "fitmultiTrend" based on
##     a discretization with k = 100 levels.
## 
## Fitted multi-rate trend model parameters:
##  levels: [ a, b ]
##   sigsq: [ 0.5299, 2.0084 ]
##   mu: [ 1.2052, -0.5487 ]
##      x0: -0.438 
## 
## Estimated Q matrix:
##            a          b
## a -0.5253209  0.5253209
## b  0.5253209 -0.5253209
## 
## Log-likelihood: -453.9515 
## 
## R thinks it has found the ML solution.
anova(fit_bm,fit_trend)
##              log(L) d.f.     AIC       weight
## fit_bm    -474.2420    4 956.484 1.139038e-08
## fit_trend -453.9515    6 919.903 1.000000e+00

That’s it, folks! Very cool.

No comments:

Post a Comment

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.