## Tuesday, January 2, 2024

### Mk vs. Γ vs. hidden-rates model for regime shifts on the tree in the evolution of a discrete phenotypic trait

Recently, I’ve been blogging about a new $$\Gamma$$ rate heterogeneity model for discrete phenotypic characters in the phytools package (e.g., 1, 2, 3, 4, 5). According to this model, edge rates for a discrete phenotypic trait vary according a discretized $$\Gamma$$ distribution with shape parameter $$\alpha$$. $$\beta = \alpha$$ so the $$\Gamma$$ mean is 1.0, and low values of $$\alpha$$ correspond to high rate heterogeneity among branches (& vice versa).

Something I thought might be interesting to explore would be to see how the model behaves when (instead of varying according a $$\Gamma$$ distribution) there are one or a small number of discrete shifts in the rate of evolution on certain pre-defined branches of the tree.

Here, I’m going to simulate exactly that type of rate variation for an ordered, 3-state character with two rate shifts. I’ll do it using phytools functions pbtree (to simulate the tree), paintSubTree (to paint our rate regimes), and sim.multiMk (to simulate our rates).

## load packages
library(phytools)
## set seed & simulate tree
set.seed(99)
tree<-pbtree(n=200,scale=1)
## paint regimes
sim_tree<-paintSubTree(tree,node=316,state="1",
anc.state="0",stem=0.5)
sim_tree<-paintSubTree(sim_tree,node=228,
state="1",stem=0.5)
## simulate trait with two Q-matrices
Q<-matrix(c(
-1,1,0,
1,-2,1,
0,1,-1),3,3,
dimnames=list(letters[1:3],letters[1:3]))
simQ<-setNames(list(0.5*Q,20*Q),0:1)
x<-sim.multiMk(sim_tree,simQ)


Let’s visualize this on our tree. I’m going to use a hack that you might learn by reading Chapter 13 of my recent book with Luke Harmon to map our discrete character at the tips of the tree.

cols<-setNames(c("blue","red"),0:1)
plot(sim_tree,cols,ftype="off",lwd=1,
direction="upwards")
pp<-get("last_plot.phylo",envir=.PlotPhyloEnv)
x_cols<-setNames(viridisLite::viridis(n=3),
levels(x))[x[tree$tip.label]] segments(x0=pp$xx[1:Ntip(tree)],x1=pp$xx[1:Ntip(tree)], y0=pp$yy[1:Ntip(tree)], y1=pp$yy[1:Ntip(tree)]+0.1,lwd=3, col=x_cols) legend("bottomleft",letters[1:3],lwd=3, col=viridisLite::viridis(n=3),bty="n")  To start, let’s fit a standard (homogeneous rate) Mk model to these data. I’m going to use the structure of my generating Q matrices, as follows. MODEL<-Q diag(MODEL)<-0 MODEL  ## a b c ## a 0 1 0 ## b 1 0 1 ## c 0 1 0  fit_mk<-fitMk(tree,x,model=MODEL,pi="fitzjohn", rand_start=TRUE) fit_mk  ## Object of class "fitMk". ## ## Fitted (or set) value of Q: ## a b c ## a -1.704026 1.704026 0.000000 ## b 1.704026 -3.408051 1.704026 ## c 0.000000 1.704026 -1.704026 ## ## Fitted (or set) value of pi: ## a b c ## 0.230061 0.504219 0.265720 ## due to treating the root prior as (a) nuisance. ## ## Log-likelihood: -192.777337 ## ## Optimization method used was "nlminb" ## ## R thinks it has found the ML solution.  Cool. OK, now we can compare this to our new $$\Gamma$$ model. Remember, our generating process is not $$\Gamma$$-distributed rates across edges! We’re just curious to see if the real process (discrete regime shifts) is captured at all by fitgammaMk. fit_gamma<-fitgammaMk(tree,x,model=MODEL,nrates=8, pi="fitzjohn",marginal=TRUE,rand_start=TRUE)  ## -- ## Computing marginal scaled likelihoods (in parallel) of each ## rate on each edge. Caution: this is NOT fast.... ## --  fit_gamma  ## Object of class "fitgammaMk". ## ## Fitted (or set) value of Q: ## a b c ## a -11.4076 11.4076 0.0000 ## b 11.4076 -22.8152 11.4076 ## c 0.0000 11.4076 -11.4076 ## ## Fitted (or set) value of alpha rate heterogeneity ## (with 8 rate categories): 0.102942 ## ## Fitted (or set) value of pi: ## a b c ## 0.303173 0.463638 0.233189 ## due to treating the root prior as (a) nuisance. ## ## Log-likelihood: -181.833211 ## ## Optimization method used was "nlminb" ## ## R thinks it has found the ML solution.  Clearly, this model fit much better than a homogenous rate Mk model, but let’s see if the marginal edge rates reflect what we know to be the true rate heterogeneity of our data. par(mfrow=c(1,2)) plot(sim_tree,cols,ftype="off",lwd=1) plot(fit_gamma,ftype="off")  Awesome. That’s really not bad. Of course, if we have a hypothesis of discrete regime shifts, but we don’t know where these might have occurred, a more appropriate model to fit would be the hidden-rates model of Beaulieu et al. (2013). In phytools, this can be doing using fitHRM. Unfortunately, fitHRM doesn’t allow us to specify a custom (e.g., ordered) transition model. Fortunately, it’s pretty easy to pull the design matrix out of a fitted model object & then update it to match our hypothesized Q matrix structure. Let’s do it. ## fit equal-rates HRM model (just to get object) fit_hrm<-fitHRM(tree,x,model="ER",ncat=2,pi="fitzjohn", corHMM_model=TRUE,niter=1)  ## ## This is the design matrix of the fitted model. ## Does it make sense? ## ## a a* b b* c c* ## a 0 3 1 0 1 0 ## a* 3 0 0 2 0 2 ## b 1 0 0 3 1 0 ## b* 0 2 3 0 0 2 ## c 1 0 1 0 0 3 ## c* 0 2 0 2 3 0 ## ## log-likelihood from current iteration: -160.9726 ## --- Best log-likelihood so far: -160.9726 ---  ## pull out model structure hrm_MODEL<-fit_hrm$index.matrix
hrm_MODEL

##    a a* b b* c c*
## a  0  3 1  0 1  0
## a* 3  0 0  2 0  2
## b  1  0 0  3 1  0
## b* 0  2 3  0 0  2
## c  1  0 1  0 0  3
## c* 0  2 0  2 3  0

## update model structure to be ordered a<->b<->c
## for each rate level
hrm_MODEL[1,5]<-hrm_MODEL[2,6]<-hrm_MODEL[5,1]<-
hrm_MODEL[6,2]<-0
hrm_MODEL

##    a a* b b* c c*
## a  0  3 1  0 0  0
## a* 3  0 0  2 0  0
## b  1  0 0  3 1  0
## b* 0  2 3  0 0  2
## c  0  0 1  0 0  3
## c* 0  0 0  2 3  0

## fit hidden-rates model using fitMk
fit_hrm<-fitMk(tree,fit_hrm$data,model=hrm_MODEL,pi="fitzjohn", rand_start=TRUE) fit_hrm  ## Object of class "fitMk". ## ## Fitted (or set) value of Q: ## a a* b b* c c* ## a -24.13886 0.109710 24.02915 0.000000 0.00000 0.000000 ## a* 0.10971 -0.437854 0.00000 0.328144 0.00000 0.000000 ## b 24.02915 0.000000 -48.16801 0.109710 24.02915 0.000000 ## b* 0.00000 0.328144 0.10971 -0.765997 0.00000 0.328144 ## c 0.00000 0.000000 24.02915 0.000000 -24.13886 0.109710 ## c* 0.00000 0.000000 0.00000 0.328144 0.10971 -0.437854 ## ## Fitted (or set) value of pi: ## a a* b b* c c* ## 0.012770 0.161929 0.010274 0.735779 0.008636 0.070611 ## due to treating the root prior as (a) nuisance. ## ## Log-likelihood: -156.858431 ## ## Optimization method used was "nlminb" ## ## R thinks it has found the ML solution.  OK, obviously this model fits way better – just as one might expect it would. But how do we map our rate regimes back onto the edges of the tree? The easiest way to do this (IMO) is to perform stochastic mapping on the fitted model, merge by rate regime level, and then visual our merged maps using phytools::densityMap. Let’s try it. ## perform stochastic mapping hrm_smap<-simmap(fit_hrm) hrm_smap  ## 100 phylogenetic trees with mapped discrete characters  hrm_smap<-mergeMappedStates(hrm_smap,c("a*","b*","c*"), "cold") hrm_smap<-mergeMappedStates(hrm_smap,c("a","b","c"), "warm")  ## computing "densityMap" object dmap<-densityMap(hrm_smap,plot=FALSE)  ## sorry - this might take a while; please be patient  ## finally, let's create our plot par(mfrow=c(1,2)) plot(sim_tree,cols,ftype="off",lwd=1) Nt<-Ntip(tree) h<-max(nodeHeights(tree)) plot(dmap,ftype="off",lwd=2,legend=FALSE, xlim=c(-0.3,1)) h <- max(nodeHeights(tree)) LWD <- diff(par()$usr[1:2])/dev.size("px")[1]
Nt <- Ntip(tree)
lines(x = rep(-0.25 * h + LWD * 15/2, 2), y = c(1 + 1/40 *
Nt, Nt - 1/40 * Nt))
nticks <- 11
Y <- cbind(seq(1 + 1/40 * Nt, Nt - 1/40 * Nt, length.out = nticks),
seq(1 + 1/40 * Nt, Nt - 1/40 * Nt, length.out = nticks))
X <- cbind(rep(-0.25 * h + LWD * 15/2, nticks), rep(-0.25 *
h + LWD * 15/2 + 0.02 * h, nticks))
for (i in 1:nrow(Y)) lines(X[i, ], Y[i, ])
x=-0.25*h,y=1+1/40*Nt,
title="probability of being in \"warm\" state",
lims=NULL,digits=3,direction="upwards",
subtitle="",lwd=15,prompt=FALSE)
ticks<-seq(0,1.0,length.out=11)
text(x = X[, 2], y = Y[, 2], signif(ticks, 2), pos = 4,
cex = 0.7)


Wow. The distribution of “warm” rates match our simulation conditions with near exact precision. Cool!

That’s all for now.