Recently, I’ve been blogging about a new \(\Gamma\) rate heterogeneity model for discrete phenotypic characters in the *phytools* package (e.g., 1, 2, 3, 4, 5). According to this model, edge rates for a discrete phenotypic trait vary according a discretized \(\Gamma\) distribution with shape parameter \(\alpha\). \(\beta = \alpha\) so the \(\Gamma\) mean is 1.0, and low values of \(\alpha\) correspond to high rate heterogeneity among branches (& *vice versa*).

Something I thought might be interesting to explore would be to see how the model behaves when (instead of varying according a \(\Gamma\) distribution) there are one or a small number of discrete shifts in the rate of evolution on certain pre-defined branches of the tree.

Here, I’m going to simulate exactly that type of rate variation for an ordered, 3-state character with two rate shifts. I’ll do it using *phytools* functions `pbtree`

(to simulate the tree), `paintSubTree`

(to paint our rate regimes), and `sim.multiMk`

(to simulate our rates).

```
## load packages
library(phytools)
## set seed & simulate tree
set.seed(99)
tree<-pbtree(n=200,scale=1)
## paint regimes
sim_tree<-paintSubTree(tree,node=316,state="1",
anc.state="0",stem=0.5)
sim_tree<-paintSubTree(sim_tree,node=228,
state="1",stem=0.5)
## simulate trait with two Q-matrices
Q<-matrix(c(
-1,1,0,
1,-2,1,
0,1,-1),3,3,
dimnames=list(letters[1:3],letters[1:3]))
simQ<-setNames(list(0.5*Q,20*Q),0:1)
x<-sim.multiMk(sim_tree,simQ)
```

Let’s visualize this on our tree. I’m going to use a hack that you might learn by reading Chapter 13 of my recent book with Luke Harmon to map our discrete character at the tips of the tree.

```
cols<-setNames(c("blue","red"),0:1)
plot(sim_tree,cols,ftype="off",lwd=1,
direction="upwards")
pp<-get("last_plot.phylo",envir=.PlotPhyloEnv)
x_cols<-setNames(viridisLite::viridis(n=3),
levels(x))[x[tree$tip.label]]
segments(x0=pp$xx[1:Ntip(tree)],x1=pp$xx[1:Ntip(tree)],
y0=pp$yy[1:Ntip(tree)], y1=pp$yy[1:Ntip(tree)]+0.1,lwd=3,
col=x_cols)
legend("bottomleft",letters[1:3],lwd=3,
col=viridisLite::viridis(n=3),bty="n")
```

To start, let’s fit a standard (homogeneous rate) M*k* model to these data. I’m going to use the structure of my generating **Q** matrices, as follows.

```
MODEL<-Q
diag(MODEL)<-0
MODEL
```

```
## a b c
## a 0 1 0
## b 1 0 1
## c 0 1 0
```

```
fit_mk<-fitMk(tree,x,model=MODEL,pi="fitzjohn",
rand_start=TRUE)
fit_mk
```

```
## Object of class "fitMk".
##
## Fitted (or set) value of Q:
## a b c
## a -1.704026 1.704026 0.000000
## b 1.704026 -3.408051 1.704026
## c 0.000000 1.704026 -1.704026
##
## Fitted (or set) value of pi:
## a b c
## 0.230061 0.504219 0.265720
## due to treating the root prior as (a) nuisance.
##
## Log-likelihood: -192.777337
##
## Optimization method used was "nlminb"
##
## R thinks it has found the ML solution.
```

Cool. OK, now we can compare this to our new \(\Gamma\) model. Remember, our generating process **is not** \(\Gamma\)-distributed rates across edges! We’re just curious to see if the *real* process (discrete regime shifts) is captured *at all* by `fitgammaMk`

.

```
fit_gamma<-fitgammaMk(tree,x,model=MODEL,nrates=8,
pi="fitzjohn",marginal=TRUE,rand_start=TRUE)
```

```
## --
## Computing marginal scaled likelihoods (in parallel) of each
## rate on each edge. Caution: this is NOT fast....
## --
```

```
fit_gamma
```

```
## Object of class "fitgammaMk".
##
## Fitted (or set) value of Q:
## a b c
## a -11.4076 11.4076 0.0000
## b 11.4076 -22.8152 11.4076
## c 0.0000 11.4076 -11.4076
##
## Fitted (or set) value of alpha rate heterogeneity
## (with 8 rate categories): 0.102942
##
## Fitted (or set) value of pi:
## a b c
## 0.303173 0.463638 0.233189
## due to treating the root prior as (a) nuisance.
##
## Log-likelihood: -181.833211
##
## Optimization method used was "nlminb"
##
## R thinks it has found the ML solution.
```

Clearly, this model *fit* much better than a homogenous rate M*k* model, but let’s see if the marginal edge rates reflect what we know to be the true rate heterogeneity of our data.

```
par(mfrow=c(1,2))
plot(sim_tree,cols,ftype="off",lwd=1)
plot(fit_gamma,ftype="off")
```

Awesome. That’s really not bad.

Of course, if we have a hypothesis of discrete regime shifts, but we don’t know where these might have occurred, a more appropriate model to fit would be the hidden-rates model of Beaulieu et al. (2013). In *phytools*, this can be doing using `fitHRM`

. Unfortunately, `fitHRM`

doesn’t allow us to specify a custom (e.g., ordered) transition model. *Fortunately*, it’s pretty easy to pull the design matrix out of a fitted model object & then update it to match our hypothesized **Q** matrix structure. Let’s do it.

```
## fit equal-rates HRM model (just to get object)
fit_hrm<-fitHRM(tree,x,model="ER",ncat=2,pi="fitzjohn",
corHMM_model=TRUE,niter=1)
```

```
##
## This is the design matrix of the fitted model.
## Does it make sense?
##
## a a* b b* c c*
## a 0 3 1 0 1 0
## a* 3 0 0 2 0 2
## b 1 0 0 3 1 0
## b* 0 2 3 0 0 2
## c 1 0 1 0 0 3
## c* 0 2 0 2 3 0
##
## log-likelihood from current iteration: -160.9726
## --- Best log-likelihood so far: -160.9726 ---
```

```
## pull out model structure
hrm_MODEL<-fit_hrm$index.matrix
hrm_MODEL
```

```
## a a* b b* c c*
## a 0 3 1 0 1 0
## a* 3 0 0 2 0 2
## b 1 0 0 3 1 0
## b* 0 2 3 0 0 2
## c 1 0 1 0 0 3
## c* 0 2 0 2 3 0
```

```
## update model structure to be ordered a<->b<->c
## for each rate level
hrm_MODEL[1,5]<-hrm_MODEL[2,6]<-hrm_MODEL[5,1]<-
hrm_MODEL[6,2]<-0
hrm_MODEL
```

```
## a a* b b* c c*
## a 0 3 1 0 0 0
## a* 3 0 0 2 0 0
## b 1 0 0 3 1 0
## b* 0 2 3 0 0 2
## c 0 0 1 0 0 3
## c* 0 0 0 2 3 0
```

```
## fit hidden-rates model using fitMk
fit_hrm<-fitMk(tree,fit_hrm$data,model=hrm_MODEL,pi="fitzjohn",
rand_start=TRUE)
fit_hrm
```

```
## Object of class "fitMk".
##
## Fitted (or set) value of Q:
## a a* b b* c c*
## a -24.13886 0.109710 24.02915 0.000000 0.00000 0.000000
## a* 0.10971 -0.437854 0.00000 0.328144 0.00000 0.000000
## b 24.02915 0.000000 -48.16801 0.109710 24.02915 0.000000
## b* 0.00000 0.328144 0.10971 -0.765997 0.00000 0.328144
## c 0.00000 0.000000 24.02915 0.000000 -24.13886 0.109710
## c* 0.00000 0.000000 0.00000 0.328144 0.10971 -0.437854
##
## Fitted (or set) value of pi:
## a a* b b* c c*
## 0.012770 0.161929 0.010274 0.735779 0.008636 0.070611
## due to treating the root prior as (a) nuisance.
##
## Log-likelihood: -156.858431
##
## Optimization method used was "nlminb"
##
## R thinks it has found the ML solution.
```

OK, obviously this model fits **way** better – just as one might expect it would. But how do we map our rate regimes back onto the edges of the tree? The *easiest* way to do this (IMO) is to perform stochastic mapping on the fitted model, merge by rate regime level, and then visual our merged maps using `phytools::densityMap`

.

Let’s try it.

```
## perform stochastic mapping
hrm_smap<-simmap(fit_hrm)
hrm_smap
```

```
## 100 phylogenetic trees with mapped discrete characters
```

```
hrm_smap<-mergeMappedStates(hrm_smap,c("a*","b*","c*"),
"cold")
hrm_smap<-mergeMappedStates(hrm_smap,c("a","b","c"),
"warm")
```

```
## computing "densityMap" object
dmap<-densityMap(hrm_smap,plot=FALSE)
```

```
## sorry - this might take a while; please be patient
```

```
## finally, let's create our plot
par(mfrow=c(1,2))
plot(sim_tree,cols,ftype="off",lwd=1)
Nt<-Ntip(tree)
h<-max(nodeHeights(tree))
plot(dmap,ftype="off",lwd=2,legend=FALSE,
xlim=c(-0.3,1))
h <- max(nodeHeights(tree))
LWD <- diff(par()$usr[1:2])/dev.size("px")[1]
Nt <- Ntip(tree)
lines(x = rep(-0.25 * h + LWD * 15/2, 2), y = c(1 + 1/40 *
Nt, Nt - 1/40 * Nt))
nticks <- 11
Y <- cbind(seq(1 + 1/40 * Nt, Nt - 1/40 * Nt, length.out = nticks),
seq(1 + 1/40 * Nt, Nt - 1/40 * Nt, length.out = nticks))
X <- cbind(rep(-0.25 * h + LWD * 15/2, nticks), rep(-0.25 *
h + LWD * 15/2 + 0.02 * h, nticks))
for (i in 1:nrow(Y)) lines(X[i, ], Y[i, ])
add.color.bar(Nt-2/40*Nt-1,dmap$cols,
x=-0.25*h,y=1+1/40*Nt,
title="probability of being in \"warm\" state",
lims=NULL,digits=3,direction="upwards",
subtitle="",lwd=15,prompt=FALSE)
ticks<-seq(0,1.0,length.out=11)
text(x = X[, 2], y = Y[, 2], signif(ticks, 2), pos = 4,
cex = 0.7)
```

Wow. The distribution of “warm” rates match our simulation conditions with near exact precision. Cool!

That’s all for now.

## No comments:

## Post a Comment

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.