Last week I wrote various posts (1, 2, 3, 4, 5) about fitting and analyzing a model of discrete character evolution with \(\Gamma\) distributed rate heterogeneity among edges.

I had to set it aside for other things, but after pushing a few updates & improvements to *GitHub* this morning, I thought I’d circle back & illustrate a full empirical example using the evolution of digit number in squamates: the same example that I considered in Chapter 6 of my book last year with Luke Harmon.

To start with, I’m going to load *phytools*.

Remember, to follow along at this point you’ll need to make sure that you update *phytools* from *GitHub* which can be done using (e.g.) the package *devtools* (for example, by running `devtools::install_github("liamrevell/phytools")`

in an interactive R session with *devtools* installed.)

```
library(phytools)
```

```
## Loading required package: ape
```

```
## Loading required package: maps
```

```
packageVersion("phytools")
```

```
## [1] '2.0.13'
```

Having loaded *phytools*, let’s get our data & tree.

These can be obtained from the book website and are modified from results that were originally published by Brandley et al. (2008).

```
squamate.tree<-read.nexus(
file="http://www.phytools.org/Rbook/6/squamate.tre")
squamate.data<-read.csv(
file="http://www.phytools.org/Rbook/6/squamate-data.csv",
row.names=1)
```

As we showed in the book, there are some mismatches between the tree & data – so we’ll use the `geiger::name.check`

function to reconcile the two.

```
library(geiger)
chk<-name.check(squamate.tree,squamate.data)
summary(chk)
```

```
## 139 taxa are present in the tree but not the data:
## Abronia_graminea,
## Acontias_litoralis,
## Acontophiops_lineatus,
## Acrochordus_granulatus,
## Agamodon_anguliceps,
## Agkistrodon_contortrix,
## ....
## 1 taxon is present in the data but not the tree:
## Trachyboa_boulengeri
##
## To see complete list of mis-matched taxa, print object.
```

```
squamate.tree<-drop.tip(squamate.tree,
chk$tree_not_data)
toe_number<-as.factor(setNames(squamate.data[[1]],
rownames(squamate.data)))
toe_number<-toe_number[squamate.tree$tip.label]
```

Next, I’ll load some other packages that I intend to use to parallelize multiple optimization iterations.

Fortunately, if you have a current version of *phytools*, you’ll already have all these other packages installed.

```
library(foreach)
library(parallel)
library(doParallel)
```

```
## Loading required package: iterators
```

Now, in the book we showed that out of a standard set of homogeneous rate M*k* models (`"ER"`

, `"SYM"`

, `"ARD"`

, a loss-only directional model, & a loss-and-gain ordered model) the directional and ordered models were *overwhelmingly* better-supported than the other three. As such, let’s take these two models as our starting point then go ahead & fit them.

Because of the difficulty of optimizing some (though not all) M*k* models in R, I’m going to use the *foreach* package to run multiple, parallelized optimization iterations from different random starting values for each model.

```
## number of optimization iterations
niter<-20
```

To use *foreach*, we must first create a parallel socket cluster.

These are multiple copies of R that are all open simultaneously, each running a separate one of optimization iterations.

(In fact, you can see this if you open *Task Manager* in Windows while your cluster is running.)

When setting up our cluster, we can use `parallel::detectCores`

to figure out how many cores our computer has. As a general rule, we shouldn’t use *all* of them (the code below will leave at least two free), but there’s also no point in opening more parallel R sessions than we have optimization iterations to run!

```
## set number of cores to use
ncores<-min(c(parallel::detectCores()-2,
niter))
## create parallel socket cluster
mc<-makeCluster(ncores,type="PSOCK")
registerDoParallel(cl=mc)
```

Now let’s fit our directional (digit loss only) M*k* model with `niter=20`

optimization iterations.

To do so, we should first create a design matrix for the model. This will tell our optimizer (in this case, `phytools::fitMk`

) which rates to estimate, and which to disallow or set to zero.

```
## design matrix
Directional<-matrix(c(
0,0,0,0,0,0,
1,0,0,0,0,0,
0,2,0,0,0,0,
0,0,3,0,0,0,
0,0,0,4,0,0,
0,0,0,0,5,0),6,6,byrow=TRUE,
dimnames=list(levels(toe_number),
levels(toe_number)))
Directional
```

```
## 0 1 2 3 4 5
## 0 0 0 0 0 0 0
## 1 1 0 0 0 0 0
## 2 0 2 0 0 0 0
## 3 0 0 3 0 0 0
## 4 0 0 0 4 0 0
## 5 0 0 0 0 5 0
```

We’re at last ready to start our parallelized optimization. Using `foreach::foreach`

, that looks as follows.

```
fits<-foreach(i=1:niter)%dopar%{
phytools::fitMk(squamate.tree,toe_number,
model=Directional,pi="fitzjohn",rand_start=TRUE)
}
```

I’ll now *extract* and print out the log-likelihoods of each optimization.

```
logL<-sapply(fits,logLik)
logL
```

```
## [1] -118.3870 -118.6303 -118.5285 -118.3870 -118.3870 -118.3870 -118.3870 -118.3870 -140.3878
## [10] -118.3870 -118.8699 -118.9325 -121.0795 -118.3870 -162.1625 -118.3870 -118.5332 -118.3870
## [19] -118.3870 -118.3870
```

Most likely, a number of these converged to the same value. *Hopefully* this is our ML solution for this model.

(As a sidenote, because there’s some computational cost in opening our cluster and communicating between the nodes, running 20 optimizations in parallel is slower than running just one in serial – though much faster than running 20 in serial, of course – so we may only want to do this for models that are difficult to optimize.)

Let’s find the best of this set.

```
directional_mk<-fits[[which(logL==max(logL))[1]]]
print(directional_mk,digits=4)
```

```
## Object of class "fitMk".
##
## Fitted (or set) value of Q:
## 0 1 2 3 4 5
## 0 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
## 1 0.0242 -0.0242 0.0000 0.0000 0.0000 0.0000
## 2 0.0000 0.1031 -0.1031 0.0000 0.0000 0.0000
## 3 0.0000 0.0000 0.1092 -0.1092 0.0000 0.0000
## 4 0.0000 0.0000 0.0000 0.0833 -0.0833 0.0000
## 5 0.0000 0.0000 0.0000 0.0000 0.0044 -0.0044
##
## Fitted (or set) value of pi:
## 0 1 2 3 4 5
## 0 0 0 0 0 1
## due to treating the root prior as (a) nuisance.
##
## Log-likelihood: -118.387
##
## Optimization method used was "nlminb"
##
## R thinks it has found the ML solution.
```

Now we can proceed and do exactly the same thing, but for our ordered, non-directional (gain-and-loss) model.

This time we don’t need to open a new cluster – we already have the one we created earlier! (We should, however, remember to close it when we no longer need it or else all those parallel R sessions will just stay open, running in the background & using memory, until we shut down R.)

```
## design matrix
Ordered<-matrix(c(
0,6,0,0,0,0,
1,0,7,0,0,0,
0,2,0,8,0,0,
0,0,3,0,9,0,
0,0,0,4,0,10,
0,0,0,0,5,0),6,6,byrow=TRUE,
dimnames=list(levels(toe_number),
levels(toe_number)))
Ordered
```

```
## 0 1 2 3 4 5
## 0 0 6 0 0 0 0
## 1 1 0 7 0 0 0
## 2 0 2 0 8 0 0
## 3 0 0 3 0 9 0
## 4 0 0 0 4 0 10
## 5 0 0 0 0 5 0
```

```
fits<-foreach(i=1:niter)%dopar%{
phytools::fitMk(squamate.tree,toe_number,
model=Ordered,pi="fitzjohn",rand_start=TRUE)
}
logL<-sapply(fits,logLik)
logL
```

```
## [1] -116.7767 -116.7753 -119.9617 -123.3256 -118.4797 -121.3946 -116.7760 -114.9687 -116.7753
## [10] -123.6688 -130.4098 -127.8928 -114.9672 -115.0745 -117.3701 -116.4219 -117.4540 -114.9676
## [19] -114.9672 -123.8975
```

```
ordered_mk<-fits[[which(logL==max(logL))[1]]]
print(ordered_mk,digits=4)
```

```
## Object of class "fitMk".
##
## Fitted (or set) value of Q:
## 0 1 2 3 4 5
## 0 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
## 1 0.0223 -0.0223 0.0000 0.0000 0.0000 0.0000
## 2 0.0000 0.0661 -750.4793 750.4132 0.0000 0.0000
## 3 0.0000 0.0000 939.2954 -939.2954 0.0000 0.0000
## 4 0.0000 0.0000 0.0000 0.0264 -0.0663 0.0399
## 5 0.0000 0.0000 0.0000 0.0000 0.0058 -0.0058
##
## Fitted (or set) value of pi:
## 0 1 2 3 4 5
## 0.0000 0.0000 0.0000 0.0000 0.2752 0.7248
## due to treating the root prior as (a) nuisance.
##
## Log-likelihood: -114.9672
##
## Optimization method used was "nlminb"
##
## R thinks it has found the ML solution.
```

Let’s plot & compare our two models so far.

```
par(mfrow=c(1,2))
plot(directional_mk,color=TRUE,width=TRUE,
offset=0.04,show.zeros=FALSE,xlim=c(-1.5,1),
mar=c(0.1,0.1,1.1,0.1))
mtext(paste("a) directional model; log(L) = ",
round(logLik(directional_mk),2)),line=0,adj=0)
plot(ordered_mk,color=TRUE,width=TRUE,
offset=0.04,show.zeros=FALSE,xlim=c(-1.5,1),
mar=c(0.1,0.1,1.1,0.1))
mtext(paste("b) ordered model; log(L) = ",
round(logLik(ordered_mk),2)),line=0,adj=0)
```

```
anova(directional_mk,ordered_mk)
```

```
## log(L) d.f. AIC weight
## directional_mk -118.3870 5 246.7741 0.8292236
## ordered_mk -114.9672 10 249.9344 0.1707764
```

This tells us that, of these two models, the best-supported is our directional model: in which digits can be lost by squamates, but not regained.

We shouldn’t be too preoccupied with the very high transition rates between states `3`

and `2`

in our ordered model. This likely indicates that these two states are effectively random with respect to each other (though not overall!) on the phylogeny, which is consistent with a high rate of back-and-forth change between the states. How high is probably hard to say!

Having done that, we should proceed and allow the rate of character evolution to vary from branch to branch under our new model of \(\Gamma\) distributed rate heterogeneity among edges of the tree.

To do that, I’m going to repeat my parallelization code of earlier using `foreach`

, but this time substitute the new *phytools* function `fitgammaMk`

. Having explored these data a bit already, I’ll update the minimum value of the \(\alpha\) shape parameter of our \(\Gamma\) distribution to `min.alpha=0.01`

. Lastly, I’m going to set `nrates`

(the number of rate categories in our discretized \(\Gamma\)) to `nrates=6`

. (This is arbitrary, but it is something that we could choose based on model fit.)

```
fits<-foreach(i=1:niter)%dopar%{
phytools::fitgammaMk(squamate.tree,toe_number,
model=Directional,pi="fitzjohn",rand_start=TRUE,
nrates=6,min.alpha=0.01)
}
logL<-sapply(fits,logLik)
logL
```

```
## [1] -113.1218 -114.3462 -114.3462 -113.1218 -113.1218 -113.1218 -113.1218 -113.1218 -113.1218
## [10] -114.3462 -113.1218 -113.1218 -113.1218 -113.1218 -113.1218 -113.1218 -113.1218 -113.1218
## [19] -113.1218 -113.1218
```

```
directional_mk.gamma<-fits[[which(logL==max(logL))[1]]]
print(directional_mk.gamma,digits=4)
```

```
## Object of class "fitgammaMk".
##
## Fitted (or set) value of Q:
## 0 1 2 3 4 5
## 0 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
## 1 0.0119 -0.0119 0.0000 0.0000 0.0000 0.0000
## 2 0.0000 0.0278 -0.0278 0.0000 0.0000 0.0000
## 3 0.0000 0.0000 0.0313 -0.0313 0.0000 0.0000
## 4 0.0000 0.0000 0.0000 0.0276 -0.0276 0.0000
## 5 0.0000 0.0000 0.0000 0.0000 0.0148 -0.0148
##
## Fitted (or set) value of alpha rate heterogeneity
## (with 6 rate categories): 0.0681
##
## Fitted (or set) value of pi:
## 0 1 2 3 4 5
## 0 0 0 0 0 1
## due to treating the root prior as (a) nuisance.
##
## Log-likelihood: -113.1218
##
## Optimization method used was "nlminb"
##
## R thinks it has found the ML solution.
```

This gives us our optimized *directional* (digit loss only) model, but in which the rate of evolution is permitted to vary from edge to edge according to a model of \(\Gamma\) distributed rate heterogeneity. Before we evaluate whether or not we think that this has substantially improved the fit of our model to these data, let’s do the same thing for our *ordered* model in which digits are both lost *and* gained, though in a sequential fashion.

```
fits<-foreach(i=1:niter)%dopar%{
phytools::fitgammaMk(squamate.tree,toe_number,
model=Ordered,pi="fitzjohn",rand_start=TRUE,
nrates=6,min.alpha=0.01)
}
logL<-sapply(fits,logLik)
logL
```

```
## [1] -120.9345 -113.1219 -111.2750 -111.2750 -113.1218 -114.3464 -113.1219 -113.1219 -113.1218
## [10] -111.2750 -113.0791 -111.2750 -111.2750 -113.0791 -113.1219 -113.0792 -113.0792 -113.1218
## [19] -113.1220 -113.0791
```

```
ordered_mk.gamma<-fits[[which(logL==max(logL))[1]]]
print(ordered_mk.gamma,digits=4)
```

```
## Object of class "fitgammaMk".
##
## Fitted (or set) value of Q:
## 0 1 2 3 4 5
## 0 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
## 1 0.0171 -0.0171 0.0000 0.0000 0.0000 0.0000
## 2 0.0000 0.0604 -0.0604 0.0000 0.0000 0.0000
## 3 0.0000 0.0000 0.0887 -0.0887 0.0000 0.0000
## 4 0.0000 0.0000 0.0000 0.0573 -0.2074 0.1500
## 5 0.0000 0.0000 0.0000 0.0000 0.0159 -0.0159
##
## Fitted (or set) value of alpha rate heterogeneity
## (with 6 rate categories): 0.2892
##
## Fitted (or set) value of pi:
## 0 1 2 3 4 5
## 0.0000 0.0000 0.0000 0.0000 0.2193 0.7807
## due to treating the root prior as (a) nuisance.
##
## Log-likelihood: -111.275
##
## Optimization method used was "nlminb"
##
## R thinks optimization may not have converged.
```

As before, we can plot these…. When we do, we should first convert each of our fitted models to a *phytools* object of class `"Qmatrix"`

(using `phytools::as.Qmatrix`

) – otherwise the `plot`

call will try to plot our marginal edge rates, which we have not (yet) computed!

```
par(mfrow=c(1,2))
plot(as.Qmatrix(directional_mk.gamma),color=TRUE,
width=TRUE,offset=0.04,show.zeros=FALSE,xlim=c(-1.5,1),
mar=c(0.1,0.1,1.1,0.1))
tmp<-bquote("a) directional + "~Gamma~
" model; log(L)"==.(round(logLik(directional_mk.gamma),2)))
mtext(tmp,line=0,adj=0)
plot(as.Qmatrix(ordered_mk.gamma),color=TRUE,width=TRUE,
offset=0.04,show.zeros=FALSE,xlim=c(-1.5,1),
mar=c(0.1,0.1,1.1,0.1))
tmp<-bquote("b) ordered + "~Gamma~
" model; log(L)"==.(round(logLik(ordered_mk.gamma),2)))
mtext(tmp,line=0,adj=0)
```

(As an aside, and note to my future self, the most complicated – and potentially useful – part of the code above is the use of `bquote`

to combine Greek letters and variable values into the same legend. Thanks to this blog post for the refresher!)

An interesting *observation* about the fitted **Q** matrices, above, is how *similar* they are, one to the other, now that we’re allowing the rate of evolution to vary from branch to branch.

OK. Let’s also kill our cluster now.

```
stopCluster(cl=mc)
```

We can go ahead and compare all *four* of our fitted models so far….

```
anova(directional_mk,
directional_mk.gamma,
ordered_mk,
ordered_mk.gamma)
```

```
## log(L) d.f. AIC weight
## directional_mk -118.3870 5 246.7741 0.013257936
## directional_mk.gamma -113.1218 6 238.2437 0.943698645
## ordered_mk -114.9672 10 249.9344 0.002730438
## ordered_mk.gamma -111.2750 11 244.5499 0.040312982
```

When we do, we see that the best-supported, by quite a wide margin, is the loss-only model with heterogeneous rates among edges. Indeed, this model has a better log-likelihood than the much more parameter-rich ordered M*k* model.

Finally, let’s compute the marginal scaled likelihoods of each rate on each edge. To do that I don’t need to re-optimize my model – I can simply feed `fitgammaMk`

our fitted **Q** matrix and \(\alpha\), and then set `marginal=TRUE`

as follows.

```
marginal.ordered_mk.gamma<-fitgammaMk(
squamate.tree,toe_number,
fixedQ=as.Qmatrix(directional_mk.gamma),
alpha.init=directional_mk.gamma$alpha,
marginal=TRUE)
```

```
## --
## Computing marginal scaled likelihoods (in parallel) of each
## rate on each edge. Caution: this is NOT fast....
## --
```

```
plot(marginal.ordered_mk.gamma,
colors=c("gray19","red","yellow","ivory2"),lwd=3)
```

That’s it. Cool.

## No comments:

## Post a Comment

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.