I just updated the way `fitHRM` (for fitting the hidden-rates model of Beaulieu et al. 2013) selects starting values for optimization. This was necessary because the previous starting values could sometimes be quite poor.

I wanted to quickly illustrate how `fitHRM` can be used to fit a simple extended M*k* model (that is, a Markov model *without* hidden states) using multiple optimization iterations & parallelization.

To demonstrate this, I’ll use a hard optimization problem: the ARD (all-rates-different) model for digit number evolution in squamate reptiles.

The ARD model is not a particularly good fit to these data – we just happen to know (from prior experience) that the optimization problem is difficult, so it should be a good test of our method!

The data for this example are modified from Brandley et al. (2008) – we also use them in our book. To follow along, users will have to update *phytools* from GitHub.

```
library(phytools)
packageVersion("phytools")
```

```
## [1] '1.7.2'
```

We can start off by loading our trait data & tree. I’ll do this by reading in our files directly from the book website as follows.

```
library(geiger)
squamate.tree<-read.nexus(file="http://www.phytools.org/Rbook/6/squamate.tre")
squamate.data<-read.csv(file="http://www.phytools.org/Rbook/6/squamate-data.csv",
row.names=1)
toe_number<-as.factor(setNames(squamate.data[[1]],rownames(squamate.data)))
chk<-name.check(squamate.tree,squamate.data)
summary(chk)
```

```
## 139 taxa are present in the tree but not the data:
## Abronia_graminea,
## Acontias_litoralis,
## Acontophiops_lineatus,
## Acrochordus_granulatus,
## Agamodon_anguliceps,
## Agkistrodon_contortrix,
## ....
## 1 taxon is present in the data but not the tree:
## Trachyboa_boulengeri
##
## To see complete list of mis-matched taxa, print object.
```

```
squamate.tree<-drop.tip(squamate.tree,chk$tree_not_data)
toe_number<-toe_number[squamate.tree$tip.label]
```

Next, I’m going to set the number of computer cores that I want to use for optimization. For this task I’ll choose the *minimum* of 15 and the number of computer cores I have in my machine (in my case, that’s 16). `parallel::detectCores` can be used to identify the number of available cores.

```
ncores<-min(c(15,parallel::detectCores()))
ncores
```

```
## [1] 15
```

Now I’m going to proceed and do two M*k* model optimizations using parallelization. First, I’ll parallelize my optimization iterations. This is done internally in `fitHRM` using `foreach` (as described here). In this case, it does not make any sense to assign more cores to the process than optimization iterations, because each optimization will be run in serial on a separate instance of R. (If `ncores > niter`, then R just creates instances of R that consume memory, but don’t do anything!)
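As a small defensive sketch (base R only; `niter` here is just the value we plan to pass to `fitHRM` below), the core count can be clamped to the iteration count so no idle worker processes get spawned:

```
## never request more workers than optimization iterations:
## extra workers would sit idle while still consuming memory
niter<-15
ncores<-min(parallel::detectCores(),niter)
```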

```
ard_fit1<-fitHRM(squamate.tree,toe_number,model="ARD",pi="fitzjohn",
ncat=1,parallel=TRUE,lik.func="pruning",niter=15,ncores=ncores)
```

```
##
## This is the design matrix of the fitted model.
## Does it make sense?
##
## 0 1 2 3 4 5
## 0 0 1 2 3 4 5
## 1 6 0 7 8 9 10
## 2 11 12 0 13 14 15
## 3 16 17 18 0 19 20
## 4 21 22 23 24 0 25
## 5 26 27 28 29 30 0
##
## Opened cluster with 15 cores.
## Running optimization iterations in parallel.
## Please wait....
```

```
ard_fit1
```

```
## Object of class "fitHRM".
##
## Observed states: [ 0, 1, 2, 3, 4, 5 ]
## Number of rate categories per state: [ 1, 1, 1, 1, 1, 1 ]
##
## Fitted (or set) value of Q:
## 0 1 2 3 4 5
## 0 0.000000 0.000000 0.000000 0.000000 0.000000 0.00000
## 1 0.013157 -0.013157 0.000000 0.000000 0.000000 0.00000
## 2 0.012577 0.000000 -0.023517 0.010939 0.000000 0.00000
## 3 0.017438 0.000000 0.023222 -0.040660 0.000000 0.00000
## 4 0.000000 0.003780 0.000000 0.012048 -0.049627 0.03380
## 5 0.000000 0.001648 0.000237 0.000000 0.002795 -0.00468
##
## Fitted (or set) value of pi:
## 0 1 2 3 4 5
## 0.000000 0.000000 0.000000 0.000000 0.328325 0.671675
## due to treating the root prior as (a) nuisance.
##
## Log-likelihood: -109.417003
##
## Optimization method used was "nlminb"
```

The key argument that tells `fitHRM` we’re running a standard M*k* model rather than a hidden-rates model is `ncat=1` (the number of rate categories per observed state is 1).

Aside from assigning optimization iterations to different cores, there is another way that optimization can be parallelized in R: via the *optimParallel* package. Let’s try it. In this case, we *should not* set `parallel=TRUE`, but (rather) set `opt.method="optimParallel"`. We’ll keep our `ncores` the same – but remember, in this case we’re going to run our 15 optimization iterations in series, with each one using parallelized optimization.

```
ard_fit2<-fitHRM(squamate.tree,toe_number,model="ARD",pi="fitzjohn",
ncat=1,opt.method="optimParallel",lik.func="pruning",niter=15,
ncores=ncores)
```

```
##
## This is the design matrix of the fitted model.
## Does it make sense?
##
## 0 1 2 3 4 5
## 0 0 1 2 3 4 5
## 1 6 0 7 8 9 10
## 2 11 12 0 13 14 15
## 3 16 17 18 0 19 20
## 4 21 22 23 24 0 25
## 5 26 27 28 29 30 0
```

```
## Warning in log(pp): NaNs produced
```

```
## log-likelihood from current iteration: -109.458
## --- Best log-likelihood so far: -109.458 ---
```

```
## Warning in log(pp): NaNs produced
```

```
## log-likelihood from current iteration: -109.533
## --- Best log-likelihood so far: -109.458 ---
## log-likelihood from current iteration: -111.716
## --- Best log-likelihood so far: -109.458 ---
```

```
## Warning in log(pp): NaNs produced
```

```
## log-likelihood from current iteration: -109.4958
## --- Best log-likelihood so far: -109.458 ---
```

```
## Warning in log(pp): NaNs produced
```

```
## log-likelihood from current iteration: -109.8417
## --- Best log-likelihood so far: -109.458 ---
## log-likelihood from current iteration: -109.4213
## --- Best log-likelihood so far: -109.4213 ---
```

```
## Warning in log(pp): NaNs produced
```

```
## log-likelihood from current iteration: -109.4172
## --- Best log-likelihood so far: -109.4172 ---
```

```
## Warning in log(pp): NaNs produced
```

```
## log-likelihood from current iteration: -109.4281
## --- Best log-likelihood so far: -109.4172 ---
## log-likelihood from current iteration: -109.6712
## --- Best log-likelihood so far: -109.4172 ---
## log-likelihood from current iteration: -110.1308
## --- Best log-likelihood so far: -109.4172 ---
```

```
## Warning in log(pp): NaNs produced
```

```
## log-likelihood from current iteration: -110.7199
## --- Best log-likelihood so far: -109.4172 ---
## log-likelihood from current iteration: -109.5806
## --- Best log-likelihood so far: -109.4172 ---
## log-likelihood from current iteration: -109.5754
## --- Best log-likelihood so far: -109.4172 ---
## log-likelihood from current iteration: -109.7035
## --- Best log-likelihood so far: -109.4172 ---
```

```
## Warning in log(pp): NaNs produced
```

```
## log-likelihood from current iteration: -109.7828
## --- Best log-likelihood so far: -109.4172 ---
```

```
ard_fit2
```

```
## Object of class "fitHRM".
##
## Observed states: [ 0, 1, 2, 3, 4, 5 ]
## Number of rate categories per state: [ 1, 1, 1, 1, 1, 1 ]
##
## Fitted (or set) value of Q:
## 0 1 2 3 4 5
## 0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
## 1 0.013160 -0.013161 0.000001 0.000000 0.000000 0.000000
## 2 0.012542 0.000001 -0.023679 0.011136 0.000000 0.000000
## 3 0.017505 0.000000 0.023397 -0.040902 0.000000 0.000000
## 4 0.000000 0.003796 0.000000 0.012059 -0.049648 0.033793
## 5 0.000000 0.001646 0.000235 0.000000 0.002799 -0.004681
##
## Fitted (or set) value of pi:
## 0 1 2 3 4 5
## 0.000000 0.000000 0.000000 0.000000 0.328045 0.671955
## due to treating the root prior as (a) nuisance.
##
## Log-likelihood: -109.417207
##
## Optimization method used was "optimParallel"
```

Compare these two results to what we reported in the book.

For multiple optimization iterations, `fitHRM` also passes the results of all 15 (in our case) optimization iterations back to the user in the form of the object element `all.fits`. Let’s pull out the log-likelihoods to see how they vary from iteration to iteration depending on our method.

```
ard_logL1<-sapply(ard_fit1$all.fits,logLik)
ard_logL1
```

```
## [1] -109.6376 -109.4170 -117.4159 -109.4170 -109.9385 -109.5287 -109.5431 -109.5287 -109.8213 -109.4969
## [11] -109.7821 -133.1351 -109.4316 -109.7633 -111.7605
```

```
ard_logL2<-sapply(ard_fit2$all.fits,logLik)
ard_logL2
```

```
## [1] -109.4580 -109.5330 -111.7160 -109.4958 -109.8417 -109.4213 -109.4172 -109.4281 -109.6712 -110.1308
## [11] -110.7199 -109.5806 -109.5754 -109.7035 -109.7828
```
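Since both vectors are printed above, we can verify directly that the two approaches converged on essentially the same optimum. Here’s a self-contained check using the (rounded) values from the output above:

```
## log-likelihoods from each of the 15 iterations, as printed above
ard_logL1<-c(-109.6376,-109.4170,-117.4159,-109.4170,-109.9385,
  -109.5287,-109.5431,-109.5287,-109.8213,-109.4969,-109.7821,
  -133.1351,-109.4316,-109.7633,-111.7605)
ard_logL2<-c(-109.4580,-109.5330,-111.7160,-109.4958,-109.8417,
  -109.4213,-109.4172,-109.4281,-109.6712,-110.1308,-110.7199,
  -109.5806,-109.5754,-109.7035,-109.7828)
## best solution found by each approach
max(ard_logL1)
max(ard_logL2)
## the spread across iterations shows why multiple starts matter
range(ard_logL1)
```

The two best log-likelihoods differ by only about 0.0002 units, while individual iterations of either method can land more than 20 log-likelihood units below the optimum.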

Lastly, let’s graph our fitted model. Here users might note a small update to the behavior of the plot method: if we expand `xlim`, the vertical legend pulls farther away from the plotted model. I added this feature because it was not uncommon for this legend to crowd the plot, depending on the scale of the plotting device.

```
par(mfrow=c(1,2))
plot(ard_fit1,width=TRUE,text=FALSE,color=TRUE,
show.zeros=FALSE,max.lwd=6,xlim=c(-1.8,1.2),
mar=c(0.1,1.1,2.1,0.1))
mtext(paste("a) parallel=TRUE log(L) =",round(logLik(ard_fit1),3)),
line=-1,adj=0.1)
plot(ard_fit2,width=TRUE,text=FALSE,color=TRUE,
show.zeros=FALSE,max.lwd=6,xlim=c(-1.8,1.2),
mar=c(0.1,1.1,2.1,0.1))
mtext(paste("b) opt.method=\"optimParallel\" log(L) =",
round(logLik(ard_fit2),3)),line=-1,adj=0.1)
```

Cool. That’s it!
