I just updated the way fitHRM
(for fitting the hidden-rates model of Beaulieu et al. 2013) selects starting values for optimization. This was necessary because the previous starting values could sometimes be quite poor.
I wanted to quickly illustrate how fitHRM
can be used to fit a simple extended Mk model (that is, a Markov model without hidden states) using multiple optimization iterations & parallelization.
To demonstrate this, I’ll use a hard optimization problem: the ARD (all-rates-different) model for digit number evolution in squamate reptiles.
The ARD model is not a particularly good fit to these data – we just happen to know (from prior experience) that the optimization problem is difficult, so it should be a good test of our method!
The data for this example are modified from Brandley et al. (2008) – we also use them in our book. To follow along, users will have to update phytools from GitHub.
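Updating from GitHub should look something like the following (a sketch; it assumes the remotes package is already installed – devtools::install_github works equivalently):

```r
## install the development version of phytools from GitHub
## (assumes the remotes package is installed)
remotes::install_github("liamrevell/phytools")
```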
library(phytools)
packageVersion("phytools")
## [1] '1.7.2'
We can start off by loading our trait data & tree. I’ll do this by reading in our files directly from the book website as follows.
library(geiger)
squamate.tree<-read.nexus(file="http://www.phytools.org/Rbook/6/squamate.tre")
squamate.data<-read.csv(file="http://www.phytools.org/Rbook/6/squamate-data.csv",
row.names=1)
toe_number<-as.factor(setNames(squamate.data[[1]],rownames(squamate.data)))
chk<-name.check(squamate.tree,squamate.data)
summary(chk)
## 139 taxa are present in the tree but not the data:
## Abronia_graminea,
## Acontias_litoralis,
## Acontophiops_lineatus,
## Acrochordus_granulatus,
## Agamodon_anguliceps,
## Agkistrodon_contortrix,
## ....
## 1 taxon is present in the data but not the tree:
## Trachyboa_boulengeri
##
## To see complete list of mis-matched taxa, print object.
squamate.tree<-drop.tip(squamate.tree,chk$tree_not_data)
toe_number<-toe_number[squamate.tree$tip.label]
Next, I’m going to set the number of computer cores that I want to use for optimization. For this task I’ll choose the minimum of 15 and the number of cores available on my machine (in my case, that’s 16). parallel::detectCores
can be used to identify the number of available cores.
ncores<-min(c(15,parallel::detectCores()))
ncores
## [1] 15
Now I’m going to proceed and do two Mk model optimizations using parallelization. First, I’ll parallelize my optimization iterations. This is done internally in fitHRM
using foreach
(as described here). In this case, it makes no sense to assign more cores to the process than we have optimization iterations, because each optimization runs serially on a separate instance of R. (If ncores > niter
, then R just creates extra instances of R that consume memory but don’t do anything!)
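The logic of this kind of iteration-level parallelization can be sketched using base R's parallel package (an analogy only – fitHRM itself uses foreach internally): each worker runs one complete, independent task in serial.

```r
library(parallel)

## toy analogy to iteration-level parallelization: distribute independent
## tasks across workers; each worker runs one full task in serial
cl <- makeCluster(2)
res <- parSapply(cl, 1:4, function(i) i^2)
stopCluster(cl)
res
```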
ard_fit1<-fitHRM(squamate.tree,toe_number,model="ARD",pi="fitzjohn",
ncat=1,parallel=TRUE,lik.func="pruning",niter=15,ncores=ncores)
##
## This is the design matrix of the fitted model.
## Does it make sense?
##
## 0 1 2 3 4 5
## 0 0 1 2 3 4 5
## 1 6 0 7 8 9 10
## 2 11 12 0 13 14 15
## 3 16 17 18 0 19 20
## 4 21 22 23 24 0 25
## 5 26 27 28 29 30 0
##
## Opened cluster with 15 cores.
## Running optimization iterations in parallel.
## Please wait....
ard_fit1
## Object of class "fitHRM".
##
## Observed states: [ 0, 1, 2, 3, 4, 5 ]
## Number of rate categories per state: [ 1, 1, 1, 1, 1, 1 ]
##
## Fitted (or set) value of Q:
## 0 1 2 3 4 5
## 0 0.000000 0.000000 0.000000 0.000000 0.000000 0.00000
## 1 0.013157 -0.013157 0.000000 0.000000 0.000000 0.00000
## 2 0.012577 0.000000 -0.023517 0.010939 0.000000 0.00000
## 3 0.017438 0.000000 0.023222 -0.040660 0.000000 0.00000
## 4 0.000000 0.003780 0.000000 0.012048 -0.049627 0.03380
## 5 0.000000 0.001648 0.000237 0.000000 0.002795 -0.00468
##
## Fitted (or set) value of pi:
## 0 1 2 3 4 5
## 0.000000 0.000000 0.000000 0.000000 0.328325 0.671675
## due to treating the root prior as (a) nuisance.
##
## Log-likelihood: -109.417003
##
## Optimization method used was "nlminb"
The key argument that tells fitHRM
we’re running a standard Mk rather than a hidden-rates-model is ncat=1
(the number of rate categories is 1).
Aside from assigning optimization iterations to different cores, there is another way that optimization can be parallelized in R, and that is via the optimParallel package. Let’s try it. In this case, we should not set parallel=TRUE
, but (rather) set opt.method="optimParallel"
. We’ll keep our ncores
the same – but, remember, in this case our 15 optimization iterations will run in series, with each individual iteration using parallelized optimization.
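To see what optimParallel does in isolation, here is a minimal toy example, separate from fitHRM (a sketch: optimParallel is a parallel drop-in for optim's L-BFGS-B method that farms out the objective-function and gradient evaluations to a cluster):

```r
library(parallel)
library(optimParallel)

## minimize a simple quadratic; function & gradient evaluations
## are spread over the cluster workers
cl <- makeCluster(2)
setDefaultCluster(cl)
fn <- function(x) sum((x - c(1, 2))^2)
fit <- optimParallel(par=c(0, 0), fn=fn)
stopCluster(cl)
fit$par
```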
ard_fit2<-fitHRM(squamate.tree,toe_number,model="ARD",pi="fitzjohn",
ncat=1,opt.method="optimParallel",lik.func="pruning",niter=15,
ncores=ncores)
##
## This is the design matrix of the fitted model.
## Does it make sense?
##
## 0 1 2 3 4 5
## 0 0 1 2 3 4 5
## 1 6 0 7 8 9 10
## 2 11 12 0 13 14 15
## 3 16 17 18 0 19 20
## 4 21 22 23 24 0 25
## 5 26 27 28 29 30 0
## Warning in log(pp): NaNs produced
## log-likelihood from current iteration: -109.458
## --- Best log-likelihood so far: -109.458 ---
## Warning in log(pp): NaNs produced
## log-likelihood from current iteration: -109.533
## --- Best log-likelihood so far: -109.458 ---
## log-likelihood from current iteration: -111.716
## --- Best log-likelihood so far: -109.458 ---
## Warning in log(pp): NaNs produced
## log-likelihood from current iteration: -109.4958
## --- Best log-likelihood so far: -109.458 ---
## Warning in log(pp): NaNs produced
## log-likelihood from current iteration: -109.8417
## --- Best log-likelihood so far: -109.458 ---
## log-likelihood from current iteration: -109.4213
## --- Best log-likelihood so far: -109.4213 ---
## Warning in log(pp): NaNs produced
## log-likelihood from current iteration: -109.4172
## --- Best log-likelihood so far: -109.4172 ---
## Warning in log(pp): NaNs produced
## log-likelihood from current iteration: -109.4281
## --- Best log-likelihood so far: -109.4172 ---
## log-likelihood from current iteration: -109.6712
## --- Best log-likelihood so far: -109.4172 ---
## log-likelihood from current iteration: -110.1308
## --- Best log-likelihood so far: -109.4172 ---
## Warning in log(pp): NaNs produced
## log-likelihood from current iteration: -110.7199
## --- Best log-likelihood so far: -109.4172 ---
## log-likelihood from current iteration: -109.5806
## --- Best log-likelihood so far: -109.4172 ---
## log-likelihood from current iteration: -109.5754
## --- Best log-likelihood so far: -109.4172 ---
## log-likelihood from current iteration: -109.7035
## --- Best log-likelihood so far: -109.4172 ---
## Warning in log(pp): NaNs produced
## log-likelihood from current iteration: -109.7828
## --- Best log-likelihood so far: -109.4172 ---
ard_fit2
## Object of class "fitHRM".
##
## Observed states: [ 0, 1, 2, 3, 4, 5 ]
## Number of rate categories per state: [ 1, 1, 1, 1, 1, 1 ]
##
## Fitted (or set) value of Q:
## 0 1 2 3 4 5
## 0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
## 1 0.013160 -0.013161 0.000001 0.000000 0.000000 0.000000
## 2 0.012542 0.000001 -0.023679 0.011136 0.000000 0.000000
## 3 0.017505 0.000000 0.023397 -0.040902 0.000000 0.000000
## 4 0.000000 0.003796 0.000000 0.012059 -0.049648 0.033793
## 5 0.000000 0.001646 0.000235 0.000000 0.002799 -0.004681
##
## Fitted (or set) value of pi:
## 0 1 2 3 4 5
## 0.000000 0.000000 0.000000 0.000000 0.328045 0.671955
## due to treating the root prior as (a) nuisance.
##
## Log-likelihood: -109.417207
##
## Optimization method used was "optimParallel"
Compare these two results to what we reported in the book.
For multiple optimization iterations, fitHRM
also returns the results of all 15 (in our case) optimization iterations to the user in the form of the object element all.fits
. Let’s pull out the log-likelihoods to see how they vary from iteration to iteration depending on our method.
ard_logL1<-sapply(ard_fit1$all.fits,logLik)
ard_logL1
## [1] -109.6376 -109.4170 -117.4159 -109.4170 -109.9385 -109.5287 -109.5431 -109.5287 -109.8213 -109.4969
## [11] -109.7821 -133.1351 -109.4316 -109.7633 -111.7605
ard_logL2<-sapply(ard_fit2$all.fits,logLik)
ard_logL2
## [1] -109.4580 -109.5330 -111.7160 -109.4958 -109.8417 -109.4213 -109.4172 -109.4281 -109.6712 -110.1308
## [11] -110.7199 -109.5806 -109.5754 -109.7035 -109.7828
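One simple way to compare the stability of the two approaches is to summarize each vector of per-iteration log-likelihoods. Here's a small helper sketch (the vector passed in below uses hypothetical toy values – in practice we'd pass ard_logL1 or ard_logL2):

```r
## summarize a vector of per-iteration log-likelihoods
summarize_logL <- function(logL)
  c(best=max(logL), worst=min(logL), spread=diff(range(logL)))

## e.g., on a toy vector of three iterations
summarize_logL(c(-110.2, -109.4, -112.0))
```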
Lastly, let’s graph our fitted model. Here users might note a small update to the behavior of the plot method: if we expand xlim
, the vertical legend pulls farther away from the plotted model. I added this feature because it was not uncommon to find this legend crowding the plot, depending on the scale of the plotting device.
par(mfrow=c(1,2))
plot(ard_fit1,width=TRUE,text=FALSE,color=TRUE,
show.zeros=FALSE,max.lwd=6,xlim=c(-1.8,1.2),
mar=c(0.1,1.1,2.1,0.1))
mtext(paste("a) parallel=TRUE log(L) =",round(logLik(ard_fit1),3)),
line=-1,adj=0.1)
plot(ard_fit2,width=TRUE,text=FALSE,color=TRUE,
show.zeros=FALSE,max.lwd=6,xlim=c(-1.8,1.2),
mar=c(0.1,1.1,2.1,0.1))
mtext(paste("b) opt.method=\"optimParallel\" log(L) =",
round(logLik(ard_fit2),3)),line=-1,adj=0.1)
Cool. That’s it!