A couple of months ago I
posted
code for a paralellized version of `phytools::fitMk`

using
*optimParallel*. Unfortunately, in the
test example I
provided,
`fitMk.parallel`

ran just barely faster than `fitMk`

!

After playing around with parallel computing and `make.simmap`

yesterday, I
decided to revisit `fitMk.parallel`

, which is now in the
GitHub development version of *phytools*.

First, I tried reiterating the same analysis of my previous post.

```
## load packages
library(phytools)
library(geiger)
## check phytools verison number
packageVersion("phytools")
```

```
## [1] '1.1.3'
```

```
## read tree and data from file
lizard.tree<-read.nexus(file=
"http://www.phytools.org/Rbook/7/lizard_tree.nex")
lizard.tree
```

```
##
## Phylogenetic tree with 4162 tips and 4161 internal nodes.
##
## Tip labels:
## Sphenodon_punctatus, Dibamus_bourreti, Dibamus_greeri, Dibamus_montanus, Anelytropsis_papillosus, Dibamus_tiomanensis, ...
##
## Rooted; includes branch lengths.
```

```
## read data from file
lizard.data<-read.csv(file=
"http://www.phytools.org/Rbook/7/lizard_spines.csv",
stringsAsFactors=TRUE,row.names=1)
head(lizard.data)
```

```
## habitat tail.spines
## Ablepharus_budaki non-saxicolous non-spiny
## Abronia_anzuetoi non-saxicolous non-spiny
## Abronia_frosti non-saxicolous non-spiny
## Acanthodactylus_aureus non-saxicolous non-spiny
## Acanthodactylus_opheodurus non-saxicolous non-spiny
## Acanthodactylus_pardalis non-saxicolous non-spiny
```

```
## check names
chk<-name.check(lizard.tree,lizard.data)
summary(chk)
```

```
## 3504 taxa are present in the tree but not the data:
## Ablepharus_chernovi,
## Ablepharus_kitaibelii,
## Ablepharus_pannonicus,
## Abronia_aurita,
## Abronia_campbelli,
## Abronia_chiszari,
## ....
##
## To see complete list of mis-matched taxa, print object.
```

```
## prune tree to include only taxa in data
lizard.tree<-drop.tip(lizard.tree,chk$tree_not_data)
## extract discrete character
habitat<-setNames(lizard.data$habitat,
rownames(lizard.data))
head(habitat)
```

```
## Ablepharus_budaki Abronia_anzuetoi
## non-saxicolous non-saxicolous
## Abronia_frosti Acanthodactylus_aureus
## non-saxicolous non-saxicolous
## Acanthodactylus_opheodurus Acanthodactylus_pardalis
## non-saxicolous non-saxicolous
## Levels: non-saxicolous saxicolous
```

```
## run fitMk.parallel
system.time(
fit.optimParallel<-fitMk.parallel(lizard.tree,habitat,
model="ARD",pi="fitzjohn",ncores=8)
)
```

```
## user system elapsed
## 0.92 0.03 8.75
```

```
fit.optimParallel
```

```
## Object of class "fitMk".
##
## Fitted (or set) value of Q:
## non-saxicolous saxicolous
## non-saxicolous -0.001552 0.001552
## saxicolous 0.022388 -0.022388
##
## Fitted (or set) value of pi:
## non-saxicolous saxicolous
## 0.03763 0.96237
## due to treating the root prior as (a) nuisance.
##
## Log-likelihood: -216.724702
##
## Optimization method used was "optimParallel"
```

```
system.time(
fit.nlminb<-fitMk(lizard.tree,habitat,model="ARD",
pi="fitzjohn")
)
```

```
## user system elapsed
## 11.00 0.19 11.19
```

```
fit.nlminb
```

```
## Object of class "fitMk".
##
## Fitted (or set) value of Q:
## non-saxicolous saxicolous
## non-saxicolous -0.001552 0.001552
## saxicolous 0.022388 -0.022388
##
## Fitted (or set) value of pi:
## non-saxicolous saxicolous
## 0.03763 0.96237
## due to treating the root prior as (a) nuisance.
##
## Log-likelihood: -216.724702
##
## Optimization method used was "nlminb"
```

We see that although the same results are obtained, and `fitMk.parallel`

runs *slightly*
faster, there's barely any difference between the two functions!

After reading a bit more about the
*optimParallel* package, however, it occurred to me
that the gains from parallelization should vary as a function of the dimensionality (that is,
parameter complexity) of the optimization problem. This is because the number of gradient
calculations in serial `optim`

scales as a linear function of the number of parameters to be
estimated. The speed-up of `optimParallel`

is accomplished by distributing these calculations
across processors. If our model involves relatively few parameters, as in the preceding
example, we should probably not expect a large effect!

To investigate this hypothesis, I decided to simply simulate data for a multi-state, rather than
binary, trait on my original lizard tree, try to fit a model
to the simulated data using both `fitMk.parallel`

and `fit.Mk`

, then compare the results.

I'll use a random **Q** matrix for simulation, as follows.

```
Q<-matrix(rexp(n=4,rate=100),4,4,byrow=TRUE,
dimnames=list(letters[1:4],letters[1:4]))
Q[lower.tri(Q)]<-t(Q)[lower.tri(Q)]
diag(Q)<-0
diag(Q)<--rowSums(Q)
class(Q)<-"Qmatrix"
plot(Q)
title(main="Generating Q matrix",font.main=3)
```

```
x<-sim.Mk(lizard.tree,Q)
head(x)
```

```
## Dibamus_greeri Orraya_occultus Nephrurus_amyae Nephrurus_levis
## b d d d
## Lialis_jicari Aprasia_repens
## a a
## Levels: a b c d
```

Now I'm ready to run my analysis. I'll wrap each function call with `system.time`

so that
I can extract the timings of the optimization.

```
t1<-system.time(
fit.optimParallel<-fitMk.parallel(lizard.tree,x,
model="SYM",pi="fitzjohn",ncores=8)
)
t1
```

```
## user system elapsed
## 1.33 0.11 40.53
```

```
fit.optimParallel
```

```
## Object of class "fitMk".
##
## Fitted (or set) value of Q:
## a b c d
## a -0.006285 0.001570 0.000740 0.003975
## b 0.001570 -0.005601 0.000000 0.004032
## c 0.000740 0.000000 -0.005513 0.004773
## d 0.003975 0.004032 0.004773 -0.012780
##
## Fitted (or set) value of pi:
## a b c d
## 0.910297 0.033306 0.003123 0.053275
## due to treating the root prior as (a) nuisance.
##
## Log-likelihood: -438.818235
##
## Optimization method used was "optimParallel"
```

```
t2<-system.time(
fit.nlminb<-fitMk(lizard.tree,x,model="SYM",
pi="fitzjohn")
)
t2
```

```
## user system elapsed
## 82.25 1.75 84.14
```

```
fit.nlminb
```

```
## Object of class "fitMk".
##
## Fitted (or set) value of Q:
## a b c d
## a -0.006285 0.001570 0.000740 0.003975
## b 0.001570 -0.005601 0.000000 0.004032
## c 0.000740 0.000000 -0.005513 0.004773
## d 0.003975 0.004032 0.004773 -0.012781
##
## Fitted (or set) value of pi:
## a b c d
## 0.910298 0.033305 0.003122 0.053275
## due to treating the root prior as (a) nuisance.
##
## Log-likelihood: -438.818233
##
## Optimization method used was "nlminb"
```

Finally, let's graph the results.

```
par(mfrow=c(1,2))
plot(as.Qmatrix(fit.optimParallel),show.zeros=FALSE,tol=1e-4)
mtext(paste("a) fitMk.parallel: time = ",round(t1[3],1),"s; log(L) = ",
round(logLik(fit.optimParallel),2),sep=""),adj=0.05,
line=-1)
plot(as.Qmatrix(fit.nlminb),show.zeros=FALSE,tol=1e-4)
mtext(paste("b) fitMk:time = ",round(t2[3],1),"s; log(L) = ",
round(logLik(fit.nlminb),2),sep=""),adj=0.05,
line=-1)
```

A final note. Even though `fitMk.parallel`

seems much better than `fitMk`

here,
it has not been as thoroughly tested (and `fitMk`

uses different optimization routines
internally by default) so I recommend using with caution!

## No comments:

## Post a Comment

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.