Wednesday, July 27, 2022

Parallelized Mk model-fitting actually does run faster than serial fitMk, for higher dimensional optimization

A couple of months ago I posted code for a paralellized version of phytools::fitMk using optimParallel. Unfortunately, in the test example I provided, fitMk.parallel ran just barely faster than fitMk!

After playing around with parallel computing and make.simmap yesterday, I decided to revisit fitMk.parallel, which is now in the GitHub development version of phytools.

First, I tried reiterating the same analysis of my previous post.

## load packages
library(phytools)
library(geiger)
## check phytools verison number
packageVersion("phytools")
## [1] '1.1.3'
## read tree and data from file
lizard.tree<-read.nexus(file=
    "http://www.phytools.org/Rbook/7/lizard_tree.nex")
lizard.tree
## 
## Phylogenetic tree with 4162 tips and 4161 internal nodes.
## 
## Tip labels:
##   Sphenodon_punctatus, Dibamus_bourreti, Dibamus_greeri, Dibamus_montanus, Anelytropsis_papillosus, Dibamus_tiomanensis, ...
## 
## Rooted; includes branch lengths.
## read data from file
lizard.data<-read.csv(file=
    "http://www.phytools.org/Rbook/7/lizard_spines.csv",
    stringsAsFactors=TRUE,row.names=1)
head(lizard.data)
##                                   habitat tail.spines
## Ablepharus_budaki          non-saxicolous   non-spiny
## Abronia_anzuetoi           non-saxicolous   non-spiny
## Abronia_frosti             non-saxicolous   non-spiny
## Acanthodactylus_aureus     non-saxicolous   non-spiny
## Acanthodactylus_opheodurus non-saxicolous   non-spiny
## Acanthodactylus_pardalis   non-saxicolous   non-spiny
## check names
chk<-name.check(lizard.tree,lizard.data)
summary(chk)
## 3504 taxa are present in the tree but not the data:
##     Ablepharus_chernovi,
##     Ablepharus_kitaibelii,
##     Ablepharus_pannonicus,
##     Abronia_aurita,
##     Abronia_campbelli,
##     Abronia_chiszari,
##     ....
## 
## To see complete list of mis-matched taxa, print object.
## prune tree to include only taxa in data
lizard.tree<-drop.tip(lizard.tree,chk$tree_not_data)
## extract discrete character
habitat<-setNames(lizard.data$habitat,
    rownames(lizard.data))
head(habitat)
##          Ablepharus_budaki           Abronia_anzuetoi 
##             non-saxicolous             non-saxicolous 
##             Abronia_frosti     Acanthodactylus_aureus 
##             non-saxicolous             non-saxicolous 
## Acanthodactylus_opheodurus   Acanthodactylus_pardalis 
##             non-saxicolous             non-saxicolous 
## Levels: non-saxicolous saxicolous
## run fitMk.parallel
system.time(
    fit.optimParallel<-fitMk.parallel(lizard.tree,habitat,
        model="ARD",pi="fitzjohn",ncores=8)
)
##    user  system elapsed 
##    0.92    0.03    8.75
fit.optimParallel
## Object of class "fitMk".
## 
## Fitted (or set) value of Q:
##                non-saxicolous saxicolous
## non-saxicolous      -0.001552   0.001552
## saxicolous           0.022388  -0.022388
## 
## Fitted (or set) value of pi:
## non-saxicolous     saxicolous 
##        0.03763        0.96237 
## due to treating the root prior as (a) nuisance.
## 
## Log-likelihood: -216.724702 
## 
## Optimization method used was "optimParallel"
system.time(
    fit.nlminb<-fitMk(lizard.tree,habitat,model="ARD",
        pi="fitzjohn")
)
##    user  system elapsed 
##   11.00    0.19   11.19
fit.nlminb
## Object of class "fitMk".
## 
## Fitted (or set) value of Q:
##                non-saxicolous saxicolous
## non-saxicolous      -0.001552   0.001552
## saxicolous           0.022388  -0.022388
## 
## Fitted (or set) value of pi:
## non-saxicolous     saxicolous 
##        0.03763        0.96237 
## due to treating the root prior as (a) nuisance.
## 
## Log-likelihood: -216.724702 
## 
## Optimization method used was "nlminb"

We see that although the same results are obtained, and fitMk.parallel runs slightly faster, there's barely any difference between the two functions!

After reading a bit more about the optimParallel package, however, it occurred to me that the gains from parallelization should vary as a function of the dimensionality (that is, parameter complexity) of the optimization problem. This is because the number of gradient calculations in serial optim scales as a linear function of the number of parameters to be estimated. The speed-up of optimParallel is accomplished by distributing these calculations across processors. If our model involves relatively few parameters, as in the preceding example, we should probably not expect a large effect!

To investigate this hypothesis, I decided to simply simulate data for a multi-state, rather than binary, trait on my original lizard tree, try to fit a model to the simulated data using both fitMk.parallel and fit.Mk, then compare the results.

I'll use a random Q matrix for simulation, as follows.

Q<-matrix(rexp(n=4,rate=100),4,4,byrow=TRUE,
    dimnames=list(letters[1:4],letters[1:4]))
Q[lower.tri(Q)]<-t(Q)[lower.tri(Q)]
diag(Q)<-0
diag(Q)<--rowSums(Q)
class(Q)<-"Qmatrix"
plot(Q)
title(main="Generating Q matrix",font.main=3)

plot of chunk unnamed-chunk-3

x<-sim.Mk(lizard.tree,Q)
head(x)
##  Dibamus_greeri Orraya_occultus Nephrurus_amyae Nephrurus_levis 
##               b               d               d               d 
##   Lialis_jicari  Aprasia_repens 
##               a               a 
## Levels: a b c d

Now I'm ready to run my analysis. I'll wrap each function call with system.time so that I can extract the timings of the optimization.

t1<-system.time(
    fit.optimParallel<-fitMk.parallel(lizard.tree,x,
        model="SYM",pi="fitzjohn",ncores=8)
)
t1
##    user  system elapsed 
##    1.33    0.11   40.53
fit.optimParallel
## Object of class "fitMk".
## 
## Fitted (or set) value of Q:
##           a         b         c         d
## a -0.006285  0.001570  0.000740  0.003975
## b  0.001570 -0.005601  0.000000  0.004032
## c  0.000740  0.000000 -0.005513  0.004773
## d  0.003975  0.004032  0.004773 -0.012780
## 
## Fitted (or set) value of pi:
##        a        b        c        d 
## 0.910297 0.033306 0.003123 0.053275 
## due to treating the root prior as (a) nuisance.
## 
## Log-likelihood: -438.818235 
## 
## Optimization method used was "optimParallel"
t2<-system.time(
    fit.nlminb<-fitMk(lizard.tree,x,model="SYM",
        pi="fitzjohn")
)
t2
##    user  system elapsed 
##   82.25    1.75   84.14
fit.nlminb
## Object of class "fitMk".
## 
## Fitted (or set) value of Q:
##           a         b         c         d
## a -0.006285  0.001570  0.000740  0.003975
## b  0.001570 -0.005601  0.000000  0.004032
## c  0.000740  0.000000 -0.005513  0.004773
## d  0.003975  0.004032  0.004773 -0.012781
## 
## Fitted (or set) value of pi:
##        a        b        c        d 
## 0.910298 0.033305 0.003122 0.053275 
## due to treating the root prior as (a) nuisance.
## 
## Log-likelihood: -438.818233 
## 
## Optimization method used was "nlminb"

Finally, let's graph the results.

par(mfrow=c(1,2))
plot(as.Qmatrix(fit.optimParallel),show.zeros=FALSE,tol=1e-4)
mtext(paste("a) fitMk.parallel: time = ",round(t1[3],1),"s; log(L) = ",
    round(logLik(fit.optimParallel),2),sep=""),adj=0.05,
    line=-1)
plot(as.Qmatrix(fit.nlminb),show.zeros=FALSE,tol=1e-4)
mtext(paste("b) fitMk:time = ",round(t2[3],1),"s; log(L) = ",
    round(logLik(fit.nlminb),2),sep=""),adj=0.05,
    line=-1)

plot of chunk unnamed-chunk-6

A final note. Even though fitMk.parallel seems much better than fitMk here, it has not been as thoroughly tested (and fitMk uses different optimization routines internally by default) so I recommend using with caution!

No comments:

Post a Comment

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.