Sunday, September 4, 2022

Parallelized multirateBM for variable-rate Brownian evolution model

Last year I published an article describing a penalized likelihood approach to fitting an evolutionary model to quantitative trait data in which the rate of evolution itself, \(\sigma^{2}\), evolved under a geometric Brownian motion process, implemented in the phytools function multirateBM.

Even though I think this is a pretty neat model, one of its limitations is that penalized likelihood optimization is very slow.

To try to help with that, I recently pushed a phytools update that adds parallelized numerical optimization to multirateBM using the cool CRAN package optimParallel.

Let's see how it works.

Why don't we start by loading a tree and dataset. In this case, I'll use a tree and body size dataset for 49 species of mammals from Garland et al. (1992).

library(phytools)
packageVersion("phytools")
## [1] '1.2.2'
data(mammal.tree)
data(mammal.data)
lnMass<-setNames(log(mammal.data$bodyMass),
    rownames(mammal.data))

OK. We're ready to test out the parallelized version of multirateBM.

Unlike, fitMk.parallel, which I blogged about this summer, parallelization is not implemented as a separate function in multirateBM, but as a function option: parallel=TRUE.

Let's see.

First, I'll fit my model using parallel=FALSE, the default. Then I'll repeat with parallel=TRUE.

Even though multiple optimization methods are available in multirateBM, I'll stick to "L-BFGS-B" (a box constrained, quasi-Newton optimization routine), because that's what is implemented in optimParallel.

Finally, I'll wrap each of my optimization chunks in system.time so that we can get a timing from our R system.

system.time(
    fit.serial<-multirateBM(mammal.tree,lnMass,
        lambda=1,optim="L-BFGS-B",parallel=FALSE)
)
## Beginning optimization....
## Optimization iteration 1. Using "L-BFGS-B" optimization method.
## Best (penalized) log-likelihood so far: -230.967 
## Done optimization.
##    user  system elapsed 
##   89.30    0.92   90.28
fit.serial
## Multi-rate Brownian model using multirateBM.
## 
## Fitted rates:
##  U._maritimus U._arctos U._americanus N._narica P._lotor M._mephitis
##      0.013659  0.013007      0.126694  0.024988 0.017302    0.153454
##  M._meles C._lupus     
##  0.011781 0.178708 ....
## 
## lambda penalty term: 1
## log-likelihood:  -57.689982 
## AIC:  311.379964 
## 
## R thinks it has found a solution.
system.time(
    fit.parallel<-multirateBM(mammal.tree,lnMass,
        lambda=1,optim="L-BFGS-B",parallel=TRUE)
)
## Beginning optimization....
## Using socket cluster with 8 nodes on host 'localhost'.
## Optimization iteration 1. Using "L-BFGS-B" (parallel) optimization method.
## Best (penalized) log-likelihood so far: -230.967 
## Done optimization.
##    user  system elapsed 
##    1.19    1.04   48.95
fit.parallel
## Multi-rate Brownian model using multirateBM.
## 
## Fitted rates:
##  U._maritimus U._arctos U._americanus N._narica P._lotor M._mephitis
##      0.013659  0.013007      0.126694  0.024988 0.017302    0.153454
##  M._meles C._lupus     
##  0.011781 0.178708 ....
## 
## lambda penalty term: 1
## log-likelihood:  -57.689982 
## AIC:  311.379964 
## 
## R thinks it has found a solution.

Neat. We should see that we've obtained exactly the same solutions from both parallel=FALSE and parallel=TRUE, but the latter was produced in around half the time on my computer. (Results will vary, obviously!)

The reader also may have noted that I didn't specify the number of cores to use. I could have, using the optional argument ncores, but if I don't, then multirateBM will try to figure out how many cores I have using parallel::detectCores, and then print the results to screen: Using socket cluster with 8 nodes on host ‘localhost’.

Let's plot our fitted model!

## generate plotting object
obj<-plot(fit.parallel,plot=FALSE)
obj
## Object of class "multirateBM_plot" containing:
## 
## (1) A phylogenetic tree with 49 tips and 48 internal nodes.
## 
## (2) A mapped set of Brownian evolution rates.
## change color (needs viridisLite package)
obj<-setMap(obj,viridisLite::viridis(n=8))
## graph
plot(obj,ftype="i",fsize=0.8,outline=TRUE,lwd=5)

plot of chunk unnamed-chunk-3

No comments:

Post a Comment

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.