Saturday, April 1, 2023

Parallelizing multiple iterations of ML optimization (of the Mk or other phylogenetic models) using foreach

Here’s a quick demo showing how to use foreach (from the foreach package) to run multiple likelihood optimization iterations in parallel.

library(foreach)
library(doParallel)

For this example, I’ll focus on the fitpolyMk polymorphic trait evolution model and will re-use the dataset on habitat-use polymorphism in Mycalesina butterflies from a study by Halali et al. (2020).

Let’s load phytools and our data. To follow along, users should make sure that they update phytools from GitHub.
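
If you need to update, the development version of phytools can be installed directly from GitHub, for instance using the remotes package (this assumes, as is usual, the liamrevell/phytools repository).

# install the development version of phytools from GitHub
# (assumes the liamrevell/phytools repository; requires remotes)
remotes::install_github("liamrevell/phytools")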

library(phytools)
data("butterfly.tree")
butterfly.tree
## 
## Phylogenetic tree with 287 tips and 286 internal nodes.
## 
## Tip labels:
##   Bra_peitho_gigas, Bra_decira, Bra_simonsii, Bra_perspicua, Bra_phaea, Tel_adolphei, ...
## 
## Rooted; includes branch lengths.
data("butterfly.data")
head(butterfly.data)
##                                     habitat
## Myc_francisca_formosana? forest+fringe+open
## Bic_cooksoni                           open
## Bic_brunnea                          forest
## Bic_jefferyi                    fringe+open
## Bic_auricruda_fulgida                forest
## Bic_smithi_smithi             forest+fringe
habitat<-setNames(butterfly.data$habitat,
  rownames(butterfly.data))

Next, I’ll set up a parallel socket cluster using the parallel package. I’ll assign it ten cores, even though I have 16 on my machine, just because I’m only going to do ten optimization iterations.

niter<-10
ncores<-min(niter,detectCores())
mc<-makeCluster(ncores,type="PSOCK")
registerDoParallel(cl=mc)

Now let’s run our optimization iterations in parallel using foreach, which works (kind of) like a for loop, except that each iteration of the loop is run on a separate instance of R.

butterfly_fits<-foreach(i=1:niter)%dopar%{
  phytools::fitpolyMk(butterfly.tree,habitat,model="ARD",
    ordered=TRUE,order=c("forest","fringe","open"),
    pi="fitzjohn",rand_start=TRUE)
}
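
As an aside, the phytools:: prefix in the call above is what makes fitpolyMk available on each worker node of our socket cluster. An alternative (just a sketch of an equivalent call, not what I did above) is to have foreach load phytools on every worker via its .packages argument, in which case the prefix can be dropped.

# equivalent call in which foreach loads phytools on each worker,
# so the phytools:: prefix is unnecessary
butterfly_fits<-foreach(i=1:niter,.packages="phytools")%dopar%{
  fitpolyMk(butterfly.tree,habitat,model="ARD",
    ordered=TRUE,order=c("forest","fringe","open"),
    pi="fitzjohn",rand_start=TRUE)
}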

We should not forget to stop the cluster we created.

stopCluster(cl=mc)
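
If you plan to wrap this whole workflow in a function, one defensive pattern (a sketch only, using a hypothetical helper name, not something from the original post) is to register the cleanup with on.exit, so the cluster gets stopped even if one of the optimizations fails with an error.

# hypothetical wrapper: on.exit() guarantees the cluster is stopped
# even if an optimization iteration throws an error
fit_polyMk_parallel<-function(tree,x,niter=10,...){
  ncores<-min(niter,parallel::detectCores())
  mc<-parallel::makeCluster(ncores,type="PSOCK")
  doParallel::registerDoParallel(cl=mc)
  on.exit(parallel::stopCluster(mc),add=TRUE)
  foreach(i=1:niter)%dopar%{
    phytools::fitpolyMk(tree,x,rand_start=TRUE,...)
  }
}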

Let’s look at our log-likelihoods across optimization iterations.

This should show us that different instances of our optimization arrived at different solutions, although it’s likely that more than one iteration converged on the same best solution.

logL<-sapply(butterfly_fits,logLik)
logL
##  [1] -296.4731 -296.1568 -296.2729 -301.7469 -296.1568 -301.7469 -304.1142 -296.1568 -313.1329
## [10] -296.2814
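
To see how many of our ten iterations converged on (what we presume to be) the best solution, we can count the log-likelihoods that fall within a small tolerance of the maximum; from the printed values above, three of the ten reached about -296.16.

# count iterations within a small tolerance of the best log-likelihood
sum(logL>=max(logL)-1e-4)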

Picking our best solution (hopefully the true MLE, although this is unknowable) involves identifying the optimization with the highest log-likelihood. (If multiple optimization iterations have the same log-likelihood, we’ll just take the first of these.)

best_fit<-butterfly_fits[[which(logL==max(logL))[1]]]

Let’s plot our fitted model.

plot(best_fit,mar=rep(0.1,4))

(Figure: plot of our fitted ordered polymorphic trait evolution model.)

Finally, let’s compute marginal ancestral states under our best-fitting model.

butterfly_ancr<-ancr(best_fit)
butterfly_ancr
## Marginal ancestral state estimates:
##       forest forest+fringe forest+fringe+open   fringe fringe+open open
## 288 0.000071      0.007656           0.992197 0.000013    0.000063    0
## 289 0.000065      0.010537           0.989369 0.000006    0.000023    0
## 290 0.000810      0.151475           0.847090 0.000161    0.000464    0
## 291 0.000793      0.155105           0.840370 0.002053    0.001680    0
## 292 0.016801      0.570957           0.396180 0.006652    0.009411    0
## 293 0.917097      0.079702           0.003198 0.000000    0.000003    0
## ...
## 
## Log-likelihood = -296.156846
cols<-setNames(c(rgb(0,1,0),rgb(0,0.5,0.5),rgb(1/3,1/3,1/3),
	rgb(0,0,1),rgb(0.5,0.5,0),rgb(1,0,0)),levels(habitat))
plotTree(butterfly.tree,type="fan",lwd=1,ftype="off")
nodelabels(pie=butterfly_ancr$ace,piecol=cols,cex=0.4)
legend(x="topleft",legend=levels(habitat),pt.cex=1.2,pch=15,
	col=cols,cex=0.8,bty="n",ncol=2)
tiplabels(pie=best_fit$data[butterfly.tree$tip.label,],
  piecol=cols,cex=0.25)

(Figure: marginal ancestral state reconstruction of habitat use plotted on the Mycalesina butterfly tree.)
