Wednesday, February 28, 2024

Using differential evolution algorithms to optimize an ordered Mk model where the rate of change between states varies as an n-degree polynomial functional form

The other week I posted about fitting an ordered Mk model in which the rate of change between adjacent character levels in the model varied as a n-degree polynomial functional form.

This model is very cool (IMO); however, in practice it can be quite difficult to fit to data. I’m not sure why this is, but I suspect it has to do with the fact that the functional form can easily get the Q matrix into parts of parameter space where the likelihood is not defined.

I was searching around for a robust optimization routine in R that might work for “holey” likelihood surfaces that were not numerically differentiable & I stumbled onto DEoptim. DEoptim uses the differential evolution algorithm described in Storn & Price (1997; also see Mullen et al. 2011).

Differential evolution is a type of evolutionary algorithm in which the principles of natural selection are used to try to solve a complex problem – in this case, optimization of a real-valued function. The algorithm first creates a population of “individuals” consisting of randomly chosen values of the parameters to be optimized. Each generation, this population is subject to selection (based on the objective function: in our case, the likelihood function), mutation, and crossing-over. Differential evolution is robust, but inefficient for smooth objective functions. So far, I’ve found that it seems quite good at converging to the same or a better solution that I’m able to find using nlminb or optim.

To illustrate it’s use, I’m going to use some real empirical data for a (potentially) ordered character trait: caudal vertebrae number in Anolis. These data are from an article I have in review that was co-authored by Klaus Schliep, Travis Ingram, & Luke Mahler. (These data are available from a project GitHub repository here.)

To start with, I’ll load phytools – which should be updated from GitHub.

library(phytools)
packageVersion("phytools")
## [1] '2.2.0'

Next, we can read in our dataset and tree.

anolis_dat<-read.csv(file="anolis_caudal_vertebrae_data.csv",
  row.names=1)
head(anolis_dat)
anolis_tre<-read.tree(file="mcc.tre")
anolis_tre
## 
## Phylogenetic tree with 187 tips and 186 internal nodes.
## 
## Tip labels:
##   punctatus, transversalis, agassizi, casildae, microtus, aqbl, ...
## 
## Rooted; includes branch lengths.

There are some discrepancies between the data and tree, so I’ll use geiger::name.check to identify reconcile them.

chk<-geiger::name.check(anolis_tre,anolis_dat,data.names=anolis_dat$Species)
summary(chk)
## 76 taxa are present in the tree but not the data:
##     acutus,
##     aeneus,
##     alayoni,
##     alfaroi,
##     altae,
##     altitudinalis,
##     ....
## 
## To see complete list of mis-matched taxa, print object.
anolis_tre.pruned<-drop.tip(anolis_tre,chk$tree_not_data)
vert_number<-setNames(anolis_dat$Vert,anolis_dat$Species)

Finally, we’re ready to fit our model.

Here, I’m going to use fitfnMk with opt.method="DEoptim" and degree=0 to indicate that I want to fit a model with constant (i.e., degree 0 polynomial) backward & forward transition rates between adjacent levels. I could’ve also fit this model easily using phytools::fitMk or geiger::fitDiscrete. I’m using a modest number of iterations of maxit=100, simply because I expect thsi model will be straightforward to optimize.

anolis_fit0<-fitfnMk(anolis_tre.pruned,vert_number,
  degree=0,niter=1,opt.method="DEoptim",parallel=TRUE,
  maxit=100,trace=10,prompt=FALSE)
## 
## Iteration: 10 bestvalit: 303.445488 bestmemit:    0.129787    0.215553
## Iteration: 20 bestvalit: 303.441723 bestmemit:    0.130566    0.217604
## Iteration: 30 bestvalit: 303.441716 bestmemit:    0.130487    0.217581
## Iteration: 40 bestvalit: 303.441716 bestmemit:    0.130491    0.217592
## Iteration: 50 bestvalit: 303.441716 bestmemit:    0.130492    0.217592
## Iteration: 60 bestvalit: 303.441716 bestmemit:    0.130491    0.217592
## Iteration: 70 bestvalit: 303.441716 bestmemit:    0.130491    0.217592
## Iteration: 80 bestvalit: 303.441716 bestmemit:    0.130491    0.217592
## Iteration: 90 bestvalit: 303.441716 bestmemit:    0.130491    0.217592
## Iteration: 100 bestvalit: 303.441716 bestmemit:    0.130491    0.217592

Let’s graph our fitted model and the log-likelihood.

plot(as.Qmatrix(anolis_fit0),xlim=c(-1.6,1),ylim=c(-0.9,0.9),
  show.zeros=FALSE,width=TRUE,color=TRUE,text=FALSE,max.lwd=4)
mtext(paste("log(L) =",round(logLik(anolis_fit0),2)),
  line=2,adj=0.1,cex=1.5)

plot of chunk unnamed-chunk-49

Terrific. Now, let’s do the same for a slightly more interesting model – this one a first degree polynomial, i.e., “straight line.”

anolis_fit1<-fitfnMk(anolis_tre.pruned,vert_number,
  degree=1,niter=1,opt.method="DEoptim",parallel=TRUE,
  maxit=200,trace=20,prompt=FALSE)
## 
## Iteration: 20 bestvalit: 303.594425 bestmemit:   -0.001227    0.167693    0.000076    0.236956
## Iteration: 40 bestvalit: 303.181487 bestmemit:   -0.006270    0.207564   -0.006498    0.294802
## Iteration: 60 bestvalit: 303.176395 bestmemit:   -0.006165    0.202675   -0.006765    0.295882
## Iteration: 80 bestvalit: 303.176297 bestmemit:   -0.006164    0.202479   -0.006795    0.295932
## Iteration: 100 bestvalit: 303.176295 bestmemit:   -0.006169    0.202606   -0.006796    0.296011
## Iteration: 120 bestvalit: 303.176295 bestmemit:   -0.006169    0.202600   -0.006795    0.296001
## Iteration: 140 bestvalit: 303.176295 bestmemit:   -0.006169    0.202599   -0.006795    0.296001
## Iteration: 160 bestvalit: 303.176295 bestmemit:   -0.006169    0.202599   -0.006795    0.296001
## Iteration: 180 bestvalit: 303.176295 bestmemit:   -0.006169    0.202599   -0.006795    0.296001
## Iteration: 200 bestvalit: 303.176295 bestmemit:   -0.006169    0.202599   -0.006795    0.296001

This time, instead of just plotting the fitted Q matrix, let’s plot both the Q matrix and the function form. This will look as follows.

layout(matrix(c(1,2),1,2),widths=c(0.45,0.55))
plot(as.Qmatrix(anolis_fit1),width=TRUE,color=TRUE,
  show.zeros=FALSE,xlim=c(-1.6,1),ylim=c(-0.9,0.9),
  text=FALSE,max.lwd=4)
mtext(paste("log(L) =",round(logLik(anolis_fit1),2)),
  line=2,adj=0.1,cex=1.5)
par(mar=c(3.1,4.1,1.1,1.1))
plot(anolis_fit1)

plot of chunk unnamed-chunk-51

anolis_fit2<-fitfnMk(anolis_tre.pruned,vert_number,
  degree=2,niter=1,opt.method="DEoptim",parallel=TRUE,
  maxit=400,trace=40,prompt=FALSE)
## 
## Iteration: 40 bestvalit: 313.032922 bestmemit:    0.007247   -0.049616    0.154917    0.014953   -0.157637    0.311038
## Iteration: 80 bestvalit: 302.032363 bestmemit:    0.002105   -0.029666    0.310493    0.004510   -0.069509    0.462582
## Iteration: 120 bestvalit: 298.087849 bestmemit:   -0.003116    0.101355   -0.598268    0.001908   -0.022558    0.194169
## Iteration: 160 bestvalit: 297.922071 bestmemit:   -0.002958    0.095977   -0.538803    0.002085   -0.023139    0.184739
## Iteration: 200 bestvalit: 297.920032 bestmemit:   -0.002985    0.097075   -0.547527    0.002109   -0.023431    0.185733
## Iteration: 240 bestvalit: 297.920013 bestmemit:   -0.002987    0.097064   -0.547306    0.002102   -0.023354    0.185633
## Iteration: 280 bestvalit: 297.920013 bestmemit:   -0.002987    0.097054   -0.547246    0.002101   -0.023351    0.185630
## Iteration: 320 bestvalit: 297.920013 bestmemit:   -0.002987    0.097053   -0.547236    0.002101   -0.023351    0.185630
## Iteration: 360 bestvalit: 297.920013 bestmemit:   -0.002987    0.097053   -0.547237    0.002101   -0.023351    0.185630
## Iteration: 400 bestvalit: 297.920013 bestmemit:   -0.002987    0.097053   -0.547237    0.002101   -0.023351    0.185630
layout(matrix(c(1,2),1,2),widths=c(0.45,0.55))
plot(as.Qmatrix(anolis_fit2),width=TRUE,color=TRUE,
  show.zeros=FALSE,xlim=c(-1.6,1),ylim=c(-0.9,0.9),
  text=FALSE,max.lwd=4)
mtext(paste("log(L) =",round(logLik(anolis_fit2),2)),
  line=2,adj=0.1,cex=1.5)
par(mar=c(3.1,4.1,1.1,1.1))
plot(anolis_fit2)

plot of chunk unnamed-chunk-53

anolis_fit3<-fitfnMk(anolis_tre.pruned,vert_number,
  degree=3,niter=1,opt.method="DEoptim",parallel=TRUE,
  maxit=800,trace=80,prompt=FALSE)
## 
## Iteration: 80 bestvalit: 323.114084 bestmemit:   -0.110058    2.594987    0.095398   -0.658322   -0.023410    1.320606    2.973144   -1.786844
## Iteration: 160 bestvalit: 322.561268 bestmemit:   -0.046732    1.078965   -0.505863    0.012502   -0.018492    0.666359    0.283562   -0.283691
## Iteration: 240 bestvalit: 316.330584 bestmemit:   -0.007572    0.226626   -0.990072    1.418284   -0.004184    0.190996   -1.052335    1.798670
## Iteration: 320 bestvalit: 313.424801 bestmemit:   -0.003558    0.092464   -0.335300    0.466350   -0.000780    0.042978   -0.138476    0.294190
## Iteration: 400 bestvalit: 306.819655 bestmemit:   -0.001681    0.053889   -0.410016    1.077750   -0.000730    0.031338   -0.249775    0.728293
## Iteration: 480 bestvalit: 303.042333 bestmemit:    0.000301    0.002289   -0.115700    0.694865    0.001343   -0.026053    0.108531    0.314291
## Iteration: 560 bestvalit: 297.490606 bestmemit:   -0.000109    0.005058   -0.043357    0.146974    0.000344   -0.005590    0.017234    0.189912
## Iteration: 640 bestvalit: 297.296921 bestmemit:   -0.000019    0.002161   -0.022959    0.112327    0.000420   -0.008811    0.045689    0.140619
## Iteration: 720 bestvalit: 297.287727 bestmemit:   -0.000036    0.002814   -0.030054    0.134835    0.000411   -0.008379    0.040328    0.158209
## Iteration: 800 bestvalit: 297.287689 bestmemit:   -0.000036    0.002815   -0.030075    0.134978    0.000412   -0.008407    0.040547    0.157920
layout(matrix(c(1,2),1,2),widths=c(0.45,0.55))
plot(as.Qmatrix(anolis_fit3),width=TRUE,color=TRUE,
  show.zeros=FALSE,xlim=c(-1.6,1),ylim=c(-0.9,0.9),
  text=FALSE,max.lwd=4)
mtext(paste("log(L) =",round(logLik(anolis_fit3),2)),
  line=2,adj=0.1,cex=1.5)
par(mar=c(3.1,4.1,1.1,1.1))
plot(anolis_fit3)

plot of chunk unnamed-chunk-55

Readers might have noted that I increased maxit (and trace – but that is just my print interval & will not affect optimization) for each increasingly complex model. There’s no reason to have done it in such a systematic manner, but it does take longer to converge (on average) for more parameter rich models of this type.

Finally, let’s compare our set of models as follows.

anova(anolis_fit0,
  anolis_fit1,
  anolis_fit2,
  anolis_fit3)
##                log(L) d.f.      AIC    weight
## anolis_fit0 -303.4417    2 610.8834 0.1444457
## anolis_fit1 -303.1763    4 614.3526 0.0254910
## anolis_fit2 -297.9200    6 607.8400 0.6615641
## anolis_fit3 -297.2877    8 610.5754 0.1684992

This tells us that the best supported model (accounting for parameter complexity) is actually our second-degree (i.e., quadratic) polynomial model. Cool.

We, of course, can consider higher order polynomial models – but these take even longer to optimize, so I may reserve that for a future post!

That’s it.