The other week I posted about fitting an ordered Mk model in which the rate of change between adjacent character levels in the model varied as a n-degree polynomial functional form.
Still working out some bugs, but now in #Rstats #phytools -- fitting an ordered Mk model in which the rates of change between adjacent states vary according to an n-degree polynomial functional form: https://t.co/DcS0GFtALg. pic.twitter.com/FhlTQsASCR
— Liam Revell (@phytools_liam) February 21, 2024
This model is very cool (IMO); however, in practice it can be quite difficult to fit to data. I’m not sure why this is, but I suspect it has to do with the fact that the functional form can easily get the Q matrix into parts of parameter space where the likelihood is not defined.
I was searching around for a robust optimization routine in R that might work for “holey” likelihood surfaces that were not numerically differentiable & I stumbled onto DEoptim. DEoptim uses the differential evolution algorithm described in Storn & Price (1997; also see Mullen et al. 2011).
Differential evolution is a type of evolutionary algorithm in which the principles of natural selection are used to try to solve a complex problem – in this case, optimization of a real-valued function. The algorithm first creates a population of “individuals” consisting of randomly chosen values of the parameters to be optimized. Each generation, this population is subject to selection (based on the objective function: in our case, the likelihood function), mutation, and crossing-over. Differential evolution is robust, but inefficient for smooth objective functions. So far, I’ve found that it seems quite good at converging to the same or a better solution that I’m able to find using nlminb
or optim
.
To illustrate it’s use, I’m going to use some real empirical data for a (potentially) ordered character trait: caudal vertebrae number in Anolis. These data are from an article I have in review that was co-authored by Klaus Schliep, Travis Ingram, & Luke Mahler. (These data are available from a project GitHub repository here.)
To start with, I’ll load phytools – which should be updated from GitHub.
library(phytools)
packageVersion("phytools")
## [1] '2.2.0'
Next, we can read in our dataset and tree.
anolis_dat<-read.csv(file="anolis_caudal_vertebrae_data.csv",
row.names=1)
head(anolis_dat)
anolis_tre<-read.tree(file="mcc.tre")
anolis_tre
##
## Phylogenetic tree with 187 tips and 186 internal nodes.
##
## Tip labels:
## punctatus, transversalis, agassizi, casildae, microtus, aqbl, ...
##
## Rooted; includes branch lengths.
There are some discrepancies between the data and tree, so I’ll use geiger::name.check
to identify reconcile them.
chk<-geiger::name.check(anolis_tre,anolis_dat,data.names=anolis_dat$Species)
summary(chk)
## 76 taxa are present in the tree but not the data:
## acutus,
## aeneus,
## alayoni,
## alfaroi,
## altae,
## altitudinalis,
## ....
##
## To see complete list of mis-matched taxa, print object.
anolis_tre.pruned<-drop.tip(anolis_tre,chk$tree_not_data)
vert_number<-setNames(anolis_dat$Vert,anolis_dat$Species)
Finally, we’re ready to fit our model.
Here, I’m going to use fitfnMk
with opt.method="DEoptim"
and degree=0
to indicate that I want to fit a model with constant (i.e., degree 0 polynomial) backward & forward transition rates between adjacent levels. I could’ve also fit this model easily using phytools::fitMk
or geiger::fitDiscrete
. I’m using a modest number of iterations of maxit=100
, simply because I expect thsi model will be straightforward to optimize.
anolis_fit0<-fitfnMk(anolis_tre.pruned,vert_number,
degree=0,niter=1,opt.method="DEoptim",parallel=TRUE,
maxit=100,trace=10,prompt=FALSE)
##
## Iteration: 10 bestvalit: 303.445488 bestmemit: 0.129787 0.215553
## Iteration: 20 bestvalit: 303.441723 bestmemit: 0.130566 0.217604
## Iteration: 30 bestvalit: 303.441716 bestmemit: 0.130487 0.217581
## Iteration: 40 bestvalit: 303.441716 bestmemit: 0.130491 0.217592
## Iteration: 50 bestvalit: 303.441716 bestmemit: 0.130492 0.217592
## Iteration: 60 bestvalit: 303.441716 bestmemit: 0.130491 0.217592
## Iteration: 70 bestvalit: 303.441716 bestmemit: 0.130491 0.217592
## Iteration: 80 bestvalit: 303.441716 bestmemit: 0.130491 0.217592
## Iteration: 90 bestvalit: 303.441716 bestmemit: 0.130491 0.217592
## Iteration: 100 bestvalit: 303.441716 bestmemit: 0.130491 0.217592
Let’s graph our fitted model and the log-likelihood.
plot(as.Qmatrix(anolis_fit0),xlim=c(-1.6,1),ylim=c(-0.9,0.9),
show.zeros=FALSE,width=TRUE,color=TRUE,text=FALSE,max.lwd=4)
mtext(paste("log(L) =",round(logLik(anolis_fit0),2)),
line=2,adj=0.1,cex=1.5)
Terrific. Now, let’s do the same for a slightly more interesting model – this one a first degree polynomial, i.e., “straight line.”
anolis_fit1<-fitfnMk(anolis_tre.pruned,vert_number,
degree=1,niter=1,opt.method="DEoptim",parallel=TRUE,
maxit=200,trace=20,prompt=FALSE)
##
## Iteration: 20 bestvalit: 303.594425 bestmemit: -0.001227 0.167693 0.000076 0.236956
## Iteration: 40 bestvalit: 303.181487 bestmemit: -0.006270 0.207564 -0.006498 0.294802
## Iteration: 60 bestvalit: 303.176395 bestmemit: -0.006165 0.202675 -0.006765 0.295882
## Iteration: 80 bestvalit: 303.176297 bestmemit: -0.006164 0.202479 -0.006795 0.295932
## Iteration: 100 bestvalit: 303.176295 bestmemit: -0.006169 0.202606 -0.006796 0.296011
## Iteration: 120 bestvalit: 303.176295 bestmemit: -0.006169 0.202600 -0.006795 0.296001
## Iteration: 140 bestvalit: 303.176295 bestmemit: -0.006169 0.202599 -0.006795 0.296001
## Iteration: 160 bestvalit: 303.176295 bestmemit: -0.006169 0.202599 -0.006795 0.296001
## Iteration: 180 bestvalit: 303.176295 bestmemit: -0.006169 0.202599 -0.006795 0.296001
## Iteration: 200 bestvalit: 303.176295 bestmemit: -0.006169 0.202599 -0.006795 0.296001
This time, instead of just plotting the fitted Q matrix, let’s plot both the Q matrix and the function form. This will look as follows.
layout(matrix(c(1,2),1,2),widths=c(0.45,0.55))
plot(as.Qmatrix(anolis_fit1),width=TRUE,color=TRUE,
show.zeros=FALSE,xlim=c(-1.6,1),ylim=c(-0.9,0.9),
text=FALSE,max.lwd=4)
mtext(paste("log(L) =",round(logLik(anolis_fit1),2)),
line=2,adj=0.1,cex=1.5)
par(mar=c(3.1,4.1,1.1,1.1))
plot(anolis_fit1)
anolis_fit2<-fitfnMk(anolis_tre.pruned,vert_number,
degree=2,niter=1,opt.method="DEoptim",parallel=TRUE,
maxit=400,trace=40,prompt=FALSE)
##
## Iteration: 40 bestvalit: 313.032922 bestmemit: 0.007247 -0.049616 0.154917 0.014953 -0.157637 0.311038
## Iteration: 80 bestvalit: 302.032363 bestmemit: 0.002105 -0.029666 0.310493 0.004510 -0.069509 0.462582
## Iteration: 120 bestvalit: 298.087849 bestmemit: -0.003116 0.101355 -0.598268 0.001908 -0.022558 0.194169
## Iteration: 160 bestvalit: 297.922071 bestmemit: -0.002958 0.095977 -0.538803 0.002085 -0.023139 0.184739
## Iteration: 200 bestvalit: 297.920032 bestmemit: -0.002985 0.097075 -0.547527 0.002109 -0.023431 0.185733
## Iteration: 240 bestvalit: 297.920013 bestmemit: -0.002987 0.097064 -0.547306 0.002102 -0.023354 0.185633
## Iteration: 280 bestvalit: 297.920013 bestmemit: -0.002987 0.097054 -0.547246 0.002101 -0.023351 0.185630
## Iteration: 320 bestvalit: 297.920013 bestmemit: -0.002987 0.097053 -0.547236 0.002101 -0.023351 0.185630
## Iteration: 360 bestvalit: 297.920013 bestmemit: -0.002987 0.097053 -0.547237 0.002101 -0.023351 0.185630
## Iteration: 400 bestvalit: 297.920013 bestmemit: -0.002987 0.097053 -0.547237 0.002101 -0.023351 0.185630
layout(matrix(c(1,2),1,2),widths=c(0.45,0.55))
plot(as.Qmatrix(anolis_fit2),width=TRUE,color=TRUE,
show.zeros=FALSE,xlim=c(-1.6,1),ylim=c(-0.9,0.9),
text=FALSE,max.lwd=4)
mtext(paste("log(L) =",round(logLik(anolis_fit2),2)),
line=2,adj=0.1,cex=1.5)
par(mar=c(3.1,4.1,1.1,1.1))
plot(anolis_fit2)
anolis_fit3<-fitfnMk(anolis_tre.pruned,vert_number,
degree=3,niter=1,opt.method="DEoptim",parallel=TRUE,
maxit=800,trace=80,prompt=FALSE)
##
## Iteration: 80 bestvalit: 323.114084 bestmemit: -0.110058 2.594987 0.095398 -0.658322 -0.023410 1.320606 2.973144 -1.786844
## Iteration: 160 bestvalit: 322.561268 bestmemit: -0.046732 1.078965 -0.505863 0.012502 -0.018492 0.666359 0.283562 -0.283691
## Iteration: 240 bestvalit: 316.330584 bestmemit: -0.007572 0.226626 -0.990072 1.418284 -0.004184 0.190996 -1.052335 1.798670
## Iteration: 320 bestvalit: 313.424801 bestmemit: -0.003558 0.092464 -0.335300 0.466350 -0.000780 0.042978 -0.138476 0.294190
## Iteration: 400 bestvalit: 306.819655 bestmemit: -0.001681 0.053889 -0.410016 1.077750 -0.000730 0.031338 -0.249775 0.728293
## Iteration: 480 bestvalit: 303.042333 bestmemit: 0.000301 0.002289 -0.115700 0.694865 0.001343 -0.026053 0.108531 0.314291
## Iteration: 560 bestvalit: 297.490606 bestmemit: -0.000109 0.005058 -0.043357 0.146974 0.000344 -0.005590 0.017234 0.189912
## Iteration: 640 bestvalit: 297.296921 bestmemit: -0.000019 0.002161 -0.022959 0.112327 0.000420 -0.008811 0.045689 0.140619
## Iteration: 720 bestvalit: 297.287727 bestmemit: -0.000036 0.002814 -0.030054 0.134835 0.000411 -0.008379 0.040328 0.158209
## Iteration: 800 bestvalit: 297.287689 bestmemit: -0.000036 0.002815 -0.030075 0.134978 0.000412 -0.008407 0.040547 0.157920
layout(matrix(c(1,2),1,2),widths=c(0.45,0.55))
plot(as.Qmatrix(anolis_fit3),width=TRUE,color=TRUE,
show.zeros=FALSE,xlim=c(-1.6,1),ylim=c(-0.9,0.9),
text=FALSE,max.lwd=4)
mtext(paste("log(L) =",round(logLik(anolis_fit3),2)),
line=2,adj=0.1,cex=1.5)
par(mar=c(3.1,4.1,1.1,1.1))
plot(anolis_fit3)
Readers might have noted that I increased maxit
(and trace
– but that is just my print interval & will not affect optimization) for each increasingly complex model. There’s no reason to have done it in such a systematic manner, but it does take longer to converge (on average) for more parameter rich models of this type.
Finally, let’s compare our set of models as follows.
anova(anolis_fit0,
anolis_fit1,
anolis_fit2,
anolis_fit3)
## log(L) d.f. AIC weight
## anolis_fit0 -303.4417 2 610.8834 0.1444457
## anolis_fit1 -303.1763 4 614.3526 0.0254910
## anolis_fit2 -297.9200 6 607.8400 0.6615641
## anolis_fit3 -297.2877 8 610.5754 0.1684992
This tells us that the best supported model (accounting for parameter complexity) is actually our second-degree (i.e., quadratic) polynomial model. Cool.
We, of course, can consider higher order polynomial models – but these take even longer to optimize, so I may reserve that for a future post!
That’s it.