Sunday, December 22, 2024

Some updates to fitfnMk for fitting an ordered Mk model in which the rates of change between adjacent character levels vary according to an n-degree polynomial functional form

Since earlier this year, phytools has included a function (called fitfnMk) for fitting an ordered Mk discrete trait evolution model in which the rates of change between adjacent levels of our character vary according to an n-degree polynomial functional form.

This functional form can differ between the backward & forward directions of the character’s evolution, and can even have different degrees in those two directions. When the degree of our polynomial reaches \(k-2\) (for k levels of our trait) in both the forward & backward directions, the model should contain the ordered ARD model as a special case.
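For intuition, here is a minimal base-R sketch of the basic idea (this is not the phytools internals, and the coefficient values are invented): a polynomial in the transition index generates the \(k-1\) rates between adjacent levels in one direction.

```r
## sketch: evaluate an n-degree polynomial at each transition index to get
## the k-1 rates between adjacent levels in one direction (coefficients
## here are made up for illustration)
poly_rates<-function(coefs,k){
  x<-1:(k-1)
  sapply(x,function(xi) sum(coefs*xi^(0:(length(coefs)-1))))
}
fwd<-poly_rates(c(0.02,0.01),k=6) ## degree 1: rates increase linearly
bwd<-poly_rates(0.1,k=6)          ## degree 0: one constant backward rate
```

A separate coefficient vector (possibly of a different length, i.e., degree) would play the same role in the opposite direction.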

Over the past couple of days, I’ve been tweaking both the optimization of the model and the plotting method for the fitted model object from this method. (I started with the plot method but then discovered that model optimization, which I knew was difficult, was even worse than I thought!)

I’d previously discovered that optimization could be quite a challenging problem for this model, no doubt because for lots of random values of the polynomial coefficients the likelihood function becomes undefined (e.g., because all transition rates between states are negative). For that reason, I’d already tried various alternative optimization approaches, such as the differential evolution algorithm. (Differential evolution seems to work pretty well, but is extremely slow.)

The optimization updates that I’ve recently pushed to the phytools GitHub mostly involve an option for the starting values of optimization that I’m calling start = "really smart". If users select this option, the function first fits an all-rates-different ordered model with \(2k-2\) parameters (for \(k\) levels of the trait), then proceeds to obtain starting values for optimization by fitting an n-degree polynomial through the estimated rates.
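A base-R sketch of that idea (my reading of the approach, using hypothetical ARD rate estimates rather than phytools source code) might look like the following.

```r
## hypothetical forward rates from an ordered ARD fit
ard_rates<-c(0.020,0.051,0.082,0.112,0.143)
x<-1:5
## least-squares polynomial through the estimated rates: the fitted
## coefficients then serve as starting values for optimization
start_fit<-lm(ard_rates~poly(x,degree=2,raw=TRUE))
start_coefs<-unname(coef(start_fit))
```

Because these rates are nearly linear across transitions, the fitted quadratic passes almost exactly through them, which is the whole point of the strategy.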

I don’t have a formal proof that this will give us starting values near the ML solution (except for \(n = k-2\), in which case it must – that is, assuming that the ordered ARD model indeed converged to the right solution); however, I suspect it gets us quite close under broad circumstances.

As mentioned above, I also updated plotting of this model which can be done using an S3 generic plot function for the "fitfnMk" object class. Indeed, this was my original aim when I opened up the source code of this function.

To see how the old plot method worked, I recommend checking out this post.

In my recent updates, I decided to take the rather dramatic step of first flipping the x and y axes of the plot.

I’m not sure why I thought this would be better, but it logically allows us to represent the (forward) transitions in one direction on the left vertical axis and the (backward) transitions in the other direction under our model on the right vertical axis.

In the same vein, I then proceeded to flip the direction of the x axis for the backward (as opposed to forward) rates.

Lastly, instead of fitting a polynomial functional form to the rates but then graphing only the rates (and not the polynomial function itself), the updated function also interpolates values between the plotted rates under the fitted functional form. I think this looks better.
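Schematically, the interpolation just means evaluating the fitted polynomial on a fine grid between the discrete transition indices (the coefficients below are invented for illustration).

```r
coefs<-c(0.01,0.005,0.002)  ## hypothetical degree-2 coefficients
xx<-seq(1,5,length.out=100) ## fine grid over the transition indices
yy<-coefs[1]+coefs[2]*xx+coefs[3]*xx^2
## on the flipped axes of the new plot method, lines(yy,xx) would then
## trace the smooth curve through the plotted rates
```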

I think what I’m describing should be obvious as soon as you see the graphs.

To illustrate it, then, I’m going to start by loading an updated version of phytools installed from GitHub.

## load library
library(phytools)
packageVersion("phytools")
## [1] '2.3.28'

Next, I’m going to pull data for squamate digit number from a study by Brandley et al. (2008) from our book website. I don’t have any particular reason to imagine that trait evolution for this character takes a polynomial function form, but it’s a convenient multi-state ordered character, so why not?

## read & clean data
library(geiger)
sqTree<-read.nexus(
  file="http://www.phytools.org/Rbook/6/squamate.tre")
sqData<-read.csv(
  file="http://www.phytools.org/Rbook/6/squamate-data.csv",
  row.names=1)
toes<-as.factor(setNames(sqData[[1]],rownames(sqData)))
chk<-name.check(sqTree,sqData)
summary(chk)
## 139 taxa are present in the tree but not the data:
##     Abronia_graminea,
##     Acontias_litoralis,
##     Acontophiops_lineatus,
##     Acrochordus_granulatus,
##     Agamodon_anguliceps,
##     Agkistrodon_contortrix,
##     ....
## 1 taxon is present in the data but not the tree:
##     Trachyboa_boulengeri
## 
## To see complete list of mis-matched taxa, print object.
sqTree<-drop.tip(sqTree,chk$tree_not_data)
toes<-toes[sqTree$tip.label]
summary(toes)
##  0  1  2  3  4  5 
## 25 16  5  4  6 63

At this point, we’re ready to fit our models. I’ll start with a degree 0 polynomial. This is an “intercept only” model, so our backward & forward rates are each a (potentially different) constant. Since this model converges to the MLE quite easily, I’ll just run 5 optimization iterations using fitfnMk, as follows. Note that this is exactly the same as fitting an ordered model with one forward rate and a second backward rate. If I’d manually fit that model using fitMk instead (and it converged), I should obtain the same rates, log-likelihood, and model parameter complexity.
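For reference, that equivalent two-rate ordered model could be specified for fitMk with an integer design matrix like the one below (a sketch, following fitMk’s convention in which equal integers index transitions that share a rate).

```r
k<-6
ordered2<-matrix(0,k,k,dimnames=list(0:(k-1),0:(k-1)))
ordered2[cbind(1:(k-1),2:k)]<-1 ## all forward transitions share rate 1
ordered2[cbind(2:k,1:(k-1))]<-2 ## all backward transitions share rate 2
## fitMk(sqTree,toes,model=ordered2,pi="fitzjohn") should then match sq.fit0
```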

## fit degree=0 polynomial
sq.fit0<-fitfnMk(sqTree,toes,degree=0,niter=5,
  pi="fitzjohn")
## log-likelihood from current iteration: -164.2052 
##  --- Best log-likelihood so far: -164.2052 ---
## log-likelihood from current iteration: -164.2052 
##  --- Best log-likelihood so far: -164.2052 ---
## log-likelihood from current iteration: -164.2052 
##  --- Best log-likelihood so far: -164.2052 ---
## log-likelihood from current iteration: -164.2052 
##  --- Best log-likelihood so far: -164.2052 ---
## log-likelihood from current iteration: -164.2052 
##  --- Best log-likelihood so far: -164.2052 ---
sq.fit0
## Object of class "fitfnMk".
## 
## Fitted (or set) value of Q:
##           0         1         2         3         4         5
## 0 -0.037551  0.037551  0.000000  0.000000  0.000000  0.000000
## 1  0.023253 -0.060803  0.037551  0.000000  0.000000  0.000000
## 2  0.000000  0.023253 -0.060803  0.037551  0.000000  0.000000
## 3  0.000000  0.000000  0.023253 -0.060803  0.037551  0.000000
## 4  0.000000  0.000000  0.000000  0.023253 -0.060803  0.037551
## 5  0.000000  0.000000  0.000000  0.000000  0.023253 -0.023253
## 
## Fitted (or set) value of pi:
##        0        1        2        3        4        5 
## 0.149488 0.176177 0.196163 0.186328 0.156696 0.135149 
## due to treating the root prior as (a) nuisance.
## 
## Log-likelihood: -164.205161 
## 
## Optimization method used was "nlminb"
## 
## R thinks it has found the ML solution.
plot(sq.fit0,args.legend=list(x="bottomleft"))


(I promise the plot method looks cooler for higher degree polynomials!)

Next, I’ll fit a degree 1 polynomial.

This is just a linear functional form in each of the two directions of forward & backward change. Note that rates below 0.0 are not meaningful, so these are set to zero in our optimization routine (and are shown as such on the graph).
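The truncation itself is simple; here is a sketch with invented linear coefficients in one direction.

```r
coefs<-c(-0.05,0.03)               ## hypothetical intercept & slope
x<-1:5
rates<-pmax(0,coefs[1]+coefs[2]*x) ## negative rates truncated to zero
```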

## fit degree=1 polynomial
sq.fit1<-fitfnMk(sqTree,toes,degree=1,start="really smart",
  parallel=TRUE,niter=20,pi="fitzjohn")
## Opened cluster with 15 cores.
## Running optimization iterations in parallel.
## Please wait....
sq.fit1
## Object of class "fitfnMk".
## 
## Fitted (or set) value of Q:
##          0         1         2         3         4         5
## 0 0.000000  0.000000  0.000000  0.000000  0.000000  0.000000
## 1 0.020293 -0.020293  0.000000  0.000000  0.000000  0.000000
## 2 0.000000  0.050908 -0.059591  0.008683  0.000000  0.000000
## 3 0.000000  0.000000  0.081524 -0.479787  0.398263  0.000000
## 4 0.000000  0.000000  0.000000  0.112139 -0.899983  0.787844
## 5 0.000000  0.000000  0.000000  0.000000  0.142755 -0.142755
## 
## Fitted (or set) value of pi:
##        0        1        2        3        4        5 
## 0.000000 0.000000 0.005673 0.263089 0.359298 0.371939 
## due to treating the root prior as (a) nuisance.
## 
## Log-likelihood: -120.18589 
## 
## Optimization method used was "nlminb"
## 
## R thinks it has found the ML solution.
plot(sq.fit1,args.legend=list(x="bottomleft"))


We can do the same for a 2nd degree polynomial. (This is just a parabolic form – remember high school algebra & geometry?)

## fit degree=2 polynomial (in parallel)
sq.fit2<-fitfnMk(sqTree,toes,degree=2,start="really smart",
  parallel=TRUE,niter=20,pi="fitzjohn")
## Opened cluster with 15 cores.
## Running optimization iterations in parallel.
## Please wait....
sq.fit2
## Object of class "fitfnMk".
## 
## Fitted (or set) value of Q:
##          0         1         2         3         4         5
## 0 0.000000  0.000000  0.000000  0.000000  0.000000  0.000000
## 1 0.022031 -0.022031  0.000000  0.000000  0.000000  0.000000
## 2 0.000000  0.057285 -0.067957  0.010672  0.000000  0.000000
## 3 0.000000  0.000000  0.066411 -0.123271  0.056860  0.000000
## 4 0.000000  0.000000  0.000000  0.049410 -0.097308  0.047898
## 5 0.000000  0.000000  0.000000  0.000000  0.006281 -0.006281
## 
## Fitted (or set) value of pi:
##        0        1        2        3        4        5 
## 0.000000 0.000000 0.001625 0.072269 0.287181 0.638925 
## due to treating the root prior as (a) nuisance.
## 
## Log-likelihood: -116.624062 
## 
## Optimization method used was "nlminb"
## 
## R thinks it has found the ML solution.
plot(sq.fit2,args.legend=list(x="bottomleft"))


Now a 3rd degree polynomial.

## fit degree=3 polynomial
sq.fit3<-fitfnMk(sqTree,toes,degree=3,start="really smart",
  parallel=TRUE,niter=20,pi="fitzjohn")
## Opened cluster with 15 cores.
## Running optimization iterations in parallel.
## Please wait....
sq.fit3
## Object of class "fitfnMk".
## 
## Fitted (or set) value of Q:
##          0         1         2         3         4         5
## 0 0.000000  0.000000  0.000000  0.000000  0.000000  0.000000
## 1 0.023484 -0.023484  0.000000  0.000000  0.000000  0.000000
## 2 0.000000  0.084105 -0.087560  0.003454  0.000000  0.000000
## 3 0.000000  0.000000  0.068749 -0.068749  0.000000  0.000000
## 4 0.000000  0.000000  0.000000  0.026334 -0.065539  0.039205
## 5 0.000000  0.000000  0.000000  0.000000  0.005777 -0.005777
## 
## Fitted (or set) value of pi:
##        0        1        2        3        4        5 
## 0.000000 0.000000 0.000000 0.000000 0.272485 0.727515 
## due to treating the root prior as (a) nuisance.
## 
## Log-likelihood: -115.272036 
## 
## Optimization method used was "nlminb"
## 
## R thinks optimization may not have converged.
plot(sq.fit3,args.legend=list(x="bottomleft"))


(R thinks we might have reason to be concerned about whether or not we've converged on the ML solution, but let's ignore that for now.)

Finally, let’s fit our fourth degree polynomial functional form as follows. This model actually has the same number of parameters as our ordered ARD model, which it should also have as a special case. (That is, there should exist a pair of degree 4 polynomials that pass through the forward and backward rates, respectively, of our ARD model!)

## degree=4
sq.fit4<-fitfnMk(sqTree,toes,degree=4,start="really smart",
  parallel=TRUE,niter=20,pi="fitzjohn")
## Opened cluster with 15 cores.
## Running optimization iterations in parallel.
## Please wait....
sq.fit4
## Object of class "fitfnMk".
## 
## Fitted (or set) value of Q:
##          0         1         2           3         4         5
## 0 0.000000  0.000000    0.0000    0.000000  0.000000  0.000000
## 1 0.022288 -0.022288    0.0000    0.000000  0.000000  0.000000
## 2 0.000000  0.066210 -587.1601  587.093850  0.000000  0.000000
## 3 0.000000  0.000000  732.3557 -732.355684  0.000000  0.000000
## 4 0.000000  0.000000    0.0000    0.026384 -0.066295  0.039910
## 5 0.000000  0.000000    0.0000    0.000000  0.005813 -0.005813
## 
## Fitted (or set) value of pi:
##        0        1        2        3        4        5 
## 0.000000 0.000000 0.000000 0.000000 0.275171 0.724829 
## due to treating the root prior as (a) nuisance.
## 
## Log-likelihood: -114.967191 
## 
## Optimization method used was "nlminb"
## 
## R thinks optimization may not have converged.
plot(sq.fit4)


Once again, R thinks optimization may not have converged. In fact, this isn't too surprising because (as we'll see in a sec.) the back & forth transition rates in this model between character levels 2 and 3 seem to be able to take almost arbitrarily high values with almost no effect on the likelihood -- as such, even though we may not have found the MLE, it's entirely possible that our estimate of Q may not be materially different from the genuine MLE.

Comparing amongst our models is easy. It tells us the (somewhat interesting?) result that the best-supported model (controlling for parameter complexity) is our degree 2 model, with a total of 6 estimated parameters.

## compare models
anova(sq.fit0,sq.fit1,sq.fit2,sq.fit3,sq.fit4)
##            log(L) d.f.      AIC       weight
## sq.fit0 -164.2052    2 332.4103 6.468195e-20
## sq.fit1 -120.1859    4 248.3718 1.146888e-01
## sq.fit2 -116.6241    6 245.2481 5.467816e-01
## sq.fit3 -115.2720    8 246.5441 2.860239e-01
## sq.fit4 -114.9672   10 249.9344 5.250566e-02
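For the curious, the AIC values and Akaike weights in that table are easy to reproduce by hand from the log-likelihoods and parameter counts.

```r
logL<-c(-164.2052,-120.1859,-116.6241,-115.2720,-114.9672)
df<-c(2,4,6,8,10)
aic<--2*logL+2*df              ## matches the AIC column above
w<-exp(-0.5*(aic-min(aic)))
w<-w/sum(w)                    ## Akaike weights: sq.fit2 gets ~55%
```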

fitfnMk also allows us to fit models with different polynomial degrees in the two different backward & forward directions.

Just to try this out, why don’t we fit a model with degree 1 (i.e., linearly increasing or decreasing rates) in the forward (i.e., digit gain) direction, and a quadratic (i.e., degree 2) form in the backward (i.e., digit loss) direction.

sq.fit1_2<-fitfnMk(sqTree,toes,degree=c(1,2),
  start="really smart",parallel=TRUE,niter=20,
  pi="fitzjohn")
## Opened cluster with 15 cores.
## Running optimization iterations in parallel.
## Please wait....
sq.fit1_2
## Object of class "fitfnMk".
## 
## Fitted (or set) value of Q:
##          0         1         2         3         4         5
## 0 0.000000  0.000000  0.000000  0.000000  0.000000  0.000000
## 1 0.023284 -0.023284  0.000000  0.000000  0.000000  0.000000
## 2 0.000000  0.056746 -0.066718  0.009972  0.000000  0.000000
## 3 0.000000  0.000000  0.065073 -0.095235  0.030163  0.000000
## 4 0.000000  0.000000  0.000000  0.048264 -0.098618  0.050354
## 5 0.000000  0.000000  0.000000  0.000000  0.006321 -0.006321
## 
## Fitted (or set) value of pi:
##        0        1        2        3        4        5 
## 0.000000 0.000000 0.000651 0.033087 0.272421 0.693841 
## due to treating the root prior as (a) nuisance.
## 
## Log-likelihood: -116.74947 
## 
## Optimization method used was "nlminb"
## 
## R thinks it has found the ML solution.
plot(sq.fit1_2,args.legend=list(x="bottomleft"))


As I mentioned at the outset, our polynomial degree \(k-2\) model should be equivalent to the ordered all-rates-different model, inasmuch as there should exist a degree \(k-2\) polynomial that passes through the MLE rates. Let’s fit that model to see.

ordered<-matrix(c(
  0,1,0,0,0,0,
  6,0,2,0,0,0,
  0,7,0,3,0,0,
  0,0,8,0,4,0,
  0,0,0,9,0,5,
  0,0,0,0,10,0),6,6,byrow=TRUE,
  dimnames=list(levels(toes),levels(toes)))
library(foreach)
library(doParallel)
niter<-20
ncores<-10
mc<-makeCluster(ncores,type="PSOCK")
registerDoParallel(cl=mc)
ard_fits<-foreach(i=1:niter)%dopar%{
  phytools::fitMk(sqTree,toes,model=ordered,
    rand_start=TRUE,pi="fitzjohn")
}
stopCluster(cl=mc)
lnL<-sapply(ard_fits,logLik)
lnL
##  [1] -116.7172 -116.7767 -116.7754 -117.4238 -117.5884 -115.2658 -114.9673 -116.7761
##  [9] -116.7771 -120.7367 -116.6915 -114.9933 -122.3595 -117.0183 -133.2131 -117.1285
## [17] -115.1086 -117.6960 -116.7753 -115.2773
ordered_ard<-ard_fits[[which(lnL==max(lnL))]]
ordered_ard
## Object of class "fitMk".
## 
## Fitted (or set) value of Q:
##          0         1         2           3         4         5
## 0 0.000000  0.000000    0.0000    0.000000  0.000000  0.000000
## 1 0.022288 -0.022288    0.0000    0.000000  0.000000  0.000000
## 2 0.000000  0.066426 -470.0745  470.008104  0.000000  0.000000
## 3 0.000000  0.000000  582.0953 -582.095292  0.000000  0.000000
## 4 0.000000  0.000000    0.0000    0.026384 -0.066295  0.039911
## 5 0.000000  0.000000    0.0000    0.000000  0.005813 -0.005813
## 
## Fitted (or set) value of pi:
##        0        1        2        3        4        5 
## 0.000000 0.000000 0.000000 0.000000 0.275174 0.724826 
## due to treating the root prior as (a) nuisance.
## 
## Log-likelihood: -114.967288 
## 
## Optimization method used was "nlminb"
## 
## R thinks optimization may not have converged.
plot(sq.fit4,args.legend=list(x=FALSE))
lines(ordered_ard$rates[1:5],
  seq(0.4,by=1,length.out=5),type="b",
  pch=22,cex=1.5,lty="dotted",col="blue")
par(usr=par()$usr[c(2,1,3,4)])
lines(ordered_ard$rates[1:5+5],
  seq(0.6,by=1,length.out=5),type="b",
  pch=22,cex=1.5,lty="dotted",col="red")


Once again, we didn't find the MLE -- but we can see that our likelihood is equivalent to that of the degree 4 polynomial, as are the rates of change between states, apart from the back & forth transition rates between levels 2 and 3, which seem to be unstable but high across all models.
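We can also sanity-check the claim that a degree 4 polynomial can pass exactly through any five rates (here with hypothetical values mimicking the unstable estimates above).

```r
rates<-c(0.0223,0.0664,582.1,0.0264,0.0058) ## hypothetical forward rates
x<-1:5
exact<-lm(rates~poly(x,degree=4,raw=TRUE))  ## 5 points, 5 coefficients
max(abs(resid(exact)))                      ## residuals effectively zero
```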

Last of all, let’s simulate a tree and some data that actually arose under an ordered process in which the rate of change between adjacent levels is well captured by (say) a 3rd degree polynomial – just to see if we can estimate the generating model from our data! (This will also create a nice plot when I tweet about this update… ha, ha.)

My simulation is based on an earlier post (here), so I won’t re-explain it.

First, here's a visualization of our generating model.

k<-10
x<-0:(k-2)+0.5
q1<-0.08333*x^3-1.25*x^2+5.16667*x-1
q2<--0.08333*x^3+1.25*x^2-5.16667*x+9
par(mar=c(5.1,4.1,2.1,1.1))
plot(x,q1,type="b",col="blue",bty="n",las=1,
  axes=FALSE,xlab="",ylab="transition rate (q)",
  ylim=c(0,max(c(q1,q2))))
lines(x,q2,type="b",col="red")
labs<-mapply(function(x,y) bquote(.(x) %<->% .(y)),
  x=0:(k-2),y=1:(k-1))
axis(1,at=seq(0.5,k-1.5,by=1),labels=rep("",k-1))
nulo<-mapply(mtext,text=labs,at=seq(0.5,k-1.5,by=1),
  MoreArgs=list(side=1,line=1,las=3,cex=0.7))
axis(2,las=1,cex.axis=0.8)
grid()
legend("bottomleft",c("forward","backward"),
  col=c("blue","red"),
  lty="solid",pch=1,cex=0.8,title="generating rates")


Now let's simulate some data.

Q<-matrix(0,k,k,dimnames=list(0:(k-1),0:(k-1)))
Q[cbind(1:(k-1),2:k)]<-q1
Q[cbind(2:k,1:(k-1))]<-q2
diag(Q)<--rowSums(Q)
tree<-pbtree(n=1000,scale=10)
y<-sim.Mk(tree,Q,anc=as.character(sample(0:9,1)))
summary(y)
##   0   1   2   3   4   5   6   7   8   9 
##  98  27  40  97 159 246 166 101  39  27

We can now fit our model. This may take a while!

sim_fit3<-fitfnMk(tree,to.matrix(y,0:9),degree=3,
  pi="fitzjohn",niter=10,start="really smart",
  parallel=TRUE)
## Opened cluster with 10 cores.
## Running optimization iterations in parallel.
## Please wait....

Finally, let’s plot our results & overlay the generating rates.

plot(sim_fit3,mar=c(5.1,3.1,3.1,3.1),
  args.legend=list(x="topleft"))
legend("topright","generating rates",lty="dotted",
  pch=22,pt.cex=1.5,pt.bg="white")
lines(q1[1:9],seq(0.4,by=1,length.out=9),type="b",
  pch=22,cex=1.5,lty="dotted",col="blue")
par(usr=par()$usr[c(2,1,3,4)])
lines(q2[1:9],seq(0.6,by=1,length.out=9),type="b",
  pch=22,cex=1.5,lty="dotted",col="red")


par(usr=par()$usr[c(1,2,3,4)])

Neat.

As a small addendum, I don't think anyone would presuppose that the transition rates between adjacent pairs of states genuinely vary amongst each other as, say, a quadratic polynomial. On the other hand, many biologists might hypothesize that these rates are non-independent or correlated with each other. The idea of using an n-degree polynomial functional form is just a phenomenological way of capturing and testing this intuition.

That’s all folks!
