Saturday, December 16, 2023

A new phylogenetic model of discrete phenotypic trait evolution with Γ-distributed rate heterogeneity among branches

Last week I wrote various posts (1, 2, 3, 4, 5) about fitting and analyzing a model of discrete character evolution with \(\Gamma\) distributed rate heterogeneity among edges.

I had to set it aside for other things, but after pushing a few updates & improvements to GitHub this morning, I thought I’d circle back & illustrate a full empirical example using the evolution of digit number in squamates: the same example that I considered in Chapter 6 of my book last year with Luke Harmon.

To start with, I’m going to load phytools.

Remember, to follow along at this point you’ll need to make sure that you update phytools from GitHub which can be done using (e.g.) the package devtools (for example, by running devtools::install_github("liamrevell/phytools") in an interactive R session with devtools installed.)

library(phytools)
## Loading required package: ape
## Loading required package: maps
packageVersion("phytools")
## [1] '2.0.13'

Having loaded phytools, let’s get our data & tree.

These can be obtained from the book website and are modified from results that were originally published by Brandley et al. (2008).

squamate.tree<-read.nexus(
  file="http://www.phytools.org/Rbook/6/squamate.tre")
squamate.data<-read.csv(
  file="http://www.phytools.org/Rbook/6/squamate-data.csv",
  row.names=1)

As we showed in the book, there are some mismatches between the tree & data – so we’ll use the geiger::name.check function to reconcile the two.

library(geiger)
chk<-name.check(squamate.tree,squamate.data)
summary(chk)
## 139 taxa are present in the tree but not the data:
##     Abronia_graminea,
##     Acontias_litoralis,
##     Acontophiops_lineatus,
##     Acrochordus_granulatus,
##     Agamodon_anguliceps,
##     Agkistrodon_contortrix,
##     ....
## 1 taxon is present in the data but not the tree:
##     Trachyboa_boulengeri
## 
## To see complete list of mis-matched taxa, print object.
squamate.tree<-drop.tip(squamate.tree,
  chk$tree_not_data)
toe_number<-as.factor(setNames(squamate.data[[1]],
  rownames(squamate.data)))
toe_number<-toe_number[squamate.tree$tip.label]

Next, I’ll load some other packages that I intend to use to parallelize multiple optimization iterations.

Fortunately, if you have a current version of phytools, you’ll already have all these other packages installed.

library(foreach)
library(parallel)
library(doParallel)
## Loading required package: iterators

Now, in the book we showed that out of a standard set of homogeneous rate Mk models ("ER", "SYM", "ARD", a loss-only directional model, & a loss-and-gain ordered model) the directional and ordered models were overwhelmingly better-supported than the other three. As such, let’s take these two models as our starting point then go ahead & fit them.

Because of the difficulty of optimizing some (though not all) Mk models in R, I’m going to use the foreach package to run multiple, parallelized optimization iterations from different random starting values for each model.

## number of optimization iterations
niter<-20

To use foreach, we must first create a parallel socket cluster.

These are multiple copies of R that are all open simultaneously, each running a separate one of optimization iterations.

(In fact, you can see this if you open Task Manager in Windows while your cluster is running.)

When setting up our cluster, we can use parallel::detectCores to figure out how many cores our computer has. As a general rule, we shouldn’t use all of them (the code below will leave at least two free), but there’s also no point in opening more parallel R sessions than we have optimization iterations to run!

## set number of cores to use
ncores<-min(c(parallel::detectCores()-2,
  niter))
## create parallel socket cluster
mc<-makeCluster(ncores,type="PSOCK")
registerDoParallel(cl=mc)

Now let’s fit our directional (digit loss only) Mk model with niter=20 optimization iterations.

To do so, we should first create a design matrix for the model. This will tell our optimizer (in this case, phytools::fitMk) which rates to estimate, and which to disallow or set to zero.

## design matrix
Directional<-matrix(c(
  0,0,0,0,0,0,
  1,0,0,0,0,0,
  0,2,0,0,0,0,
  0,0,3,0,0,0,
  0,0,0,4,0,0,
  0,0,0,0,5,0),6,6,byrow=TRUE,
  dimnames=list(levels(toe_number),
    levels(toe_number)))
Directional
##   0 1 2 3 4 5
## 0 0 0 0 0 0 0
## 1 1 0 0 0 0 0
## 2 0 2 0 0 0 0
## 3 0 0 3 0 0 0
## 4 0 0 0 4 0 0
## 5 0 0 0 0 5 0

We’re at last ready to start our parallelized optimization. Using foreach::foreach, that looks as follows.

fits<-foreach(i=1:niter)%dopar%{
  phytools::fitMk(squamate.tree,toe_number,
    model=Directional,pi="fitzjohn",rand_start=TRUE)
}

I’ll now extract and print out the log-likelihoods of each optimization.

logL<-sapply(fits,logLik)
logL
##  [1] -118.3870 -118.6303 -118.5285 -118.3870 -118.3870 -118.3870 -118.3870 -118.3870 -140.3878
## [10] -118.3870 -118.8699 -118.9325 -121.0795 -118.3870 -162.1625 -118.3870 -118.5332 -118.3870
## [19] -118.3870 -118.3870

Most likely, a number of these converged to the same value. Hopefully this is our ML solution for this model.

(As a sidenote, because there’s some computational cost in opening our cluster and communicating between the nodes, running 20 optimizations in parallel is slower than running just one in serial – though much faster than running 20 in serial, of course – so we may only want to do this for models that are difficult to optimize.)

Let’s find the best of this set.

directional_mk<-fits[[which(logL==max(logL))[1]]]
print(directional_mk,digits=4)
## Object of class "fitMk".
## 
## Fitted (or set) value of Q:
##        0       1       2       3       4       5
## 0 0.0000  0.0000  0.0000  0.0000  0.0000  0.0000
## 1 0.0242 -0.0242  0.0000  0.0000  0.0000  0.0000
## 2 0.0000  0.1031 -0.1031  0.0000  0.0000  0.0000
## 3 0.0000  0.0000  0.1092 -0.1092  0.0000  0.0000
## 4 0.0000  0.0000  0.0000  0.0833 -0.0833  0.0000
## 5 0.0000  0.0000  0.0000  0.0000  0.0044 -0.0044
## 
## Fitted (or set) value of pi:
## 0 1 2 3 4 5 
## 0 0 0 0 0 1 
## due to treating the root prior as (a) nuisance.
## 
## Log-likelihood: -118.387 
## 
## Optimization method used was "nlminb"
## 
## R thinks it has found the ML solution.

Now we can proceed and do exactly the same thing, but for our ordered, non-directional (gain-and-loss) model.

This time we don’t need to open a new cluster – we already have the one we created earlier! (We should, however, remember to close it when we no longer need it or else all those parallel R sessions will just stay open, running in the background & using memory, until we shut down R.)

## design matrix
Ordered<-matrix(c(
  0,6,0,0,0,0,
  1,0,7,0,0,0,
  0,2,0,8,0,0,
  0,0,3,0,9,0,
  0,0,0,4,0,10,
  0,0,0,0,5,0),6,6,byrow=TRUE,
  dimnames=list(levels(toe_number),
    levels(toe_number)))
Ordered
##   0 1 2 3 4  5
## 0 0 6 0 0 0  0
## 1 1 0 7 0 0  0
## 2 0 2 0 8 0  0
## 3 0 0 3 0 9  0
## 4 0 0 0 4 0 10
## 5 0 0 0 0 5  0
fits<-foreach(i=1:niter)%dopar%{
  phytools::fitMk(squamate.tree,toe_number,
    model=Ordered,pi="fitzjohn",rand_start=TRUE)
}
logL<-sapply(fits,logLik)
logL
##  [1] -116.7767 -116.7753 -119.9617 -123.3256 -118.4797 -121.3946 -116.7760 -114.9687 -116.7753
## [10] -123.6688 -130.4098 -127.8928 -114.9672 -115.0745 -117.3701 -116.4219 -117.4540 -114.9676
## [19] -114.9672 -123.8975
ordered_mk<-fits[[which(logL==max(logL))[1]]]
print(ordered_mk,digits=4)
## Object of class "fitMk".
## 
## Fitted (or set) value of Q:
##        0       1         2         3       4       5
## 0 0.0000  0.0000    0.0000    0.0000  0.0000  0.0000
## 1 0.0223 -0.0223    0.0000    0.0000  0.0000  0.0000
## 2 0.0000  0.0661 -750.4793  750.4132  0.0000  0.0000
## 3 0.0000  0.0000  939.2954 -939.2954  0.0000  0.0000
## 4 0.0000  0.0000    0.0000    0.0264 -0.0663  0.0399
## 5 0.0000  0.0000    0.0000    0.0000  0.0058 -0.0058
## 
## Fitted (or set) value of pi:
##      0      1      2      3      4      5 
## 0.0000 0.0000 0.0000 0.0000 0.2752 0.7248 
## due to treating the root prior as (a) nuisance.
## 
## Log-likelihood: -114.9672 
## 
## Optimization method used was "nlminb"
## 
## R thinks it has found the ML solution.

Let’s plot & compare our two models so far.

par(mfrow=c(1,2))
plot(directional_mk,color=TRUE,width=TRUE,
  offset=0.04,show.zeros=FALSE,xlim=c(-1.5,1),
  mar=c(0.1,0.1,1.1,0.1))
mtext(paste("a) directional model; log(L) = ",
  round(logLik(directional_mk),2)),line=0,adj=0)
plot(ordered_mk,color=TRUE,width=TRUE,
  offset=0.04,show.zeros=FALSE,xlim=c(-1.5,1),
  mar=c(0.1,0.1,1.1,0.1))
mtext(paste("b) ordered model; log(L) = ",
  round(logLik(ordered_mk),2)),line=0,adj=0)

plot of chunk unnamed-chunk-16

anova(directional_mk,ordered_mk)
##                   log(L) d.f.      AIC    weight
## directional_mk -118.3870    5 246.7741 0.8292236
## ordered_mk     -114.9672   10 249.9344 0.1707764

This tells us that, of these two models, the best-supported is our directional model: in which digits can be lost by squamates, but not regained.

We shouldn’t be too preoccupied with the very high transition rates between states 3 and 2 in our ordered model. This likely indicates that these two states are effectively random with respect to each other (though not overall!) on the phylogeny, which is consistent with a high rate of back-and-forth change between the states. How high is probably hard to say!

Having done that, we should proceed and allow the rate of character evolution to vary from branch to branch under our new model of \(\Gamma\) distributed rate heterogeneity among edges of the tree.

To do that, I’m going to repeat my parallelization code of earlier using foreach, but this time substitute the new phytools function fitgammaMk. Having explored these data a bit already, I’ll update the minimum value of the \(\alpha\) shape parameter of our \(\Gamma\) distribution to min.alpha=0.01. Lastly, I’m going to set nrates (the number of rate categories in our discretized \(\Gamma\)) to nrates=6. (This is arbitrary, but it is something that we could choose based on model fit.)

fits<-foreach(i=1:niter)%dopar%{
  phytools::fitgammaMk(squamate.tree,toe_number,
    model=Directional,pi="fitzjohn",rand_start=TRUE,
    nrates=6,min.alpha=0.01)
}
logL<-sapply(fits,logLik)
logL
##  [1] -113.1218 -114.3462 -114.3462 -113.1218 -113.1218 -113.1218 -113.1218 -113.1218 -113.1218
## [10] -114.3462 -113.1218 -113.1218 -113.1218 -113.1218 -113.1218 -113.1218 -113.1218 -113.1218
## [19] -113.1218 -113.1218
directional_mk.gamma<-fits[[which(logL==max(logL))[1]]]
print(directional_mk.gamma,digits=4)
## Object of class "fitgammaMk".
## 
## Fitted (or set) value of Q:
##        0       1       2       3       4       5
## 0 0.0000  0.0000  0.0000  0.0000  0.0000  0.0000
## 1 0.0119 -0.0119  0.0000  0.0000  0.0000  0.0000
## 2 0.0000  0.0278 -0.0278  0.0000  0.0000  0.0000
## 3 0.0000  0.0000  0.0313 -0.0313  0.0000  0.0000
## 4 0.0000  0.0000  0.0000  0.0276 -0.0276  0.0000
## 5 0.0000  0.0000  0.0000  0.0000  0.0148 -0.0148
## 
## Fitted (or set) value of alpha rate heterogeneity
## (with 6 rate categories): 0.0681
## 
## Fitted (or set) value of pi:
## 0 1 2 3 4 5 
## 0 0 0 0 0 1 
## due to treating the root prior as (a) nuisance.
## 
## Log-likelihood: -113.1218 
## 
## Optimization method used was "nlminb"
## 
## R thinks it has found the ML solution.

This gives us our optimized directional (digit loss only) model, but in which the rate of evolution is permitted to vary from edge to edge according to a model of \(\Gamma\) distributed rate heterogeneity. Before we evaluate whether or not we think that this has substantially improved the fit of our model to these data, let’s do the same thing for our ordered model in which digits are both lost and gained, though in a sequential fashion.

fits<-foreach(i=1:niter)%dopar%{
  phytools::fitgammaMk(squamate.tree,toe_number,
    model=Ordered,pi="fitzjohn",rand_start=TRUE,
    nrates=6,min.alpha=0.01)
}
logL<-sapply(fits,logLik)
logL
##  [1] -120.9345 -113.1219 -111.2750 -111.2750 -113.1218 -114.3464 -113.1219 -113.1219 -113.1218
## [10] -111.2750 -113.0791 -111.2750 -111.2750 -113.0791 -113.1219 -113.0792 -113.0792 -113.1218
## [19] -113.1220 -113.0791
ordered_mk.gamma<-fits[[which(logL==max(logL))[1]]]
print(ordered_mk.gamma,digits=4)
## Object of class "fitgammaMk".
## 
## Fitted (or set) value of Q:
##        0       1       2       3       4       5
## 0 0.0000  0.0000  0.0000  0.0000  0.0000  0.0000
## 1 0.0171 -0.0171  0.0000  0.0000  0.0000  0.0000
## 2 0.0000  0.0604 -0.0604  0.0000  0.0000  0.0000
## 3 0.0000  0.0000  0.0887 -0.0887  0.0000  0.0000
## 4 0.0000  0.0000  0.0000  0.0573 -0.2074  0.1500
## 5 0.0000  0.0000  0.0000  0.0000  0.0159 -0.0159
## 
## Fitted (or set) value of alpha rate heterogeneity
## (with 6 rate categories): 0.2892
## 
## Fitted (or set) value of pi:
##      0      1      2      3      4      5 
## 0.0000 0.0000 0.0000 0.0000 0.2193 0.7807 
## due to treating the root prior as (a) nuisance.
## 
## Log-likelihood: -111.275 
## 
## Optimization method used was "nlminb"
## 
## R thinks optimization may not have converged.

As before, we can plot these…. When we do, we should first convert each of our fitted models to a phytools object of class "Qmatrix" (using phytools::as.Qmatrix) – otherwise the plot call will try to plot our marginal edge rates, which we have not (yet) computed!

par(mfrow=c(1,2))
plot(as.Qmatrix(directional_mk.gamma),color=TRUE,
  width=TRUE,offset=0.04,show.zeros=FALSE,xlim=c(-1.5,1),
  mar=c(0.1,0.1,1.1,0.1))
tmp<-bquote("a) directional + "~Gamma~
    " model; log(L)"==.(round(logLik(directional_mk.gamma),2)))
mtext(tmp,line=0,adj=0)
plot(as.Qmatrix(ordered_mk.gamma),color=TRUE,width=TRUE,
  offset=0.04,show.zeros=FALSE,xlim=c(-1.5,1),
  mar=c(0.1,0.1,1.1,0.1))
tmp<-bquote("b) ordered + "~Gamma~
    " model; log(L)"==.(round(logLik(ordered_mk.gamma),2)))
mtext(tmp,line=0,adj=0)

plot of chunk unnamed-chunk-22

(As an aside, and note to my future self, the most complicated – and potentially useful – part of the code above is the use of bquote to combine Greek letters and variable values into the same legend. Thanks to this blog post for the refresher!)

An interesting observation about the fitted Q matrices, above, is how similar they are, one to the other, now that we’re allowing the rate of evolution to vary from branch to branch.

OK. Let’s also kill our cluster now.

stopCluster(cl=mc)

We can go ahead and compare all four of our fitted models so far….

anova(directional_mk,
  directional_mk.gamma,
  ordered_mk,
  ordered_mk.gamma)
##                         log(L) d.f.      AIC      weight
## directional_mk       -118.3870    5 246.7741 0.013257936
## directional_mk.gamma -113.1218    6 238.2437 0.943698645
## ordered_mk           -114.9672   10 249.9344 0.002730438
## ordered_mk.gamma     -111.2750   11 244.5499 0.040312982

When we do, we see that the best-supported, by quite a wide margin, is the loss-only model with heterogeneous rates among edges. Indeed, this model has a better log-likelihood than the much more parameter-rich ordered Mk model.

Finally, let’s compute the marginal scaled likelihoods of each rate on each edge. To do that I don’t need to re-optimize my model – I can simply feed fitgammaMk our fitted Q matrix and \(\alpha\), and then set marginal=TRUE as follows.

marginal.ordered_mk.gamma<-fitgammaMk(
  squamate.tree,toe_number,
  fixedQ=as.Qmatrix(directional_mk.gamma),
  alpha.init=directional_mk.gamma$alpha,
  marginal=TRUE)
##   --
##   Computing marginal scaled likelihoods (in parallel) of each
##   rate on each edge. Caution: this is NOT fast....
##   --
plot(marginal.ordered_mk.gamma,
  colors=c("gray19","red","yellow","ivory2"),lwd=3)

plot of chunk unnamed-chunk-26

That’s it. Cool.

No comments:

Post a Comment

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.