Phylogenetic Tools for Comparative Biology: Minor improvements to fitPagel for fitting a correlated binary trait evolution model in phytools

Earlier today, a colleague contacted me about interpreting a result from phytools::fitPagel (to fit the Pagel ’94 correlated binary trait evolution model).

I suggested she check for convergence to the ML solution for each model, but then was dismayed to realize that I hadn’t programmed fitPagel in such a way where it was at all straightforward to do so! Indeed, even though fitPagel uses fitMk (or, optionally, geiger::fitDiscrete) internally, it strips out only the parts of the "fitMk" object I deemed important at the time (mainly, the Q matrix and log-likelihood).

Since this function gets quite a lot of use, I thought it should minimally return the fitted "fitMk" class model objects and a message indicating whether or not R thought converge had been achieved – so now it does precisely that. To get this feature, users must simply update phytools from GitHub.

To demonstrate, I’m going to use a simple dataset – the same one that I used in my book with Luke Harmon – for spawning mode and paternal care of bony fish based on Benun Sutton & Wilson (2019).

We can start by loading phytools & checking the package version number.

## load package
library(phytools)

## check version number
packageVersion("phytools")

## [1] '2.4.6'

Next, let’s load our data as follows.

## load data
data(bonyfish.tree)
data(bonyfish.data)
## extract discrete characters
spawning_mode<-setNames(bonyfish.data$spawning_mode,
  rownames(bonyfish.data))
paternal_care<-setNames(bonyfish.data$paternal_care,
  rownames(bonyfish.data))

Now we’re ready to fit our models using fitPagel. The function fits both an independent model as well as the trait-dependent Pagel ’94 model.

## fit correlational model
bonyfish.pagel<-fitPagel(bonyfish.tree,paternal_care,
  spawning_mode,rand_start=TRUE)
bonyfish.pagel

## 
## Pagel's binary character correlation test:
## 
## Assumes "ARD" substitution model for both characters
## 
## Independent model rate matrix:
##            male|group male|pair none|group none|pair
## male|group  -0.003030  0.003030   0.000000  0.000000
## male|pair    0.001935 -0.001935   0.000000  0.000000
## none|group   0.002229  0.000000  -0.005259  0.003030
## none|pair    0.000000  0.002229   0.001935 -0.004164
## 
## Dependent (x & y) model rate matrix:
##            male|group male|pair none|group none|pair
## male|group   -0.02674  0.006940   0.019800  0.000000
## male|pair     0.00000  0.000000   0.000000  0.000000
## none|group    0.00000  0.000000  -0.004150  0.004150
## none|pair     0.00000  0.003123   0.002899 -0.006022
## 
## Model fit:
##             log-likelihood      AIC
## independent      -62.00739 132.0148
## dependent        -55.50975 127.0195
## 
## Hypothesis test result:
##   likelihood-ratio:  12.9953 
##   p-value:  0.0112989 
## 
## Model fitting method used was fitMk 
## 
## R thinks one or both likelihood optimizations did not converge.

According to the model print out, it appears that one or the other of our independent or dependent model optimizations may not have converged to the ML solution.

To find out which, we need to inspect the fitted models which are now (but were not before) returned as the model object element mk_fits.

bonyfish.pagel$mk_fits

## $independent
## Object of class "fitMk".
## 
## Fitted (or set) value of Q:
##            male|group male|pair none|group none|pair
## male|group  -0.003030  0.003030   0.000000  0.000000
## male|pair    0.001935 -0.001935   0.000000  0.000000
## none|group   0.002229  0.000000  -0.005259  0.003030
## none|pair    0.000000  0.002229   0.001935 -0.004164
## 
## Fitted (or set) value of pi:
## male|group  male|pair none|group  none|pair 
##       0.25       0.25       0.25       0.25 
## due to treating the root prior as (a) flat.
## 
## Log-likelihood: -62.007389 
## 
## Optimization method used was "nlminb"
## 
## R thinks it has found the ML solution.
## 
## 
## $dependent
## Object of class "fitMk".
## 
## Fitted (or set) value of Q:
##            male|group male|pair none|group none|pair
## male|group   -0.02674  0.006940   0.019800  0.000000
## male|pair     0.00000  0.000000   0.000000  0.000000
## none|group    0.00000  0.000000  -0.004150  0.004150
## none|pair     0.00000  0.003123   0.002899 -0.006022
## 
## Fitted (or set) value of pi:
## male|group  male|pair none|group  none|pair 
##       0.25       0.25       0.25       0.25 
## due to treating the root prior as (a) flat.
## 
## Log-likelihood: -55.509755 
## 
## Optimization method used was "nlminb"
## 
## R thinks optimization may not have converged.

This reveals that R thinks that the dependent model did not converge.

We can fit both models using different random starting values again as follows.

bonyfish.pagel<-fitPagel(bonyfish.tree,paternal_care,
  spawning_mode,rand_start=TRUE)
bonyfish.pagel

## 
## Pagel's binary character correlation test:
## 
## Assumes "ARD" substitution model for both characters
## 
## Independent model rate matrix:
##            male|group male|pair none|group none|pair
## male|group  -0.003030  0.003030   0.000000  0.000000
## male|pair    0.001935 -0.001935   0.000000  0.000000
## none|group   0.002229  0.000000  -0.005259  0.003030
## none|pair    0.000000  0.002229   0.001935 -0.004164
## 
## Dependent (x & y) model rate matrix:
##            male|group male|pair none|group none|pair
## male|group  -0.287481  0.105817   0.181664  0.000000
## male|pair    0.000000  0.000000   0.000000  0.000000
## none|group   0.000000  0.000000  -0.004022  0.004022
## none|pair    0.000000  0.003114   0.002871 -0.005985
## 
## Model fit:
##             log-likelihood      AIC
## independent      -62.00739 132.0148
## dependent        -55.35818 126.7164
## 
## Hypothesis test result:
##   likelihood-ratio:  13.2984 
##   p-value:  0.00990611 
## 
## Model fitting method used was fitMk 
## 
## R thinks both model optimizations converged.

This tells us that R thinks the model has converged.

Now, if we could not get fitPagel to optimize, perhaps we’d like to re-run our model in fitMk directly. By exporting the fitted model objects, we’ve made that much easier. That’s because our object now contains both the data (as state combinations, rather than the original traits) and a model design matrix, as well as our tree.

Let’s see.

refit_dependent<-fitMk(
  tree=bonyfish.pagel$tree,
  x=bonyfish.pagel$mk_fits[[2]]$data,
  model=bonyfish.pagel$mk_fits[[2]]$index.matrix)
refit_dependent

## Object of class "fitMk".
## 
## Fitted (or set) value of Q:
##            male|group male|pair none|group none|pair
## male|group  -0.288362  0.106083   0.182279  0.000000
## male|pair    0.000000  0.000000   0.000000  0.000000
## none|group   0.000000  0.000000  -0.004022  0.004022
## none|pair    0.000000  0.003114   0.002871 -0.005985
## 
## Fitted (or set) value of pi:
## male|group  male|pair none|group  none|pair 
##       0.25       0.25       0.25       0.25 
## due to treating the root prior as (a) flat.
## 
## Log-likelihood: -55.358184 
## 
## Optimization method used was "nlminb"
## 
## R thinks optimization may not have converged.

That worked. We could’ve likewise run multiple optimization iterations, used different routines, or assumed a different root prior.

Lastly, let’s plot our fitted model.

## plot fitted models
plot(bonyfish.pagel,lwd.by.rate=TRUE)

plot of chunk unnamed-chunk-41

Phylogenetic Tools for Comparative Biology

Pages

Monday, January 27, 2025

Minor improvements to `fitPagel` for fitting a correlated binary trait evolution model in phytools

No comments:

Post a Comment