Wednesday, August 5, 2020

Fixing the root state in an analysis using fitPagel

A little while ago, a phytools user contacted me to ask if there was a way to fix (i.e., 'fossilize') the state at the global root of the tree for an analysis of correlated binary character evolution using the fitPagel function in R.

This is pretty easy to do in the function fitMk using the argument pi which is for supplying a prior distribution on the root state.

Unfortunately, although fitMk is used internally by fitPagel for model-fitting, fitPagel does not pass fitMk the value of pi. Even if it did, we would still have to recode our fossilized prior belief as a combination of character states (e.g., 0+0, 0+1, and so on) to get it to work correctly with fitPagel.

Fortunately, I was able to provide an alternative solution as follows:

“Sure. There is an easy way to do it that doesn't require passing any internal arguments to the model-fitting function, or anything. This is by attaching a tip of zero length to the root of the tree (this can be done using bind.tip) and then giving that tip the desired state combination.”

Let's see how that works.

To demonstrate it, I will use some data on the absence/presence of tail-spines in squamate reptiles from Ramm et al. (2020).

Ramm et al. used fitPagel to do their analyses, but their dataset is quite large (nearly 3,000 taxa).

Just to get my computation to run a bit faster I'm going to subsample their data to include only 600 taxa. (Their data is available via Dryad here.)

Ramm et al. did various different analyses using fitPagel. I will repeat just one of these on our reduced dataset - and that is the test for correlated evolution between rocky habitat use (coded 'terrestrial-saxicolous' by the authors) and the presence of tail spines.

Since our hypothesis (I imagine) is that rocky habitats favor the evolution of tail spines (and not the reverse), let's use the argument dep.var to fit a model in which the rates of evolution & loss of tail spines depends on habitat, but the rate of change in habitat does not depend on the state for tail spines. (We could imagine evolutionary scenarios in which it did, but this model also has fewer parameters so it will be easier to fit.)

library(phytools)
library(geiger)

First of all, here are our data (from Dryad, but with some pre-processing):

tree
## 
## Phylogenetic tree with 600 tips and 599 internal nodes.
## 
## Tip labels:
##  Dibamus_montanus, Nephrurus_amyae, Lialis_jicari, Aprasia_repens, Aprasia_parapulchella, Dierogekko_inexpectatus, ...
## 
## Rooted; includes branch lengths.
head(rocky)
## Acanthocercus_atricollis           Agama_aculeata              Agama_agama          Agama_anchietae 
##                non-rocky                non-rocky                non-rocky                non-rocky 
##             Agama_armata               Agama_atra 
##                non-rocky                non-rocky 
## Levels: non-rocky rocky
head(spiny)
## Acanthocercus_atricollis           Agama_aculeata              Agama_agama          Agama_anchietae 
##                    spiny                    spiny                    spiny                    spiny 
##             Agama_armata               Agama_atra 
##                    spiny                    spiny 
## Levels: non-spiny spiny

Now, let's proceed to fit our standard model using fitPagel in which we put a flat prior on the root state. (That is, we say a priori that any combination of our two character states is equally likely.)

fit.free<-fitPagel(tree,rocky,spiny,dep.var="y")
fit.free
## 
## Pagel's binary character correlation test:
## 
## Assumes "ARD" substitution model for both characters
## 
## Independent model rate matrix:
##                     non-rocky|non-spiny non-rocky|spiny rocky|non-spiny   rocky|spiny
## non-rocky|non-spiny        -0.003571491    0.0009301995     0.002641292  0.0000000000
## non-rocky|spiny             0.004294627   -0.0069359188     0.000000000  0.0026412917
## rocky|non-spiny             0.022466096    0.0000000000    -0.023396296  0.0009301995
## rocky|spiny                 0.000000000    0.0224660961     0.004294627 -0.0267607232
## 
## Dependent (y only) model rate matrix:
##                     non-rocky|non-spiny non-rocky|spiny rocky|non-spiny  rocky|spiny
## non-rocky|non-spiny        -0.003185682      0.00000000     0.003185682  0.000000000
## non-rocky|spiny             0.012075928     -0.01526161     0.000000000  0.003185682
## rocky|non-spiny             0.024347117      0.00000000    -0.036868700  0.012521583
## rocky|spiny                 0.000000000      0.02434712     0.000000000 -0.024347117
## 
## Model fit:
##             log-likelihood      AIC
## independent      -367.7976 743.5952
## dependent        -348.6054 709.2108
## 
## Hypothesis test result:
##   likelihood-ratio:  38.38443 
##   p-value:  4.62303e-09 
## 
## Model fitting method used was fitMk
plot(fit.free)

plot of chunk unnamed-chunk-4

Next, let's imagine that we want to fossilize the state at the root to be non-rocky (for habitat) and non-spiny for the morphological trait.

As I said before, we're going to do this by adding a tip of zero-length to the root using phytools::bind.tip and then assigning it those states.

I'm going to call this new tip "ROOT" - but that's arbitrary. We can call it anything we want, as long as it doesn't have the same name as any other tip in our tree.

Let's do it.

First, add the extra tip:

tree<-bind.tip(tree,tip.label="ROOT",edge.length=0,
    where=Ntip(tree)+1)

Here's what that looks like:

plotTree(tree,ftype="off",lwd=1)
nn<-which(tree$tip.label=="ROOT")
tiplabels(tree$tip.label[nn],nn,adj=c(-0.1,0.5),
    frame="none",cex=0.8,font=3)

plot of chunk unnamed-chunk-6

OK, now let's add our desired (fossilized) root states to our data vectors:

spiny<-as.factor(c(setNames("non-spiny","ROOT"),
    setNames(as.character(spiny),names(spiny))))
rocky<-as.factor(c(setNames("non-rocky","ROOT"),
    setNames(as.character(rocky),names(rocky))))

(OK, this was probably unnecessarily complicated - but we could have also avoided either by encoding our data as a character or 0/1 numeric vector - both of which are permitted by fitPagel.)

Finally, let's fit our fixed-root model as follows:

fit.fixed<-fitPagel(tree,rocky,spiny,dep.var="y")
fit.fixed
## 
## Pagel's binary character correlation test:
## 
## Assumes "ARD" substitution model for both characters
## 
## Independent model rate matrix:
##                     non-rocky|non-spiny non-rocky|spiny rocky|non-spiny   rocky|spiny
## non-rocky|non-spiny        -0.004201826    0.0009405981     0.003261228  0.0000000000
## non-rocky|spiny             0.004242876   -0.0075041043     0.000000000  0.0032612280
## rocky|non-spiny             0.022515550    0.0000000000    -0.023456148  0.0009405981
## rocky|spiny                 0.000000000    0.0225155501     0.004242876 -0.0267584264
## 
## Dependent (y only) model rate matrix:
##                     non-rocky|non-spiny non-rocky|spiny rocky|non-spiny rocky|spiny
## non-rocky|non-spiny        -0.003201910      0.00000000      0.00320191  0.00000000
## non-rocky|spiny             0.009889029     -0.01309094      0.00000000  0.00320191
## rocky|non-spiny             0.024393432      0.00000000     -0.04373400  0.01934057
## rocky|spiny                 0.000000000      0.02439343      0.00000000 -0.02439343
## 
## Model fit:
##             log-likelihood      AIC
## independent      -369.5143 747.0286
## dependent        -349.2664 710.5328
## 
## Hypothesis test result:
##   likelihood-ratio:  40.49581 
##   p-value:  1.608598e-09 
## 
## Model fitting method used was fitMk
plot(fit.fixed)

plot of chunk unnamed-chunk-8

In this case, fixing the root state did not have a large effect on our result - but it did have an effect, which I think is pretty interesting given that this is for a reasonably large dataset. I imagine if our data about the evolutionary process was scarcer the effect of fixing the root state (hopefully to a correct known value) would be larger still.

That's it!

No comments:

Post a Comment

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.