A little while ago, a *phytools* user contacted me to ask if there was a way
to fix (i.e., 'fossilize') the state at the global root of the tree for an
analysis of correlated binary character evolution using the `fitPagel`

function in R.

This is pretty easy to do in the function `fitMk`

using the argument `pi`

which is for supplying a prior distribution on the root state.

Unfortunately, although `fitMk`

is used internally by `fitPagel`

for
model-fitting, `fitPagel`

does not pass `fitMk`

the value of `pi`

. Even
if it did, we would still have to recode our fossilized prior belief as a
combination of character states (e.g., `0+0`

, `0+1`

, and so on) to get it to
work correctly with `fitPagel`

.

Fortunately, I was able to provide an alternative solution as follows:

*“Sure. There is an easy way to do it that doesn't require passing any
internal arguments to the model-fitting function, or anything. This is by
attaching a tip of zero length to the root of the tree (this can be done
using bind.tip) and then giving that tip the desired state combination.”*

Let's see how that works.

To demonstrate it, I will use some data on the absence/presence of tail-spines in squamate reptiles from Ramm et al. (2020).

Ramm et al. used `fitPagel`

to do their analyses, but their dataset is quite
large (nearly 3,000 taxa).

Just to get my computation to run a bit faster I'm going to subsample their data to include only 600 taxa. (Their data is available via Dryad here.)

Ramm et al. did various different analyses using `fitPagel`

. I will repeat
just one of these on our reduced dataset - and that is the test for
correlated evolution between rocky habitat use (coded 'terrestrial-saxicolous'
by the authors) and the presence of tail spines.

Since our hypothesis (I imagine) is that rocky habitats favor the evolution
of tail spines (and not the reverse), let's use the argument `dep.var`

to fit
a model in which the rates of evolution & loss of tail spines depends on
habitat, but the rate of change in habitat *does not* depend on the state
for tail spines. (We could imagine evolutionary scenarios in which it did,
but this model also has fewer parameters so it will be easier to fit.)

```
library(phytools)
library(geiger)
```

First of all, here are our data (from Dryad, but with some pre-processing):

```
tree
```

```
##
## Phylogenetic tree with 600 tips and 599 internal nodes.
##
## Tip labels:
## Dibamus_montanus, Nephrurus_amyae, Lialis_jicari, Aprasia_repens, Aprasia_parapulchella, Dierogekko_inexpectatus, ...
##
## Rooted; includes branch lengths.
```

```
head(rocky)
```

```
## Acanthocercus_atricollis Agama_aculeata Agama_agama Agama_anchietae
## non-rocky non-rocky non-rocky non-rocky
## Agama_armata Agama_atra
## non-rocky non-rocky
## Levels: non-rocky rocky
```

```
head(spiny)
```

```
## Acanthocercus_atricollis Agama_aculeata Agama_agama Agama_anchietae
## spiny spiny spiny spiny
## Agama_armata Agama_atra
## spiny spiny
## Levels: non-spiny spiny
```

Now, let's proceed to fit our standard model using `fitPagel`

in which we
put a flat prior on the root state. (That is, we say *a priori* that any
combination of our two character states is equally likely.)

```
fit.free<-fitPagel(tree,rocky,spiny,dep.var="y")
fit.free
```

```
##
## Pagel's binary character correlation test:
##
## Assumes "ARD" substitution model for both characters
##
## Independent model rate matrix:
## non-rocky|non-spiny non-rocky|spiny rocky|non-spiny rocky|spiny
## non-rocky|non-spiny -0.003571491 0.0009301995 0.002641292 0.0000000000
## non-rocky|spiny 0.004294627 -0.0069359188 0.000000000 0.0026412917
## rocky|non-spiny 0.022466096 0.0000000000 -0.023396296 0.0009301995
## rocky|spiny 0.000000000 0.0224660961 0.004294627 -0.0267607232
##
## Dependent (y only) model rate matrix:
## non-rocky|non-spiny non-rocky|spiny rocky|non-spiny rocky|spiny
## non-rocky|non-spiny -0.003185682 0.00000000 0.003185682 0.000000000
## non-rocky|spiny 0.012075928 -0.01526161 0.000000000 0.003185682
## rocky|non-spiny 0.024347117 0.00000000 -0.036868700 0.012521583
## rocky|spiny 0.000000000 0.02434712 0.000000000 -0.024347117
##
## Model fit:
## log-likelihood AIC
## independent -367.7976 743.5952
## dependent -348.6054 709.2108
##
## Hypothesis test result:
## likelihood-ratio: 38.38443
## p-value: 4.62303e-09
##
## Model fitting method used was fitMk
```

```
plot(fit.free)
```

Next, let's imagine that we want to fossilize the state at the root to be
`non-rocky`

(for habitat) and `non-spiny`

for the morphological trait.

As I said before, we're going to do this by adding a tip of zero-length to the
root using `phytools::bind.tip`

and then assigning it those states.

I'm going to call this new tip `"ROOT"`

- but that's arbitrary. We can call it
anything we want, as long as it doesn't have the same name as any other tip
in our tree.

Let's do it.

First, add the extra tip:

```
tree<-bind.tip(tree,tip.label="ROOT",edge.length=0,
where=Ntip(tree)+1)
```

Here's what that looks like:

```
plotTree(tree,ftype="off",lwd=1)
nn<-which(tree$tip.label=="ROOT")
tiplabels(tree$tip.label[nn],nn,adj=c(-0.1,0.5),
frame="none",cex=0.8,font=3)
```

OK, now let's add our desired (fossilized) root states to our data vectors:

```
spiny<-as.factor(c(setNames("non-spiny","ROOT"),
setNames(as.character(spiny),names(spiny))))
rocky<-as.factor(c(setNames("non-rocky","ROOT"),
setNames(as.character(rocky),names(rocky))))
```

(OK, this was probably unnecessarily complicated - but we could have also
avoided either by encoding our data as a character or 0/1 numeric vector -
both of which are permitted by `fitPagel`

.)

Finally, let's fit our fixed-root model as follows:

```
fit.fixed<-fitPagel(tree,rocky,spiny,dep.var="y")
fit.fixed
```

```
##
## Pagel's binary character correlation test:
##
## Assumes "ARD" substitution model for both characters
##
## Independent model rate matrix:
## non-rocky|non-spiny non-rocky|spiny rocky|non-spiny rocky|spiny
## non-rocky|non-spiny -0.004201826 0.0009405981 0.003261228 0.0000000000
## non-rocky|spiny 0.004242876 -0.0075041043 0.000000000 0.0032612280
## rocky|non-spiny 0.022515550 0.0000000000 -0.023456148 0.0009405981
## rocky|spiny 0.000000000 0.0225155501 0.004242876 -0.0267584264
##
## Dependent (y only) model rate matrix:
## non-rocky|non-spiny non-rocky|spiny rocky|non-spiny rocky|spiny
## non-rocky|non-spiny -0.003201910 0.00000000 0.00320191 0.00000000
## non-rocky|spiny 0.009889029 -0.01309094 0.00000000 0.00320191
## rocky|non-spiny 0.024393432 0.00000000 -0.04373400 0.01934057
## rocky|spiny 0.000000000 0.02439343 0.00000000 -0.02439343
##
## Model fit:
## log-likelihood AIC
## independent -369.5143 747.0286
## dependent -349.2664 710.5328
##
## Hypothesis test result:
## likelihood-ratio: 40.49581
## p-value: 1.608598e-09
##
## Model fitting method used was fitMk
```

```
plot(fit.fixed)
```

In this case, fixing the root state did not have a large effect on our
result - but it did have *an* effect, which I think is pretty interesting
given that this is for a reasonably large dataset. I imagine if our data
about the evolutionary process was scarcer the effect of fixing the root
state (hopefully to a correct known value) would be larger still.

That's it!

## No comments:

## Post a Comment

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.