A little while ago, a phytools user contacted me to ask if there was a way
to fix (i.e., 'fossilize') the state at the global root of the tree for an
analysis of correlated binary character evolution using the fitPagel
function in R.
This is pretty easy to do in the function fitMk
using the argument pi
which is for supplying a prior distribution on the root state.
Unfortunately, although fitMk
is used internally by fitPagel
for
model-fitting, fitPagel
does not pass fitMk
the value of pi
. Even
if it did, we would still have to recode our fossilized prior belief as a
combination of character states (e.g., 0+0
, 0+1
, and so on) to get it to
work correctly with fitPagel
.
Fortunately, I was able to provide an alternative solution as follows:
“Sure. There is an easy way to do it that doesn't require passing any
internal arguments to the model-fitting function, or anything. This is by
attaching a tip of zero length to the root of the tree (this can be done
using bind.tip
) and then giving that tip the desired state combination.”
Let's see how that works.
To demonstrate it, I will use some data on the absence/presence of tail-spines in squamate reptiles from Ramm et al. (2020).
Ramm et al. used fitPagel
to do their analyses, but their dataset is quite
large (nearly 3,000 taxa).
Just to get my computation to run a bit faster I'm going to subsample their data to include only 600 taxa. (Their data is available via Dryad here.)
Ramm et al. did various different analyses using fitPagel
. I will repeat
just one of these on our reduced dataset - and that is the test for
correlated evolution between rocky habitat use (coded 'terrestrial-saxicolous'
by the authors) and the presence of tail spines.
Since our hypothesis (I imagine) is that rocky habitats favor the evolution
of tail spines (and not the reverse), let's use the argument dep.var
to fit
a model in which the rates of evolution & loss of tail spines depends on
habitat, but the rate of change in habitat does not depend on the state
for tail spines. (We could imagine evolutionary scenarios in which it did,
but this model also has fewer parameters so it will be easier to fit.)
library(phytools)
library(geiger)
First of all, here are our data (from Dryad, but with some pre-processing):
tree
##
## Phylogenetic tree with 600 tips and 599 internal nodes.
##
## Tip labels:
## Dibamus_montanus, Nephrurus_amyae, Lialis_jicari, Aprasia_repens, Aprasia_parapulchella, Dierogekko_inexpectatus, ...
##
## Rooted; includes branch lengths.
head(rocky)
## Acanthocercus_atricollis Agama_aculeata Agama_agama Agama_anchietae
## non-rocky non-rocky non-rocky non-rocky
## Agama_armata Agama_atra
## non-rocky non-rocky
## Levels: non-rocky rocky
head(spiny)
## Acanthocercus_atricollis Agama_aculeata Agama_agama Agama_anchietae
## spiny spiny spiny spiny
## Agama_armata Agama_atra
## spiny spiny
## Levels: non-spiny spiny
Now, let's proceed to fit our standard model using fitPagel
in which we
put a flat prior on the root state. (That is, we say a priori that any
combination of our two character states is equally likely.)
fit.free<-fitPagel(tree,rocky,spiny,dep.var="y")
fit.free
##
## Pagel's binary character correlation test:
##
## Assumes "ARD" substitution model for both characters
##
## Independent model rate matrix:
## non-rocky|non-spiny non-rocky|spiny rocky|non-spiny rocky|spiny
## non-rocky|non-spiny -0.003571491 0.0009301995 0.002641292 0.0000000000
## non-rocky|spiny 0.004294627 -0.0069359188 0.000000000 0.0026412917
## rocky|non-spiny 0.022466096 0.0000000000 -0.023396296 0.0009301995
## rocky|spiny 0.000000000 0.0224660961 0.004294627 -0.0267607232
##
## Dependent (y only) model rate matrix:
## non-rocky|non-spiny non-rocky|spiny rocky|non-spiny rocky|spiny
## non-rocky|non-spiny -0.003185682 0.00000000 0.003185682 0.000000000
## non-rocky|spiny 0.012075928 -0.01526161 0.000000000 0.003185682
## rocky|non-spiny 0.024347117 0.00000000 -0.036868700 0.012521583
## rocky|spiny 0.000000000 0.02434712 0.000000000 -0.024347117
##
## Model fit:
## log-likelihood AIC
## independent -367.7976 743.5952
## dependent -348.6054 709.2108
##
## Hypothesis test result:
## likelihood-ratio: 38.38443
## p-value: 4.62303e-09
##
## Model fitting method used was fitMk
plot(fit.free)
Next, let's imagine that we want to fossilize the state at the root to be
non-rocky
(for habitat) and non-spiny
for the morphological trait.
As I said before, we're going to do this by adding a tip of zero-length to the
root using phytools::bind.tip
and then assigning it those states.
I'm going to call this new tip "ROOT"
- but that's arbitrary. We can call it
anything we want, as long as it doesn't have the same name as any other tip
in our tree.
Let's do it.
First, add the extra tip:
tree<-bind.tip(tree,tip.label="ROOT",edge.length=0,
where=Ntip(tree)+1)
Here's what that looks like:
plotTree(tree,ftype="off",lwd=1)
nn<-which(tree$tip.label=="ROOT")
tiplabels(tree$tip.label[nn],nn,adj=c(-0.1,0.5),
frame="none",cex=0.8,font=3)
OK, now let's add our desired (fossilized) root states to our data vectors:
spiny<-as.factor(c(setNames("non-spiny","ROOT"),
setNames(as.character(spiny),names(spiny))))
rocky<-as.factor(c(setNames("non-rocky","ROOT"),
setNames(as.character(rocky),names(rocky))))
(OK, this was probably unnecessarily complicated - but we could have also
avoided either by encoding our data as a character or 0/1 numeric vector -
both of which are permitted by fitPagel
.)
Finally, let's fit our fixed-root model as follows:
fit.fixed<-fitPagel(tree,rocky,spiny,dep.var="y")
fit.fixed
##
## Pagel's binary character correlation test:
##
## Assumes "ARD" substitution model for both characters
##
## Independent model rate matrix:
## non-rocky|non-spiny non-rocky|spiny rocky|non-spiny rocky|spiny
## non-rocky|non-spiny -0.004201826 0.0009405981 0.003261228 0.0000000000
## non-rocky|spiny 0.004242876 -0.0075041043 0.000000000 0.0032612280
## rocky|non-spiny 0.022515550 0.0000000000 -0.023456148 0.0009405981
## rocky|spiny 0.000000000 0.0225155501 0.004242876 -0.0267584264
##
## Dependent (y only) model rate matrix:
## non-rocky|non-spiny non-rocky|spiny rocky|non-spiny rocky|spiny
## non-rocky|non-spiny -0.003201910 0.00000000 0.00320191 0.00000000
## non-rocky|spiny 0.009889029 -0.01309094 0.00000000 0.00320191
## rocky|non-spiny 0.024393432 0.00000000 -0.04373400 0.01934057
## rocky|spiny 0.000000000 0.02439343 0.00000000 -0.02439343
##
## Model fit:
## log-likelihood AIC
## independent -369.5143 747.0286
## dependent -349.2664 710.5328
##
## Hypothesis test result:
## likelihood-ratio: 40.49581
## p-value: 1.608598e-09
##
## Model fitting method used was fitMk
plot(fit.fixed)
In this case, fixing the root state did not have a large effect on our result - but it did have an effect, which I think is pretty interesting given that this is for a reasonably large dataset. I imagine if our data about the evolutionary process was scarcer the effect of fixing the root state (hopefully to a correct known value) would be larger still.
That's it!
No comments:
Post a Comment
Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.