I just updated the function fitPagel
with several new features
and models.
fitPagel
fits the Pagel
(1994) model in which
the rate of evolution for one discrete character can be affected by a second,
and vice versa.
The extensions in this version are as follows.
(1) I added the argument dep.var
which can have values
"xy"
, "x"
, or "y"
and permits
(respectively) the substitution rate of x
to depend on
the state of y
and vice versa, the rate of x
(only) to depend on the state of y
, and, finally, the rate of
y
(only) to dependent on the state of x
.
(2) I also added the argument model
which can have values
"ER"
, "SYM"
, or "ARD"
. This just sets
the substitution model for the individual characters, regardless of whether
or not the rates depend on a second state. So, for instance, if
dep.var="x"
and model="ER"
, then x
will have two rates, depending on the state of y
, but these
two rates will be equal in a forward & backward direction.
Note that model="ER"
may not correspond with what we often think
of as a correlated evolution model, because under a correlated evolution
model we might think that if trait 1 acquires state "b"
, for
instance, than trait 2 should also acquire state "b"
- but under
an equal rates model "b|a" -> "b|b"
may occur at a higher rate
than "a|a" -> "a|b"
, but that high rate is forced to be equal
to the rate of leaving the correlated states, "b|b" -> "b|a"
.
Here is a demo of how using these new options might look:
library(phytools)
packageVersion("phytools") ## from GitHub
## [1] '0.5.32'
## first, here is our data & tree
colors<-setNames(palette()[1:4],sort(unique(c(x,y))))
dotTree(tree,cbind(x,y),data.type="discrete",fsize=0.7,
colors=colors,pts=FALSE)
Now let's fit some models:
fit.xy<-fitPagel(tree,x,y)
fit.x<-fitPagel(tree,x,y,dep.var="x")
fit.y<-fitPagel(tree,x,y,dep.var="y")
fit.xy
##
## Pagel's binary character correlation test:
##
## Assumes "ARD" substitution model for both characters
##
## Independent model rate matrix:
## a|c a|d b|c b|d
## a|c -3.7275431 2.6817189 1.045824 0.000000
## a|d 1.2367301 -2.2825542 0.000000 1.045824
## b|c 0.5811491 0.0000000 -3.262868 2.681719
## b|d 0.0000000 0.5811491 1.236730 -1.817879
##
## Dependent (x & y) model rate matrix:
## a|c a|d b|c b|d
## a|c -5.5939241 5.5939241 0.000000 0.000000
## a|d 0.4590907 -1.8971976 0.000000 1.438107
## b|c 0.6371031 0.0000000 -2.886065 2.248962
## b|d 0.0000000 0.4892436 1.823519 -2.312762
##
## Model fit:
## log-likelihood AIC
## independent -53.76123 115.5225
## dependent -52.44912 120.8982
##
## Hypothesis test result:
## likelihood-ratio: 2.624216
## p-value: 0.6225394
##
## Model fitting method used was fitMk
fit.x
##
## Pagel's binary character correlation test:
##
## Assumes "ARD" substitution model for both characters
##
## Independent model rate matrix:
## a|c a|d b|c b|d
## a|c -3.7275431 2.6817189 1.045824 0.000000
## a|d 1.2367301 -2.2825542 0.000000 1.045824
## b|c 0.5811491 0.0000000 -3.262868 2.681719
## b|d 0.0000000 0.5811491 1.236730 -1.817879
##
## Dependent (x only) model rate matrix:
## a|c a|d b|c b|d
## a|c -5.8351846 2.6828060 3.152379 0.0000000
## a|d 1.2044747 -1.9168270 0.000000 0.7123523
## b|c 0.0975676 0.0000000 -2.780374 2.6828060
## b|d 0.0000000 0.8518788 1.204475 -2.0563535
##
## Model fit:
## log-likelihood AIC
## independent -53.76123 115.5225
## dependent -52.86809 117.7362
##
## Hypothesis test result:
## likelihood-ratio: 1.786292
## p-value: 0.4093659
##
## Model fitting method used was fitMk
fit.y
##
## Pagel's binary character correlation test:
##
## Assumes "ARD" substitution model for both characters
##
## Independent model rate matrix:
## a|c a|d b|c b|d
## a|c -3.7275431 2.6817189 1.045824 0.000000
## a|d 1.2367301 -2.2825542 0.000000 1.045824
## b|c 0.5811491 0.0000000 -3.262868 2.681719
## b|d 0.0000000 0.5811491 1.236730 -1.817879
##
## Dependent (y only) model rate matrix:
## a|c a|d b|c b|d
## a|c -5.7733459 4.5972311 1.176115 0.000000
## a|d 0.5382396 -1.7143544 0.000000 1.176115
## b|c 0.5655022 0.0000000 -3.120088 2.554586
## b|d 0.0000000 0.5655022 1.780828 -2.346330
##
## Model fit:
## log-likelihood AIC
## independent -53.76123 115.5225
## dependent -52.54412 117.0882
##
## Hypothesis test result:
## likelihood-ratio: 2.434218
## p-value: 0.2960849
##
## Model fitting method used was fitMk
There was no correlation simulated in these data - consequently it should be unsurprising (and perhaps reassuring!) that the best fit model is the simplest, independent model.
Since fitPagel
is just a wrapper function for other model-fitting
functions, we can change the optimizer easily. One good option is
fitDiscrete
in the geiger package:
fit.geiger<-fitPagel(tree,x,y,method="fitDiscrete")
## Warning in fitDiscrete(tree, xy, model = iQ, ...): Parameter estimates appear at bounds:
## q14
## q23
## q32
## q41
## Warning in fitDiscrete(tree, xy, model = dQ, ...): Parameter estimates appear at bounds:
## q14
## q23
## q32
## q41
fit.geiger
##
## Pagel's binary character correlation test:
##
## Assumes "ARD" substitution model for both characters
##
## Independent model rate matrix:
## a|c a|d b|c b|d
## a|c -3.2101072 2.6611828 0.5489243 0.0000000
## a|d 1.2335297 -1.7824540 0.0000000 0.5489243
## b|c 0.5872987 0.0000000 -3.2484816 2.6611828
## b|d 0.0000000 0.5872987 1.2335297 -1.8208284
##
## Dependent (x & y) model rate matrix:
## a|c a|d b|c b|d
## a|c -3.7711178 3.7711178 1.191744e-16 0.0000000
## a|d 0.5702951 -1.2146389 0.000000e+00 0.6443438
## b|c 0.2837940 0.0000000 -3.256106e+00 2.9723119
## b|d 0.0000000 0.7770299 1.739702e+00 -2.5167321
##
## Model fit:
## log-likelihood AIC
## independent -53.27732 114.5546
## dependent -51.99965 119.9993
##
## Hypothesis test result:
## likelihood-ratio: 2.555352
## p-value: 0.6347523
##
## Model fitting method used was fitDiscrete
Note that although the parameter estimates & likelihoods should be quite
similar, they won't be precisely the same because fitDiscrete
makes
different
default root node prior. The likelihoods should also not be compared.
Finally, we can try the "ER"
model as follows, noting the
caveat that I outline above:
fit.ER<-fitPagel(tree,x,y,model="ER")
fit.ER
##
## Pagel's binary character correlation test:
##
## Assumes "ER" substitution model for both characters
##
## Independent model rate matrix:
## a|c a|d b|c b|d
## a|c -1.5175113 0.9151581 0.6023532 0.0000000
## a|d 0.9151581 -1.5175113 0.0000000 0.6023532
## b|c 0.6023532 0.0000000 -1.5175113 0.9151581
## b|d 0.0000000 0.6023532 0.9151581 -1.5175113
##
## Dependent (x & y) model rate matrix:
## a|c a|d b|c b|d
## a|c -0.4990655 0.3757513 0.1233143 0.0000000
## a|d 0.3757513 -1.2105813 0.0000000 0.8348301
## b|c 0.1233143 0.0000000 -1.4990703 1.3757560
## b|d 0.0000000 0.8348301 1.3757560 -2.2105861
##
## Model fit:
## log-likelihood AIC
## independent -54.94916 113.8983
## dependent -53.66275 115.3255
##
## Hypothesis test result:
## likelihood-ratio: 2.572823
## p-value: 0.2762603
##
## Model fitting method used was fitMk
Unfortunately, to run dotTree
as I have above you may have to
roll-back the internally used package, plotrix. For some reason I've yet to
figure out, the latest plotrix version breaks dotTree
(and in
quite a spectacular way). Hopefully I will get to the bottom of it soon.
## here is how to install an archived version of plotrix
install.packages(
"https://cran.r-project.org/src/contrib/Archive/plotrix/plotrix_3.6-1.tar.gz",
type="source")
The data for this exercise were simulated without a correlation, as follows:
tree<-pbtree(n=60,scale=1)
Q<-matrix(c(-1,1,1,-1),2,2)
rownames(Q)<-colnames(Q)<-letters[1:2]
x<-sim.history(tree,Q)$states
rownames(Q)<-colnames(Q)<-letters[3:4]
y<-sim.history(tree,Q)$states
Hi Liam, thanks for this update.
ReplyDeleteI followed your instruction and installed plotrix 3.6-1. However, no dots of traits were shown in the plot (the tree was plotted correctly though) when I used dotTree. I did get a few warning messages:
In cos(angles) * radius[circle] + x :
longer object length is not a multiple of shorter object length
I would appreciate a fix to this issue.
Thanks,
Cheng-Chiang Wu
Hope you can do a version of:
ReplyDeletePagel, M., and A. Meade. 2006. Bayesian analysis of correlated evolution of discrete characters by reversible‐jump Markov chain Monte Carlo. The American Naturalist 167: 808–825.
I wonder if stochastic mapping could be adapted to this?
I also got the error that CW got.
ReplyDeleteDoes it handle missing data?
ReplyDeleteHello Liam,
ReplyDeleteI am interested in looking at the association between two categorical traits, one being binary but the other one having more than two states. Does the fitPagel command deal with more than two states characters automatically? In Pagel's "User's Manual for Discrete", we can read that "any trait with more than two states can be represented as a series of binary traits, each one contrasting a group labelled "1" with all of the other traits [...]". Is this implemented in the command already?
Thanks,
Bruno