Wednesday, May 25, 2016

Update to fitPagel to permit different dependency relationships & different models of evolution

I just updated the function fitPagel with several new features and models.

fitPagel fits the Pagel (1994) model in which the rate of evolution for one discrete character can be affected by a second, and vice versa.

The extensions in this version are as follows.

(1) I added the argument dep.var which can have values "xy", "x", or "y" and permits (respectively) the substitution rate of x to depend on the state of y and vice versa, the rate of x (only) to depend on the state of y, and, finally, the rate of y (only) to dependent on the state of x.

(2) I also added the argument model which can have values "ER", "SYM", or "ARD". This just sets the substitution model for the individual characters, regardless of whether or not the rates depend on a second state. So, for instance, if dep.var="x" and model="ER", then x will have two rates, depending on the state of y, but these two rates will be equal in a forward & backward direction.

Note that model="ER" may not correspond with what we often think of as a correlated evolution model, because under a correlated evolution model we might think that if trait 1 acquires state "b", for instance, than trait 2 should also acquire state "b" - but under an equal rates model "b|a" -> "b|b" may occur at a higher rate than "a|a" -> "a|b", but that high rate is forced to be equal to the rate of leaving the correlated states, "b|b" -> "b|a".

Here is a demo of how using these new options might look:

library(phytools)
packageVersion("phytools") ## from GitHub
## [1] '0.5.32'
## first, here is our data & tree
colors<-setNames(palette()[1:4],sort(unique(c(x,y))))
dotTree(tree,cbind(x,y),data.type="discrete",fsize=0.7,
    colors=colors,pts=FALSE)

plot of chunk unnamed-chunk-1

Now let's fit some models:

fit.xy<-fitPagel(tree,x,y)
fit.x<-fitPagel(tree,x,y,dep.var="x")
fit.y<-fitPagel(tree,x,y,dep.var="y")
fit.xy
## 
## Pagel's binary character correlation test:
## 
## Assumes "ARD" substitution model for both characters
## 
## Independent model rate matrix:
##            a|c        a|d       b|c       b|d
## a|c -3.7275431  2.6817189  1.045824  0.000000
## a|d  1.2367301 -2.2825542  0.000000  1.045824
## b|c  0.5811491  0.0000000 -3.262868  2.681719
## b|d  0.0000000  0.5811491  1.236730 -1.817879
## 
## Dependent (x & y) model rate matrix:
##            a|c        a|d       b|c       b|d
## a|c -5.5939241  5.5939241  0.000000  0.000000
## a|d  0.4590907 -1.8971976  0.000000  1.438107
## b|c  0.6371031  0.0000000 -2.886065  2.248962
## b|d  0.0000000  0.4892436  1.823519 -2.312762
## 
## Model fit:
##             log-likelihood      AIC
## independent      -53.76123 115.5225
## dependent        -52.44912 120.8982
## 
## Hypothesis test result:
##   likelihood-ratio:  2.624216 
##   p-value:  0.6225394 
## 
## Model fitting method used was fitMk
fit.x
## 
## Pagel's binary character correlation test:
## 
## Assumes "ARD" substitution model for both characters
## 
## Independent model rate matrix:
##            a|c        a|d       b|c       b|d
## a|c -3.7275431  2.6817189  1.045824  0.000000
## a|d  1.2367301 -2.2825542  0.000000  1.045824
## b|c  0.5811491  0.0000000 -3.262868  2.681719
## b|d  0.0000000  0.5811491  1.236730 -1.817879
## 
## Dependent (x only) model rate matrix:
##            a|c        a|d       b|c        b|d
## a|c -5.8351846  2.6828060  3.152379  0.0000000
## a|d  1.2044747 -1.9168270  0.000000  0.7123523
## b|c  0.0975676  0.0000000 -2.780374  2.6828060
## b|d  0.0000000  0.8518788  1.204475 -2.0563535
## 
## Model fit:
##             log-likelihood      AIC
## independent      -53.76123 115.5225
## dependent        -52.86809 117.7362
## 
## Hypothesis test result:
##   likelihood-ratio:  1.786292 
##   p-value:  0.4093659 
## 
## Model fitting method used was fitMk
fit.y
## 
## Pagel's binary character correlation test:
## 
## Assumes "ARD" substitution model for both characters
## 
## Independent model rate matrix:
##            a|c        a|d       b|c       b|d
## a|c -3.7275431  2.6817189  1.045824  0.000000
## a|d  1.2367301 -2.2825542  0.000000  1.045824
## b|c  0.5811491  0.0000000 -3.262868  2.681719
## b|d  0.0000000  0.5811491  1.236730 -1.817879
## 
## Dependent (y only) model rate matrix:
##            a|c        a|d       b|c       b|d
## a|c -5.7733459  4.5972311  1.176115  0.000000
## a|d  0.5382396 -1.7143544  0.000000  1.176115
## b|c  0.5655022  0.0000000 -3.120088  2.554586
## b|d  0.0000000  0.5655022  1.780828 -2.346330
## 
## Model fit:
##             log-likelihood      AIC
## independent      -53.76123 115.5225
## dependent        -52.54412 117.0882
## 
## Hypothesis test result:
##   likelihood-ratio:  2.434218 
##   p-value:  0.2960849 
## 
## Model fitting method used was fitMk

There was no correlation simulated in these data - consequently it should be unsurprising (and perhaps reassuring!) that the best fit model is the simplest, independent model.

Since fitPagel is just a wrapper function for other model-fitting functions, we can change the optimizer easily. One good option is fitDiscrete in the geiger package:

fit.geiger<-fitPagel(tree,x,y,method="fitDiscrete")
## Warning in fitDiscrete(tree, xy, model = iQ, ...): Parameter estimates appear at bounds:
##  q14
##  q23
##  q32
##  q41
## Warning in fitDiscrete(tree, xy, model = dQ, ...): Parameter estimates appear at bounds:
##  q14
##  q23
##  q32
##  q41
fit.geiger
## 
## Pagel's binary character correlation test:
## 
## Assumes "ARD" substitution model for both characters
## 
## Independent model rate matrix:
##            a|c        a|d        b|c        b|d
## a|c -3.2101072  2.6611828  0.5489243  0.0000000
## a|d  1.2335297 -1.7824540  0.0000000  0.5489243
## b|c  0.5872987  0.0000000 -3.2484816  2.6611828
## b|d  0.0000000  0.5872987  1.2335297 -1.8208284
## 
## Dependent (x & y) model rate matrix:
##            a|c        a|d           b|c        b|d
## a|c -3.7711178  3.7711178  1.191744e-16  0.0000000
## a|d  0.5702951 -1.2146389  0.000000e+00  0.6443438
## b|c  0.2837940  0.0000000 -3.256106e+00  2.9723119
## b|d  0.0000000  0.7770299  1.739702e+00 -2.5167321
## 
## Model fit:
##             log-likelihood      AIC
## independent      -53.27732 114.5546
## dependent        -51.99965 119.9993
## 
## Hypothesis test result:
##   likelihood-ratio:  2.555352 
##   p-value:  0.6347523 
## 
## Model fitting method used was fitDiscrete

Note that although the parameter estimates & likelihoods should be quite similar, they won't be precisely the same because fitDiscrete makes different default root node prior. The likelihoods should also not be compared.

Finally, we can try the "ER" model as follows, noting the caveat that I outline above:

fit.ER<-fitPagel(tree,x,y,model="ER")
fit.ER
## 
## Pagel's binary character correlation test:
## 
## Assumes "ER" substitution model for both characters
## 
## Independent model rate matrix:
##            a|c        a|d        b|c        b|d
## a|c -1.5175113  0.9151581  0.6023532  0.0000000
## a|d  0.9151581 -1.5175113  0.0000000  0.6023532
## b|c  0.6023532  0.0000000 -1.5175113  0.9151581
## b|d  0.0000000  0.6023532  0.9151581 -1.5175113
## 
## Dependent (x & y) model rate matrix:
##            a|c        a|d        b|c        b|d
## a|c -0.4990655  0.3757513  0.1233143  0.0000000
## a|d  0.3757513 -1.2105813  0.0000000  0.8348301
## b|c  0.1233143  0.0000000 -1.4990703  1.3757560
## b|d  0.0000000  0.8348301  1.3757560 -2.2105861
## 
## Model fit:
##             log-likelihood      AIC
## independent      -54.94916 113.8983
## dependent        -53.66275 115.3255
## 
## Hypothesis test result:
##   likelihood-ratio:  2.572823 
##   p-value:  0.2762603 
## 
## Model fitting method used was fitMk

Unfortunately, to run dotTree as I have above you may have to roll-back the internally used package, plotrix. For some reason I've yet to figure out, the latest plotrix version breaks dotTree (and in quite a spectacular way). Hopefully I will get to the bottom of it soon.

## here is how to install an archived version of plotrix
install.packages(
"https://cran.r-project.org/src/contrib/Archive/plotrix/plotrix_3.6-1.tar.gz",
    type="source")

The data for this exercise were simulated without a correlation, as follows:

tree<-pbtree(n=60,scale=1)
Q<-matrix(c(-1,1,1,-1),2,2)
rownames(Q)<-colnames(Q)<-letters[1:2]
x<-sim.history(tree,Q)$states
rownames(Q)<-colnames(Q)<-letters[3:4]
y<-sim.history(tree,Q)$states

5 comments:

  1. Hi Liam, thanks for this update.
    I followed your instruction and installed plotrix 3.6-1. However, no dots of traits were shown in the plot (the tree was plotted correctly though) when I used dotTree. I did get a few warning messages:
    In cos(angles) * radius[circle] + x :
    longer object length is not a multiple of shorter object length

    I would appreciate a fix to this issue.

    Thanks,
    Cheng-Chiang Wu

    ReplyDelete
  2. Hope you can do a version of:

    Pagel, M., and A. Meade. 2006. Bayesian analysis of correlated evolution of discrete characters by reversible‐jump Markov chain Monte Carlo. The American Naturalist 167: 808–825.

    I wonder if stochastic mapping could be adapted to this?

    ReplyDelete
  3. Hello Liam,
    I am interested in looking at the association between two categorical traits, one being binary but the other one having more than two states. Does the fitPagel command deal with more than two states characters automatically? In Pagel's "User's Manual for Discrete", we can read that "any trait with more than two states can be represented as a series of binary traits, each one contrasting a group labelled "1" with all of the other traits [...]". Is this implemented in the command already?
    Thanks,
    Bruno

    ReplyDelete

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.