Friday, September 18, 2015

Additional method for fitPagel to fit Pagel's (1994) model of correlated binary character evolution

I just updated fitPagel, a phytools function that fit the Pagel (1994) method for testing whether theh evolutiokn of one binary character affects a second character (or vice versa).

The way this method works is it simply re-codes the two binary characters into a single four state character - e.g., 0|0, 0|1, etc. Then it fits two different models. In the first model, the 'independent' model, the rate of transition in the first character is constrained to be equal for a given transition type, irregardless of the state for the second trait. So, for instance, the rate of transition 0|0 -> 1|0 is the same as 0|1 -> 1|1 because both transitions involve the same change (0 -> 1) in character 1. The second model, the 'dependent' model, allows each type of transition to have a different rate. In other words 0|0 -> 1|0 and 0|1 -> 1|1 can have different rates (different probabilities of occuring on a given time interval), even though the exact same type of change is occuring for the same character in both instances. Obviously, transitions 0|0 -> 1|1 are constrained to have a rate of zero in both models.

The update now permits the phytools function fitMk to be used internally to fit the Mk model. Previously the two methods available were method = "ace" and method = "fitDiscrete", and these are both still available as options.

Let's compare them:

library(devtools)
install_github("liamrevell/phytools",quiet=TRUE)
library(phytools)
## Loading required package: ape
## Loading required package: maps

We can start by simulating data under the 'dependent' model

## simulate tree
tree<-pbtree(n=300,scale=1)
## simulate data for the two characters
Q<-matrix(c(0,0.4,0.4,0,2,0,0,2,2,0,0,2,0,0.4,0.4,0),4,4,byrow=TRUE)
rownames(Q)<-colnames(Q)<-c("aa","ab","ba","bb")
diag(Q)<--rowSums(Q)
tt<-sim.history(tree,Q)
## Note - the rate of substitution from i->j should be given by Q[j,i].
## Detecting that rows, not columns, of Q sum to zero :
##   Transposing Q for internal calculations.
## Done simulation(s).
## split the 'aa','ab','bb','ba' data into two characters
t1<-mergeMappedStates(tt,c("aa","ab"),"a")
t1<-mergeMappedStates(t1,c("ba","bb"),"b")
t2<-mergeMappedStates(tt,c("aa","ba"),"a")
t2<-mergeMappedStates(t2,c("ab","bb"),"b")
t1$states<-getStates(t1,"tips")
t2$states<-getStates(t2,"tips")
## visualize the correlated evolution of the two traits
par(mfrow=c(1,2))
plotSimmap(t1,setNames(c("red","blue"),letters[1:2]),lwd=1,ftype="off")
plotSimmap(t2,setNames(c("red","blue"),letters[1:2]),lwd=1,ftype="off",
    direction="leftwards")

plot of chunk unnamed-chunk-2

## extract the tip data from our simulation
x<-getStates(t1,"tips")
y<-getStates(t2,"tips")

Now we're ready to fit our model using all three methods:

fitPagel(tree,x,y) ## default method is "fitMk"
## 
##   Pagel's binary character correlation test:
## 
## Indepedent model rate matrix:
##            a|a        a|b        b|a        b|b
## a|a -0.8726028  0.6514011  0.2212017  0.0000000
## a|b  0.3220958 -0.5432975  0.0000000  0.2212017
## b|a  0.6189176  0.0000000 -1.2703187  0.6514011
## b|b  0.0000000  0.6189176  0.3220958 -0.9410134
## 
## Dependent model rate matrix:
##            a|a        a|b        b|a        b|b
## a|a -0.5951317  0.3393103  0.2558215  0.0000000
## a|b  4.6486390 -4.7698708  0.0000000  0.1212318
## b|a  2.2086339  0.0000000 -4.5275279  2.3188940
## b|b  0.0000000  0.4113254  0.3105573 -0.7218827
## 
## Model fit:
##             log-likelihood
## independent      -172.8745
## dependent        -150.4001
## 
## Hypothesis test result:
##   likelihood-ratio:  44.94869 
##   p-value:  4.074835e-09 
## 
## Model fitting method used was fitMk
fitPagel(tree,x,y,method="ace")
## 
##   Pagel's binary character correlation test:
## 
## Indepedent model rate matrix:
##            a|a        a|b        b|a        b|b
## a|a -0.8725883  0.6513844  0.2212039  0.0000000
## a|b  0.3220944 -0.5432984  0.0000000  0.2212039
## b|a  0.6189141  0.0000000 -1.2702985  0.6513844
## b|b  0.0000000  0.6189141  0.3220944 -0.9410086
## 
## Dependent model rate matrix:
##           a|a        a|b        b|a        b|b
## a|a -0.595134  0.3393112  0.2558228  0.0000000
## a|b  4.648654 -4.7698873  0.0000000  0.1212328
## b|a  2.208634  0.0000000 -4.5275253  2.3188912
## b|b  0.000000  0.4113245  0.3105577 -0.7218823
## 
## Model fit:
##             log-likelihood
## independent      -171.4882
## dependent        -149.0138
## 
## Hypothesis test result:
##   likelihood-ratio:  44.94869 
##   p-value:  4.074835e-09 
## 
## Model fitting method used was ace
fitPagel(tree,x,y,method="fitDiscrete")
## Loading required package: geiger
## Loading required package: parallel
## Warning in fitDiscrete(tree, xy, model = iQ): Parameter estimates appear at bounds:
##  q14
##  q23
##  q32
##  q41
## Warning in fitDiscrete(tree, xy, model = dQ): Parameter estimates appear at bounds:
##  q14
##  q23
##  q32
##  q41
## 
##   Pagel's binary character correlation test:
## 
## Indepedent model rate matrix:
##            a|a        a|b        b|a        b|b
## a|a -0.8815200  0.6639046  0.2176153  0.0000000
## a|b  0.3054217 -0.5230370  0.0000000  0.2176153
## b|a  0.6199298  0.0000000 -1.2838344  0.6639046
## b|b  0.0000000  0.6199298  0.3054217 -0.9253515
## 
## Dependent model rate matrix:
##            a|a        a|b        b|a         b|b
## a|a -0.5943124  0.3350400  0.2592725  0.00000000
## a|b  4.0225133 -4.0854456  0.0000000  0.06293222
## b|a  2.0872453  0.0000000 -4.5241365  2.43689116
## b|b  0.0000000  0.3640895  0.2807990 -0.64488849
## 
## Model fit:
##             log-likelihood
## independent      -171.5709
## dependent        -149.5117
## 
## Hypothesis test result:
##   likelihood-ratio:  44.11832 
##   p-value:  6.06277e-09 
## 
## Model fitting method used was fitDiscrete

That's it.

5 comments:

  1. Dear Liam,

    I try to use fitPagel to check for correlation between two binary traits. However, I always get the following message when I'm using my data with fitMk method:

    "model does not have the right number of columns"

    Could you explain to me what this message exactly means?

    Best
    Marcin

    ReplyDelete
    Replies
    1. Hi Marcin.

      This is a known bug in the CRAN version. I think it should be fixed if you update phytools from GitHub. To do that start a fresh R session & run:

      library(devtools) ## must have devtools installed
      install_github("liamrevell/phytools")

      Let us know if that takes care of the issue.

      - Liam

      Delete
    2. Thanks a lot. Now everything works perfectly.

      Best
      Marcin

      Delete
  2. Dear Liam,

    Is it possible to select a substitution model for one variable and other model for the other? (say ER for X and ARD for Y)

    Cheers,

    Guilherme

    ReplyDelete
  3. Dear Liam,

    I would like to use "fitPagel" to check for correlation between two binary traits. However, I always get the following message "could not find function "fitPagel"" even though I have the latest R version (3.4.0) and the libraries "phytools", "ape", "geiger", "rgl" installed and loaded.Do you have an idea why I get the error message?

    Best

    Stefan

    ReplyDelete