Friday, September 18, 2015

Additional method for fitPagel to fit Pagel's (1994) model of correlated binary character evolution

I just updated `fitPagel`, a phytools function that fit the Pagel (1994) method for testing whether theh evolutiokn of one binary character affects a second character (or vice versa).

The way this method works is it simply re-codes the two binary characters into a single four state character - e.g., `0|0`, `0|1`, etc. Then it fits two different models. In the first model, the 'independent' model, the rate of transition in the first character is constrained to be equal for a given transition type, irregardless of the state for the second trait. So, for instance, the rate of transition `0|0 -> 1|0` is the same as `0|1 -> 1|1` because both transitions involve the same change (`0 -> 1`) in character 1. The second model, the 'dependent' model, allows each type of transition to have a different rate. In other words `0|0 -> 1|0` and `0|1 -> 1|1` can have different rates (different probabilities of occuring on a given time interval), even though the exact same type of change is occuring for the same character in both instances. Obviously, transitions `0|0 -> 1|1` are constrained to have a rate of zero in both models.

The update now permits the phytools function `fitMk` to be used internally to fit the Mk model. Previously the two methods available were `method = "ace"` and `method = "fitDiscrete"`, and these are both still available as options.

Let's compare them:

``````library(devtools)
install_github("liamrevell/phytools",quiet=TRUE)
library(phytools)
``````
``````## Loading required package: ape
``````

We can start by simulating data under the 'dependent' model

``````## simulate tree
tree<-pbtree(n=300,scale=1)
## simulate data for the two characters
Q<-matrix(c(0,0.4,0.4,0,2,0,0,2,2,0,0,2,0,0.4,0.4,0),4,4,byrow=TRUE)
rownames(Q)<-colnames(Q)<-c("aa","ab","ba","bb")
diag(Q)<--rowSums(Q)
tt<-sim.history(tree,Q)
``````
``````## Note - the rate of substitution from i->j should be given by Q[j,i].
## Detecting that rows, not columns, of Q sum to zero :
##   Transposing Q for internal calculations.
## Done simulation(s).
``````
``````## split the 'aa','ab','bb','ba' data into two characters
t1<-mergeMappedStates(tt,c("aa","ab"),"a")
t1<-mergeMappedStates(t1,c("ba","bb"),"b")
t2<-mergeMappedStates(tt,c("aa","ba"),"a")
t2<-mergeMappedStates(t2,c("ab","bb"),"b")
t1\$states<-getStates(t1,"tips")
t2\$states<-getStates(t2,"tips")
## visualize the correlated evolution of the two traits
par(mfrow=c(1,2))
plotSimmap(t1,setNames(c("red","blue"),letters[1:2]),lwd=1,ftype="off")
plotSimmap(t2,setNames(c("red","blue"),letters[1:2]),lwd=1,ftype="off",
direction="leftwards")
``````

``````## extract the tip data from our simulation
x<-getStates(t1,"tips")
y<-getStates(t2,"tips")
``````

Now we're ready to fit our model using all three methods:

``````fitPagel(tree,x,y) ## default method is "fitMk"
``````
``````##
##   Pagel's binary character correlation test:
##
## Indepedent model rate matrix:
##            a|a        a|b        b|a        b|b
## a|a -0.8726028  0.6514011  0.2212017  0.0000000
## a|b  0.3220958 -0.5432975  0.0000000  0.2212017
## b|a  0.6189176  0.0000000 -1.2703187  0.6514011
## b|b  0.0000000  0.6189176  0.3220958 -0.9410134
##
## Dependent model rate matrix:
##            a|a        a|b        b|a        b|b
## a|a -0.5951317  0.3393103  0.2558215  0.0000000
## a|b  4.6486390 -4.7698708  0.0000000  0.1212318
## b|a  2.2086339  0.0000000 -4.5275279  2.3188940
## b|b  0.0000000  0.4113254  0.3105573 -0.7218827
##
## Model fit:
##             log-likelihood
## independent      -172.8745
## dependent        -150.4001
##
## Hypothesis test result:
##   likelihood-ratio:  44.94869
##   p-value:  4.074835e-09
##
## Model fitting method used was fitMk
``````
``````fitPagel(tree,x,y,method="ace")
``````
``````##
##   Pagel's binary character correlation test:
##
## Indepedent model rate matrix:
##            a|a        a|b        b|a        b|b
## a|a -0.8725883  0.6513844  0.2212039  0.0000000
## a|b  0.3220944 -0.5432984  0.0000000  0.2212039
## b|a  0.6189141  0.0000000 -1.2702985  0.6513844
## b|b  0.0000000  0.6189141  0.3220944 -0.9410086
##
## Dependent model rate matrix:
##           a|a        a|b        b|a        b|b
## a|a -0.595134  0.3393112  0.2558228  0.0000000
## a|b  4.648654 -4.7698873  0.0000000  0.1212328
## b|a  2.208634  0.0000000 -4.5275253  2.3188912
## b|b  0.000000  0.4113245  0.3105577 -0.7218823
##
## Model fit:
##             log-likelihood
## independent      -171.4882
## dependent        -149.0138
##
## Hypothesis test result:
##   likelihood-ratio:  44.94869
##   p-value:  4.074835e-09
##
## Model fitting method used was ace
``````
``````fitPagel(tree,x,y,method="fitDiscrete")
``````
``````## Loading required package: geiger
``````
``````## Warning in fitDiscrete(tree, xy, model = iQ): Parameter estimates appear at bounds:
##  q14
##  q23
##  q32
##  q41
``````
``````## Warning in fitDiscrete(tree, xy, model = dQ): Parameter estimates appear at bounds:
##  q14
##  q23
##  q32
##  q41
``````
``````##
##   Pagel's binary character correlation test:
##
## Indepedent model rate matrix:
##            a|a        a|b        b|a        b|b
## a|a -0.8815200  0.6639046  0.2176153  0.0000000
## a|b  0.3054217 -0.5230370  0.0000000  0.2176153
## b|a  0.6199298  0.0000000 -1.2838344  0.6639046
## b|b  0.0000000  0.6199298  0.3054217 -0.9253515
##
## Dependent model rate matrix:
##            a|a        a|b        b|a         b|b
## a|a -0.5943124  0.3350400  0.2592725  0.00000000
## a|b  4.0225133 -4.0854456  0.0000000  0.06293222
## b|a  2.0872453  0.0000000 -4.5241365  2.43689116
## b|b  0.0000000  0.3640895  0.2807990 -0.64488849
##
## Model fit:
##             log-likelihood
## independent      -171.5709
## dependent        -149.5117
##
## Hypothesis test result:
##   likelihood-ratio:  44.11832
##   p-value:  6.06277e-09
##
## Model fitting method used was fitDiscrete
``````

That's it.

1. Dear Liam,

I try to use fitPagel to check for correlation between two binary traits. However, I always get the following message when I'm using my data with fitMk method:

"model does not have the right number of columns"

Could you explain to me what this message exactly means?

Best
Marcin

1. Hi Marcin.

This is a known bug in the CRAN version. I think it should be fixed if you update phytools from GitHub. To do that start a fresh R session & run:

library(devtools) ## must have devtools installed
install_github("liamrevell/phytools")

Let us know if that takes care of the issue.

- Liam

2. Thanks a lot. Now everything works perfectly.

Best
Marcin

2. Dear Liam,

Is it possible to select a substitution model for one variable and other model for the other? (say ER for X and ARD for Y)

Cheers,

Guilherme

3. Dear Liam,

I would like to use "fitPagel" to check for correlation between two binary traits. However, I always get the following message "could not find function "fitPagel"" even though I have the latest R version (3.4.0) and the libraries "phytools", "ape", "geiger", "rgl" installed and loaded.Do you have an idea why I get the error message?

Best

Stefan

1. I'm not sure why this would be the case. Have you tried restarting your R session?

2. Dear Liam,

thanks for your fast answer. Surprisingly, I was able to conduct PagelTests if I download phytools from the Danish server but not from German (GÃ¶ttingen, MÃ¼nster) servers.

Do you know a possibility to conduct a test with two factorial character of which one is not binary?

Best

Stefan

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.