Friday, September 18, 2015

Additional method for fitPagel to fit Pagel's (1994) model of correlated binary character evolution

I just updated `fitPagel`, a phytools function that fit the Pagel (1994) method for testing whether theh evolutiokn of one binary character affects a second character (or vice versa).

The way this method works is it simply re-codes the two binary characters into a single four state character - e.g., `0|0`, `0|1`, etc. Then it fits two different models. In the first model, the 'independent' model, the rate of transition in the first character is constrained to be equal for a given transition type, irregardless of the state for the second trait. So, for instance, the rate of transition `0|0 -> 1|0` is the same as `0|1 -> 1|1` because both transitions involve the same change (`0 -> 1`) in character 1. The second model, the 'dependent' model, allows each type of transition to have a different rate. In other words `0|0 -> 1|0` and `0|1 -> 1|1` can have different rates (different probabilities of occuring on a given time interval), even though the exact same type of change is occuring for the same character in both instances. Obviously, transitions `0|0 -> 1|1` are constrained to have a rate of zero in both models.

The update now permits the phytools function `fitMk` to be used internally to fit the Mk model. Previously the two methods available were `method = "ace"` and `method = "fitDiscrete"`, and these are both still available as options.

Let's compare them:

``````library(devtools)
install_github("liamrevell/phytools",quiet=TRUE)
library(phytools)
``````
``````## Loading required package: ape
``````

We can start by simulating data under the 'dependent' model

``````## simulate tree
tree<-pbtree(n=300,scale=1)
## simulate data for the two characters
Q<-matrix(c(0,0.4,0.4,0,2,0,0,2,2,0,0,2,0,0.4,0.4,0),4,4,byrow=TRUE)
rownames(Q)<-colnames(Q)<-c("aa","ab","ba","bb")
diag(Q)<--rowSums(Q)
tt<-sim.history(tree,Q)
``````
``````## Note - the rate of substitution from i->j should be given by Q[j,i].
## Detecting that rows, not columns, of Q sum to zero :
##   Transposing Q for internal calculations.
## Done simulation(s).
``````
``````## split the 'aa','ab','bb','ba' data into two characters
t1<-mergeMappedStates(tt,c("aa","ab"),"a")
t1<-mergeMappedStates(t1,c("ba","bb"),"b")
t2<-mergeMappedStates(tt,c("aa","ba"),"a")
t2<-mergeMappedStates(t2,c("ab","bb"),"b")
t1\$states<-getStates(t1,"tips")
t2\$states<-getStates(t2,"tips")
## visualize the correlated evolution of the two traits
par(mfrow=c(1,2))
plotSimmap(t1,setNames(c("red","blue"),letters[1:2]),lwd=1,ftype="off")
plotSimmap(t2,setNames(c("red","blue"),letters[1:2]),lwd=1,ftype="off",
direction="leftwards")
``````

``````## extract the tip data from our simulation
x<-getStates(t1,"tips")
y<-getStates(t2,"tips")
``````

Now we're ready to fit our model using all three methods:

``````fitPagel(tree,x,y) ## default method is "fitMk"
``````
``````##
##   Pagel's binary character correlation test:
##
## Indepedent model rate matrix:
##            a|a        a|b        b|a        b|b
## a|a -0.8726028  0.6514011  0.2212017  0.0000000
## a|b  0.3220958 -0.5432975  0.0000000  0.2212017
## b|a  0.6189176  0.0000000 -1.2703187  0.6514011
## b|b  0.0000000  0.6189176  0.3220958 -0.9410134
##
## Dependent model rate matrix:
##            a|a        a|b        b|a        b|b
## a|a -0.5951317  0.3393103  0.2558215  0.0000000
## a|b  4.6486390 -4.7698708  0.0000000  0.1212318
## b|a  2.2086339  0.0000000 -4.5275279  2.3188940
## b|b  0.0000000  0.4113254  0.3105573 -0.7218827
##
## Model fit:
##             log-likelihood
## independent      -172.8745
## dependent        -150.4001
##
## Hypothesis test result:
##   likelihood-ratio:  44.94869
##   p-value:  4.074835e-09
##
## Model fitting method used was fitMk
``````
``````fitPagel(tree,x,y,method="ace")
``````
``````##
##   Pagel's binary character correlation test:
##
## Indepedent model rate matrix:
##            a|a        a|b        b|a        b|b
## a|a -0.8725883  0.6513844  0.2212039  0.0000000
## a|b  0.3220944 -0.5432984  0.0000000  0.2212039
## b|a  0.6189141  0.0000000 -1.2702985  0.6513844
## b|b  0.0000000  0.6189141  0.3220944 -0.9410086
##
## Dependent model rate matrix:
##           a|a        a|b        b|a        b|b
## a|a -0.595134  0.3393112  0.2558228  0.0000000
## a|b  4.648654 -4.7698873  0.0000000  0.1212328
## b|a  2.208634  0.0000000 -4.5275253  2.3188912
## b|b  0.000000  0.4113245  0.3105577 -0.7218823
##
## Model fit:
##             log-likelihood
## independent      -171.4882
## dependent        -149.0138
##
## Hypothesis test result:
##   likelihood-ratio:  44.94869
##   p-value:  4.074835e-09
##
## Model fitting method used was ace
``````
``````fitPagel(tree,x,y,method="fitDiscrete")
``````
``````## Loading required package: geiger
``````
``````## Warning in fitDiscrete(tree, xy, model = iQ): Parameter estimates appear at bounds:
##  q14
##  q23
##  q32
##  q41
``````
``````## Warning in fitDiscrete(tree, xy, model = dQ): Parameter estimates appear at bounds:
##  q14
##  q23
##  q32
##  q41
``````
``````##
##   Pagel's binary character correlation test:
##
## Indepedent model rate matrix:
##            a|a        a|b        b|a        b|b
## a|a -0.8815200  0.6639046  0.2176153  0.0000000
## a|b  0.3054217 -0.5230370  0.0000000  0.2176153
## b|a  0.6199298  0.0000000 -1.2838344  0.6639046
## b|b  0.0000000  0.6199298  0.3054217 -0.9253515
##
## Dependent model rate matrix:
##            a|a        a|b        b|a         b|b
## a|a -0.5943124  0.3350400  0.2592725  0.00000000
## a|b  4.0225133 -4.0854456  0.0000000  0.06293222
## b|a  2.0872453  0.0000000 -4.5241365  2.43689116
## b|b  0.0000000  0.3640895  0.2807990 -0.64488849
##
## Model fit:
##             log-likelihood
## independent      -171.5709
## dependent        -149.5117
##
## Hypothesis test result:
##   likelihood-ratio:  44.11832
##   p-value:  6.06277e-09
##
## Model fitting method used was fitDiscrete
``````

That's it.

1. Dear Liam,

I try to use fitPagel to check for correlation between two binary traits. However, I always get the following message when I'm using my data with fitMk method:

"model does not have the right number of columns"

Could you explain to me what this message exactly means?

Best
Marcin

1. Hi Marcin.

This is a known bug in the CRAN version. I think it should be fixed if you update phytools from GitHub. To do that start a fresh R session & run:

library(devtools) ## must have devtools installed
install_github("liamrevell/phytools")

Let us know if that takes care of the issue.

- Liam

2. Thanks a lot. Now everything works perfectly.

Best
Marcin

2. Dear Liam,

Is it possible to select a substitution model for one variable and other model for the other? (say ER for X and ARD for Y)

Cheers,

Guilherme

3. Dear Liam,

I would like to use "fitPagel" to check for correlation between two binary traits. However, I always get the following message "could not find function "fitPagel"" even though I have the latest R version (3.4.0) and the libraries "phytools", "ape", "geiger", "rgl" installed and loaded.Do you have an idea why I get the error message?

Best

Stefan

1. I'm not sure why this would be the case. Have you tried restarting your R session?

2. Dear Liam,

thanks for your fast answer. Surprisingly, I was able to conduct PagelTests if I download phytools from the Danish server but not from German (GÃ¶ttingen, MÃ¼nster) servers.

Do you know a possibility to conduct a test with two factorial character of which one is not binary?

Best

Stefan

4. This is one of the best posts i have seen in buy replica watches a long time. thanks for posting it. while there are different grades of replicas,( some better than others) it is best to read the reviews from the owners on these boards to find out replica Omega watches UK the real quality of a watch you are wanting to purchase. never listen to a seller just arbitrarily saying his product is grade 1 or AAA. he only wants your money.