Tuesday, May 24, 2016

Fix to a peculiar quirk of fitPagel

I just pushed a very small update to how the argument method is handled in the phytools function for Pagel's (1994) method, fitPagel.

The argument method can be used to specify which function is used internally to maximize the likelihood of the different models. The options at present are "fitMk", the default, "ace", and "fitDiscrete", the latter two using the ape function ace and the geiger function fitDiscrete, respectively. Of these, fitDiscrete is probably the most robust, although it is also the slowest.

One quirk that existed in the function is that if a value of method was supplied that was not one of the aforementioned, then fitMk would be used; however the print method would indicate that whatever (possibly nonsensical) method the user supplied had been used to optimize the model! This could create the impression that methods exist in the function that in fact do not.

So for instance:

library(phytools)
tree
## 
## Phylogenetic tree with 50 tips and 49 internal nodes.
## 
## Tip labels:
##  t11, t12, t6, t7, t24, t25, ...
## 
## Rooted; includes branch lengths.
x
## t11 t12  t6  t7 t24 t25  t2 t30 t47 t48 t31 t32 t33  t3 t45 t46 t35 t40 
##   a   a   a   a   b   a   b   b   b   b   a   b   b   a   b   b   b   b 
## t41 t15  t4 t16 t17 t28 t29 t19  t9 t10 t36 t37 t34  t1  t5  t8 t13 t26 
##   b   a   a   b   b   b   a   a   a   a   a   a   a   a   a   a   a   a 
## t27 t22 t23 t49 t50 t18 t38 t39 t20 t21 t14 t42 t43 t44 
##   a   a   a   a   a   a   a   a   a   a   a   b   b   b 
## Levels: a b
y
## t11 t12  t6  t7 t24 t25  t2 t30 t47 t48 t31 t32 t33  t3 t45 t46 t35 t40 
##   c   d   c   d   c   c   c   c   c   c   c   d   d   c   d   d   d   d 
## t41 t15  t4 t16 t17 t28 t29 t19  t9 t10 t36 t37 t34  t1  t5  t8 t13 t26 
##   d   c   d   c   d   c   c   c   c   d   c   c   c   d   d   c   d   d 
## t27 t22 t23 t49 t50 t18 t38 t39 t20 t21 t14 t42 t43 t44 
##   d   c   c   c   c   d   c   c   d   c   c   d   d   d 
## Levels: c d
fit.fitMk<-fitPagel(tree,x,y)
fit.fitMk
## 
##   Pagel's binary character correlation test:
## 
## Independent model rate matrix:
##           a|c       a|d        b|c        b|d
## a|c -3.719703  2.746968  0.9727354  0.0000000
## a|d  3.260783 -4.233518  0.0000000  0.9727354
## b|c  2.538373  0.000000 -5.2853407  2.7469677
## b|d  0.000000  2.538373  3.2607829 -5.7991559
## 
## Dependent model rate matrix:
##           a|c       a|d        b|c        b|d
## a|c -4.935932  2.930609   2.005324  0.0000000
## a|d  3.570132 -3.913473   0.000000  0.3433411
## b|c  7.455589  0.000000 -10.655506  3.1999174
## b|d  0.000000  0.000000   3.530802 -3.5308018
## 
## Model fit:
##             log-likelihood
## independent      -48.66884
## dependent        -46.63219
## 
## Hypothesis test result:
##   likelihood-ratio:  4.073295 
##   p-value:  0.3961773 
## 
## Model fitting method used was fitMk
fit.fitDiscrete<-fitPagel(tree,x,y,method="fitDiscrete")
## Warning in fitDiscrete(tree, xy, model = iQ, ...): Parameter estimates appear at bounds:
##  q14
##  q23
##  q32
##  q41
## Warning in fitDiscrete(tree, xy, model = dQ, ...): Parameter estimates appear at bounds:
##  q14
##  q23
##  q32
##  q41
fit.fitDiscrete
## 
##   Pagel's binary character correlation test:
## 
## Independent model rate matrix:
##           a|c       a|d        b|c        b|d
## a|c -3.700663  2.746975  0.9536884  0.0000000
## a|d  3.260749 -4.214438  0.0000000  0.9536884
## b|c  2.524223  0.000000 -5.2711974  2.7469748
## b|d  0.000000  2.524223  3.2607493 -5.7849718
## 
## Dependent model rate matrix:
##           a|c           a|d        b|c        b|d
## a|c -4.793643  2.999862e+00   1.793782  0.0000000
## a|d  3.478373 -3.889966e+00   0.000000  0.4115926
## b|c  7.314845  0.000000e+00 -10.270839  2.9559936
## b|d  0.000000  5.238634e-17   3.518613 -3.5186134
## 
## Model fit:
##             log-likelihood
## independent      -48.66200
## dependent        -46.61651
## 
## Hypothesis test result:
##   likelihood-ratio:  4.090979 
##   p-value:  0.3938332 
## 
## Model fitting method used was fitDiscrete
fit.lambda<-fitPagel(tree,x,y,method="lambda")
fit.lambda ## actually no such method exists
## 
##   Pagel's binary character correlation test:
## 
## Independent model rate matrix:
##           a|c       a|d        b|c        b|d
## a|c -3.719703  2.746968  0.9727354  0.0000000
## a|d  3.260783 -4.233518  0.0000000  0.9727354
## b|c  2.538373  0.000000 -5.2853407  2.7469677
## b|d  0.000000  2.538373  3.2607829 -5.7991559
## 
## Dependent model rate matrix:
##           a|c       a|d        b|c        b|d
## a|c -4.935932  2.930609   2.005324  0.0000000
## a|d  3.570132 -3.913473   0.000000  0.3433411
## b|c  7.455589  0.000000 -10.655506  3.1999174
## b|d  0.000000  0.000000   3.530802 -3.5308018
## 
## Model fit:
##             log-likelihood
## independent      -48.66884
## dependent        -46.63219
## 
## Hypothesis test result:
##   likelihood-ratio:  4.073295 
##   p-value:  0.3961773 
## 
## Model fitting method used was lambda

In the final case method="fitMk" was used, but the print method does not show this - instead giving the nonsensical method "lambda".

The fixed version can be installed from GitHub in the typical way. Here I will load from source to show the difference:

source("https://raw.githubusercontent.com/liamrevell/phytools/b9e515397cb68b9c16549dca5db6c26ec5eecb13/R/fitPagel.R")
fit.lambda<-fitPagel(tree,x,y,method="lambda")
##   method = "lambda" not found.
##   Defaulting to method = "fitMk"
fit.lambda
## 
##   Pagel's binary character correlation test:
## 
## Independent model rate matrix:
##           a|c       a|d        b|c        b|d
## a|c -3.719703  2.746968  0.9727354  0.0000000
## a|d  3.260783 -4.233518  0.0000000  0.9727354
## b|c  2.538373  0.000000 -5.2853407  2.7469677
## b|d  0.000000  2.538373  3.2607829 -5.7991559
## 
## Dependent model rate matrix:
##           a|c       a|d        b|c        b|d
## a|c -4.935932  2.930609   2.005324  0.0000000
## a|d  3.570132 -3.913473   0.000000  0.3433411
## b|c  7.455589  0.000000 -10.655506  3.1999174
## b|d  0.000000  0.000000   3.530802 -3.5308018
## 
## Model fit:
##             log-likelihood
## independent      -48.66884
## dependent        -46.63219
## 
## Hypothesis test result:
##   likelihood-ratio:  4.073295 
##   p-value:  0.3961773 
## 
## Model fitting method used was fitMk

Data for this example were simulated without a correlation, as follows:

tree<-pbtree(n=50,scale=1)
Q<-matrix(c(-1,1,1,-1),2,2)
rownames(Q)<-colnames(Q)<-letters[1:2]
x<-as.factor(sim.history(tree,Q)$states)
rownames(Q)<-colnames(Q)<-letters[3:4]
y<-as.factor(sim.history(tree,Q)$states)

No comments:

Post a Comment

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.