I just updated `fitPagel`

, a phytools function that fit the Pagel
(1994) method for testing whether theh evolutiokn of one binary character
affects a second character (or vice versa).

The way this method works is it simply re-codes the two binary characters
into a single four state character - e.g., `0|0`

, `0|1`

,
etc. Then it fits two different models. In the first model, the 'independent'
model, the rate of transition in the first character is constrained to be
equal for a given transition type, irregardless of the state for the second
trait. So, for instance, the rate of transition `0|0 -> 1|0`

is
the same as `0|1 -> 1|1`

because both transitions involve the
same change (`0 -> 1`

) in character 1. The second model, the
'dependent' model, allows each type of transition to have a different rate.
In other words `0|0 -> 1|0`

and `0|1 -> 1|1`

can have
different rates (different probabilities of occuring on a given time
interval), even though the exact same type of change is occuring for the
same character in both instances. Obviously, transitions
`0|0 -> 1|1`

are constrained to have a rate of zero in both
models.

The
update
now permits the phytools function `fitMk`

to be used internally
to fit the M*k* model. Previously the two methods available were
`method = "ace"`

and `method = "fitDiscrete"`

, and
these are both still available as options.

Let's compare them:

```
library(devtools)
install_github("liamrevell/phytools",quiet=TRUE)
library(phytools)
```

```
## Loading required package: ape
## Loading required package: maps
```

We can start by simulating data under the 'dependent' model

```
## simulate tree
tree<-pbtree(n=300,scale=1)
## simulate data for the two characters
Q<-matrix(c(0,0.4,0.4,0,2,0,0,2,2,0,0,2,0,0.4,0.4,0),4,4,byrow=TRUE)
rownames(Q)<-colnames(Q)<-c("aa","ab","ba","bb")
diag(Q)<--rowSums(Q)
tt<-sim.history(tree,Q)
```

```
## Note - the rate of substitution from i->j should be given by Q[j,i].
## Detecting that rows, not columns, of Q sum to zero :
## Transposing Q for internal calculations.
## Done simulation(s).
```

```
## split the 'aa','ab','bb','ba' data into two characters
t1<-mergeMappedStates(tt,c("aa","ab"),"a")
t1<-mergeMappedStates(t1,c("ba","bb"),"b")
t2<-mergeMappedStates(tt,c("aa","ba"),"a")
t2<-mergeMappedStates(t2,c("ab","bb"),"b")
t1$states<-getStates(t1,"tips")
t2$states<-getStates(t2,"tips")
## visualize the correlated evolution of the two traits
par(mfrow=c(1,2))
plotSimmap(t1,setNames(c("red","blue"),letters[1:2]),lwd=1,ftype="off")
plotSimmap(t2,setNames(c("red","blue"),letters[1:2]),lwd=1,ftype="off",
direction="leftwards")
```

```
## extract the tip data from our simulation
x<-getStates(t1,"tips")
y<-getStates(t2,"tips")
```

Now we're ready to fit our model using all three methods:

```
fitPagel(tree,x,y) ## default method is "fitMk"
```

```
##
## Pagel's binary character correlation test:
##
## Indepedent model rate matrix:
## a|a a|b b|a b|b
## a|a -0.8726028 0.6514011 0.2212017 0.0000000
## a|b 0.3220958 -0.5432975 0.0000000 0.2212017
## b|a 0.6189176 0.0000000 -1.2703187 0.6514011
## b|b 0.0000000 0.6189176 0.3220958 -0.9410134
##
## Dependent model rate matrix:
## a|a a|b b|a b|b
## a|a -0.5951317 0.3393103 0.2558215 0.0000000
## a|b 4.6486390 -4.7698708 0.0000000 0.1212318
## b|a 2.2086339 0.0000000 -4.5275279 2.3188940
## b|b 0.0000000 0.4113254 0.3105573 -0.7218827
##
## Model fit:
## log-likelihood
## independent -172.8745
## dependent -150.4001
##
## Hypothesis test result:
## likelihood-ratio: 44.94869
## p-value: 4.074835e-09
##
## Model fitting method used was fitMk
```

```
fitPagel(tree,x,y,method="ace")
```

```
##
## Pagel's binary character correlation test:
##
## Indepedent model rate matrix:
## a|a a|b b|a b|b
## a|a -0.8725883 0.6513844 0.2212039 0.0000000
## a|b 0.3220944 -0.5432984 0.0000000 0.2212039
## b|a 0.6189141 0.0000000 -1.2702985 0.6513844
## b|b 0.0000000 0.6189141 0.3220944 -0.9410086
##
## Dependent model rate matrix:
## a|a a|b b|a b|b
## a|a -0.595134 0.3393112 0.2558228 0.0000000
## a|b 4.648654 -4.7698873 0.0000000 0.1212328
## b|a 2.208634 0.0000000 -4.5275253 2.3188912
## b|b 0.000000 0.4113245 0.3105577 -0.7218823
##
## Model fit:
## log-likelihood
## independent -171.4882
## dependent -149.0138
##
## Hypothesis test result:
## likelihood-ratio: 44.94869
## p-value: 4.074835e-09
##
## Model fitting method used was ace
```

```
fitPagel(tree,x,y,method="fitDiscrete")
```

```
## Loading required package: geiger
## Loading required package: parallel
```

```
## Warning in fitDiscrete(tree, xy, model = iQ): Parameter estimates appear at bounds:
## q14
## q23
## q32
## q41
```

```
## Warning in fitDiscrete(tree, xy, model = dQ): Parameter estimates appear at bounds:
## q14
## q23
## q32
## q41
```

```
##
## Pagel's binary character correlation test:
##
## Indepedent model rate matrix:
## a|a a|b b|a b|b
## a|a -0.8815200 0.6639046 0.2176153 0.0000000
## a|b 0.3054217 -0.5230370 0.0000000 0.2176153
## b|a 0.6199298 0.0000000 -1.2838344 0.6639046
## b|b 0.0000000 0.6199298 0.3054217 -0.9253515
##
## Dependent model rate matrix:
## a|a a|b b|a b|b
## a|a -0.5943124 0.3350400 0.2592725 0.00000000
## a|b 4.0225133 -4.0854456 0.0000000 0.06293222
## b|a 2.0872453 0.0000000 -4.5241365 2.43689116
## b|b 0.0000000 0.3640895 0.2807990 -0.64488849
##
## Model fit:
## log-likelihood
## independent -171.5709
## dependent -149.5117
##
## Hypothesis test result:
## likelihood-ratio: 44.11832
## p-value: 6.06277e-09
##
## Model fitting method used was fitDiscrete
```

That's it.

Dear Liam,

ReplyDeleteI try to use fitPagel to check for correlation between two binary traits. However, I always get the following message when I'm using my data with fitMk method:

"model does not have the right number of columns"

Could you explain to me what this message exactly means?

Best

Marcin

Hi Marcin.

DeleteThis is a known bug in the CRAN version. I think it should be fixed if you update phytools from GitHub. To do that start a fresh R session & run:

library(devtools) ## must have devtools installed

install_github("liamrevell/phytools")

Let us know if that takes care of the issue.

- Liam

Thanks a lot. Now everything works perfectly.

DeleteBest

Marcin

Dear Liam,

ReplyDeleteIs it possible to select a substitution model for one variable and other model for the other? (say ER for X and ARD for Y)

Cheers,

Guilherme