Earlier today I described a new method for testing hypotheses about heterogeneity in the rate (or process) of discrete character evolution across a tree.

Although mathematically quite different, this methods owes its intellectual provenance to an article by Brian O'Meara & colleagues published nearly a dozen years ago in 2006. The tactic here is basically the same, but applied to discrete rather than continuous traits.

I just
pushed
another small update that modifies the `logLik`

methods of the `"fitMk"`

and `"fitmultiMk"`

object classes to facilitate comparison of models in which the
rate (or process) is either homogeneous (`"fitMk"`

) or non-homogeneous
(`"fitmultiMk"`

) across the edges of the tree.

In ensuring too that full credit goes where it is due, I should note that for computing the
likelihood I adapted Felsenstein's
pruning algorithm
as implemented in Emmanuel Paradis's *ape*
package.

Below, I'll examine the type I error of the method when used in hypothesis testing against a homogeneous rate/process model.

First, we need to simulate our 'regimes' on the trees. These could be specified arbitrarily, but
in this case I will just simulate a binary character on a set of pure-birth phylogenies using
`sim.history`

:

```
library(phytools)
packageVersion("phytools")
```

```
## [1] '0.6.51'
```

```
trees<-pbtree(n=100,scale=1,nsim=200)
Q<-matrix(c(-0.5,0.5,0.5,-0.5),2,2)
rownames(Q)<-colnames(Q)<-letters[1:2]
trees<-lapply(trees,sim.history,Q=Q,message=FALSE)
class(trees)<-c("multiSimmap","multiPhylo")
```

I generated 200 trees. Let's plot the first 100 of these:

```
par(mfrow=c(10,10))
nulo<-sapply(trees[1:100],plot,colors=setNames(c("blue","red"),
letters[1:2]),ftype="off",lwd=1)
```

Next, we can simulate a discrete character on each of these trees, but in which the transition process is unrelated to the mapped regimes - that is, in which the null hypothesis of no difference in rate or process exists:

```
Q<-matrix(c(-1,1,1,-1),2,2)
rownames(Q)<-colnames(Q)<-0:1
X<-lapply(trees,function(x,Q) getStates(sim.history(x,Q,
message=FALSE),"tips"),Q=Q)
```

Now let's fit our two models to each data vector & tree:

```
fits.single<-mapply(fitMk,tree=trees,x=X,MoreArgs=list(model="ER"),
SIMPLIFY=FALSE)
fits.multi<-mapply(fitmultiMk,tree=trees,x=X,MoreArgs=list(model="ER"),
SIMPLIFY=FALSE)
```

We can look at a single fitted model of each type:

```
fits.single[[1]]
```

```
## Object of class "fitMk".
##
## Fitted (or set) value of Q:
## 0 1
## 0 -0.912078 0.912078
## 1 0.912078 -0.912078
##
## Fitted (or set) value of pi:
## 0 1
## 0.5 0.5
##
## Log-likelihood: -41.127165
##
## Optimization method used was "nlminb"
```

```
fits.multi[[1]]
```

```
## Object of class "fitmultiMk".
##
## Fitted value of Q[a]:
## 0 1
## 0 -0.888472 0.888472
## 1 0.888472 -0.888472
##
## Fitted value of Q[b]:
## 0 1
## 0 -1.032995 1.032995
## 1 1.032995 -1.032995
##
## Fitted (or set) value of pi:
## 0 1
## 0.5 0.5
##
## Log-likelihood: -41.116092
##
## Optimization method used was "nlminb"
```

We can also conduct a likelihood-ratio test on each pair of fitted models. I will do that using
`lmtest::lrtest`

as follows:

```
library(lmtest)
suppressWarnings(LR.test<-mapply(lrtest,fits.single,fits.multi,
SIMPLIFY=FALSE))
## for example
LR.test[[1]]
```

```
## Likelihood ratio test
##
## Model 1: dots[[1L]][[1L]]
## Model 2: dots[[2L]][[1L]]
## #Df LogLik Df Chisq Pr(>Chisq)
## 1 1 -41.127
## 2 2 -41.116 1 0.0221 0.8817
```

Let's pull out the P-values from these tests & plot them:

```
P<-sapply(LR.test,function(x) x[["Pr(>Chisq)"]][2])
obj<-hist(P,20,plot=FALSE)
plot(obj$mids,obj$counts/sum(obj$counts),type="h",
lwd=18,col=make.transparent("blue",0.4),lend=1,
xlab="P-value",ylab="relative frequency",ylim=c(0,0.2))
title(main="P-values for LR-test for data simulated under the null")
abline(h=0.05,col=make.transparent("red",0.5),lwd=1,
lty="dotted")
text(0.96,0.05,"0.05",pos=3)
```

Neat. This is close to what we'd hope for.

## No comments:

## Post a Comment