Tuesday, December 5, 2017

Type I error rates for variable-rate/process Mk discrete character evolution model

Earlier today I described a new method for testing hypotheses about heterogeneity in the rate (or process) of discrete character evolution across a tree.

Although mathematically quite different, this methods owes its intellectual provenance to an article by Brian O'Meara & colleagues published nearly a dozen years ago in 2006. The tactic here is basically the same, but applied to discrete rather than continuous traits.

I just pushed another small update that modifies the logLik methods of the "fitMk" and "fitmultiMk" object classes to facilitate comparison of models in which the rate (or process) is either homogeneous ("fitMk") or non-homogeneous ("fitmultiMk") across the edges of the tree.

In ensuring too that full credit goes where it is due, I should note that for computing the likelihood I adapted Felsenstein's pruning algorithm as implemented in Emmanuel Paradis's ape package.

Below, I'll examine the type I error of the method when used in hypothesis testing against a homogeneous rate/process model.

First, we need to simulate our 'regimes' on the trees. These could be specified arbitrarily, but in this case I will just simulate a binary character on a set of pure-birth phylogenies using sim.history:

library(phytools)
packageVersion("phytools")
## [1] '0.6.51'
trees<-pbtree(n=100,scale=1,nsim=200)
Q<-matrix(c(-0.5,0.5,0.5,-0.5),2,2)
rownames(Q)<-colnames(Q)<-letters[1:2]
trees<-lapply(trees,sim.history,Q=Q,message=FALSE)
class(trees)<-c("multiSimmap","multiPhylo")

I generated 200 trees. Let's plot the first 100 of these:

par(mfrow=c(10,10))
nulo<-sapply(trees[1:100],plot,colors=setNames(c("blue","red"),
    letters[1:2]),ftype="off",lwd=1)

plot of chunk unnamed-chunk-2

Next, we can simulate a discrete character on each of these trees, but in which the transition process is unrelated to the mapped regimes - that is, in which the null hypothesis of no difference in rate or process exists:

Q<-matrix(c(-1,1,1,-1),2,2)
rownames(Q)<-colnames(Q)<-0:1
X<-lapply(trees,function(x,Q) getStates(sim.history(x,Q,
    message=FALSE),"tips"),Q=Q)

Now let's fit our two models to each data vector & tree:

fits.single<-mapply(fitMk,tree=trees,x=X,MoreArgs=list(model="ER"),
    SIMPLIFY=FALSE)
fits.multi<-mapply(fitmultiMk,tree=trees,x=X,MoreArgs=list(model="ER"),
    SIMPLIFY=FALSE)

We can look at a single fitted model of each type:

fits.single[[1]]
## Object of class "fitMk".
## 
## Fitted (or set) value of Q:
##           0         1
## 0 -0.912078  0.912078
## 1  0.912078 -0.912078
## 
## Fitted (or set) value of pi:
##   0   1 
## 0.5 0.5 
## 
## Log-likelihood: -41.127165 
## 
## Optimization method used was "nlminb"
fits.multi[[1]]
## Object of class "fitmultiMk".
## 
## Fitted value of Q[a]:
##           0         1
## 0 -0.888472  0.888472
## 1  0.888472 -0.888472
## 
## Fitted value of Q[b]:
##           0         1
## 0 -1.032995  1.032995
## 1  1.032995 -1.032995
## 
## Fitted (or set) value of pi:
##   0   1 
## 0.5 0.5 
## 
## Log-likelihood: -41.116092 
## 
## Optimization method used was "nlminb"

We can also conduct a likelihood-ratio test on each pair of fitted models. I will do that using lmtest::lrtest as follows:

library(lmtest)
suppressWarnings(LR.test<-mapply(lrtest,fits.single,fits.multi,
    SIMPLIFY=FALSE))
## for example
LR.test[[1]]
## Likelihood ratio test
## 
## Model 1: dots[[1L]][[1L]]
## Model 2: dots[[2L]][[1L]]
##   #Df  LogLik Df  Chisq Pr(>Chisq)
## 1   1 -41.127                     
## 2   2 -41.116  1 0.0221     0.8817

Let's pull out the P-values from these tests & plot them:

P<-sapply(LR.test,function(x) x[["Pr(>Chisq)"]][2])
obj<-hist(P,20,plot=FALSE)
plot(obj$mids,obj$counts/sum(obj$counts),type="h",
    lwd=18,col=make.transparent("blue",0.4),lend=1,
    xlab="P-value",ylab="relative frequency",ylim=c(0,0.2))
title(main="P-values for LR-test for data simulated under the null")
abline(h=0.05,col=make.transparent("red",0.5),lwd=1,
    lty="dotted")
text(0.96,0.05,"0.05",pos=3)

plot of chunk unnamed-chunk-7

Neat. This is close to what we'd hope for.

No comments:

Post a Comment