## Tuesday, May 23, 2017

### Type I error rates in `ratebytree` method to compare evolutionary rates between trees

Today I'm working on a more rigorous exploration of the statistical properties of the phytools function `ratebytree`.

As I have described in prior posts (e.g., 1, 2, , 4, 5), this is a method for comparing the evolutionary rate of a continuous character between trees. This method is a variant of Brian O'Meara et al.'s (2006) censored rates test. In fact, I don't really consider this to be a separate method at all - merely the application of the O'Meara et al. method to a particular problem.

In the following example, I simulate trees & data under a variety of different scenarios of tree size, but consistently under the null hypothesis of no difference in rate between trees. Then I fit the model to each set of simulated trees & datasets, extract the P-values, and compute the fraction of times that the hypothesis test resulted in a value of P that was less than or equal to 0.05.

For R enthusiasts, the script below also includes some useful examples of `apply` family functions, `expression` to put sub/superscripts in figure legends, and `mtext` to artfully position our subplot labels.

``````library(phytools)

set.seed(1)

nrep<-1000
t10<-replicate(nrep,pbtree(n=10,nsim=2),simplify=FALSE)
x10<-lapply(t10,function(x) lapply(x,fastBM))
t50<-replicate(nrep,pbtree(n=50,nsim=2),simplify=FALSE)
x50<-lapply(t50,function(x) lapply(x,fastBM))
foo<-function(n1,n2) c(pbtree(n=n1),pbtree(n=n2))
t10.t50<-replicate(nrep,foo(10,50),simplify=FALSE)
x10.x50<-lapply(t10.t50,function(x) lapply(x,fastBM))
t30<-replicate(nrep,pbtree(n=30,nsim=3),simplify=FALSE)
x30<-lapply(t30,function(x) lapply(x,fastBM))

typeIerror<-function(i,t,x,max){
obj<-ratebytree(t,x)
if(i==1) cat("|.") else cat(".")
if(i==max) cat("|\n")
if(i%%50==0) cat("\n ")
flush.console()
obj\$P.chisq
}

par(mfrow=c(2,2))

P10<-mapply(typeIerror,1:nrep,t10,x10,MoreArgs=list(max=nrep))
``````
``````## |..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................|
##
##
``````
``````h10<-hist(P10,breaks=seq(0,1,by=0.05),col="grey",xlab="P-value",
main="",freq=FALSE)
cex=1)
abline(h=1,lwd=2,col="red",lty="dashed")

P50<-mapply(typeIerror,1:nrep,t50,x50,MoreArgs=list(max=nrep))
``````
``````## |..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................|
##
##
``````
``````h50<-hist(P50,breaks=seq(0,1,by=0.05),col="grey",main="",
xlab="P-value",freq=FALSE)
cex=1)
abline(h=1,lwd=2,col="red",lty="dashed")

P10.50<-mapply(typeIerror,1:nrep,t10.t50,x10.x50,MoreArgs=list(max=nrep))
``````
``````## |..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................|
##
##
``````
``````h10.50<-hist(P10.50,breaks=seq(0,1,by=0.05),col="grey",main="",freq=FALSE,
xlab="P-value")
cex=1)
abline(h=1,lwd=2,col="red",lty="dashed")

P30<-mapply(typeIerror,1:nrep,t30,x30,MoreArgs=list(max=nrep))
``````
``````## |..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................|
##
##
``````
``````h30<-hist(P30,breaks=seq(0,1,by=0.05),col="grey",freq=FALSE,main="",
xlab="P-value")
mtext(text=expression('d) N'[1]*'=N'[2]*'=N'[3]*'=30'),
abline(h=1,lwd=2,col="red",lty="dashed")
``````

``````typeI<-matrix(sapply(list(P10,P50,P10.50,P30),function(x,alpha)
mean(x<=alpha),alpha=0.05),4,1,dimnames=list(
c("N1=N2=10","N1=N2=50","N1=10,N2=50","N1=N2=N3=30"),
"type I error"))
typeI
``````
``````##             type I error
## N1=N2=10           0.065
## N1=N2=50           0.052
## N1=10,N2=50        0.056
## N1=N2=N3=30        0.063
``````

If the method is working properly, under the null we should see a relatively uniform density on the interval [0,1]. We should also see a type I error rate close to the nominal level. In this case, our type I error rate is slightly elevated for the smallest tree sizes - but for even our relatively modestly sized trees of panels b) & c) there is relatively little evidence for elevated type I error over the nominal level in this test.

That's it.