Tuesday, May 23, 2017

Type I error rates in ratebytree method to compare evolutionary rates between trees

Today I'm working on a more rigorous exploration of the statistical properties of the phytools function ratebytree.

As I have described in prior posts (e.g., 1, 2, , 4, 5), this is a method for comparing the evolutionary rate of a continuous character between trees. This method is a variant of Brian O'Meara et al.'s (2006) censored rates test. In fact, I don't really consider this to be a separate method at all - merely the application of the O'Meara et al. method to a particular problem.

In the following example, I simulate trees & data under a variety of different scenarios of tree size, but consistently under the null hypothesis of no difference in rate between trees. Then I fit the model to each set of simulated trees & datasets, extract the P-values, and compute the fraction of times that the hypothesis test resulted in a value of P that was less than or equal to 0.05.

For R enthusiasts, the script below also includes some useful examples of apply family functions, expression to put sub/superscripts in figure legends, and mtext to artfully position our subplot labels.

library(phytools)

set.seed(1)

nrep<-1000
t10<-replicate(nrep,pbtree(n=10,nsim=2),simplify=FALSE)
x10<-lapply(t10,function(x) lapply(x,fastBM))
t50<-replicate(nrep,pbtree(n=50,nsim=2),simplify=FALSE)
x50<-lapply(t50,function(x) lapply(x,fastBM))
foo<-function(n1,n2) c(pbtree(n=n1),pbtree(n=n2))
t10.t50<-replicate(nrep,foo(10,50),simplify=FALSE)
x10.x50<-lapply(t10.t50,function(x) lapply(x,fastBM))
t30<-replicate(nrep,pbtree(n=30,nsim=3),simplify=FALSE)
x30<-lapply(t30,function(x) lapply(x,fastBM))

typeIerror<-function(i,t,x,max){
    obj<-ratebytree(t,x)
    if(i==1) cat("|.") else cat(".")
    if(i==max) cat("|\n")
    if(i%%50==0) cat("\n ")
    flush.console()
    obj$P.chisq
}

par(mfrow=c(2,2))

P10<-mapply(typeIerror,1:nrep,t10,x10,MoreArgs=list(max=nrep))
## |..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................|
## 
## 
h10<-hist(P10,breaks=seq(0,1,by=0.05),col="grey",xlab="P-value",
    main="",freq=FALSE)
mtext(text=expression('a) N'[1]*'=N'[2]*'=10'),adj=0,line=1,
    cex=1)
abline(h=1,lwd=2,col="red",lty="dashed")

P50<-mapply(typeIerror,1:nrep,t50,x50,MoreArgs=list(max=nrep))
## |..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................|
## 
## 
h50<-hist(P50,breaks=seq(0,1,by=0.05),col="grey",main="",
    xlab="P-value",freq=FALSE)
mtext(text=expression('b) N'[1]*'=N'[2]*'=50'),adj=0,line=1,
    cex=1)
abline(h=1,lwd=2,col="red",lty="dashed")

P10.50<-mapply(typeIerror,1:nrep,t10.t50,x10.x50,MoreArgs=list(max=nrep))
## |..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................|
## 
## 
h10.50<-hist(P10.50,breaks=seq(0,1,by=0.05),col="grey",main="",freq=FALSE,
    xlab="P-value")
mtext(text=expression('c) N'[1]*'=10, N'[2]*'=50'),adj=0,line=1,
    cex=1)
abline(h=1,lwd=2,col="red",lty="dashed")

P30<-mapply(typeIerror,1:nrep,t30,x30,MoreArgs=list(max=nrep))
## |..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................|
## 
## 
h30<-hist(P30,breaks=seq(0,1,by=0.05),col="grey",freq=FALSE,main="",
    xlab="P-value")
mtext(text=expression('d) N'[1]*'=N'[2]*'=N'[3]*'=30'),
    adj=0,line=1,cex=1)
abline(h=1,lwd=2,col="red",lty="dashed")

plot of chunk unnamed-chunk-1

typeI<-matrix(sapply(list(P10,P50,P10.50,P30),function(x,alpha)
    mean(x<=alpha),alpha=0.05),4,1,dimnames=list(
    c("N1=N2=10","N1=N2=50","N1=10,N2=50","N1=N2=N3=30"),
    "type I error"))
typeI
##             type I error
## N1=N2=10           0.065
## N1=N2=50           0.052
## N1=10,N2=50        0.056
## N1=N2=N3=30        0.063

If the method is working properly, under the null we should see a relatively uniform density on the interval [0,1]. We should also see a type I error rate close to the nominal level. In this case, our type I error rate is slightly elevated for the smallest tree sizes - but for even our relatively modestly sized trees of panels b) & c) there is relatively little evidence for elevated type I error over the nominal level in this test.

That's it.

No comments:

Post a Comment

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.