Today I'm working on a more rigorous exploration of the statistical
properties of the phytools function ratebytree
.
As I have described in prior posts (e.g.,
1,
2,
,
4,
5),
this is a method for comparing the evolutionary rate of a continuous character
between trees. This method is a variant of Brian O'Meara et al.'s (2006)
censored rates test. In fact, I don't really consider this to be a
separate method at all - merely the application of the O'Meara et al. method
to a particular problem.
In the following example, I simulate trees & data under a variety of different
scenarios of tree size, but consistently under the null hypothesis of no
difference in rate between trees. Then I fit the model to each set of simulated
trees & datasets, extract the P-values, and compute the fraction of times that
the hypothesis test resulted in a value of P that was less than or equal to
0.05.
For R enthusiasts, the script below also includes some useful examples of
apply
family functions, expression
to put
sub/superscripts in figure legends, and mtext
to artfully
position our subplot labels.
library(phytools)
set.seed(1)
nrep<-1000
t10<-replicate(nrep,pbtree(n=10,nsim=2),simplify=FALSE)
x10<-lapply(t10,function(x) lapply(x,fastBM))
t50<-replicate(nrep,pbtree(n=50,nsim=2),simplify=FALSE)
x50<-lapply(t50,function(x) lapply(x,fastBM))
foo<-function(n1,n2) c(pbtree(n=n1),pbtree(n=n2))
t10.t50<-replicate(nrep,foo(10,50),simplify=FALSE)
x10.x50<-lapply(t10.t50,function(x) lapply(x,fastBM))
t30<-replicate(nrep,pbtree(n=30,nsim=3),simplify=FALSE)
x30<-lapply(t30,function(x) lapply(x,fastBM))
typeIerror<-function(i,t,x,max){
obj<-ratebytree(t,x)
if(i==1) cat("|.") else cat(".")
if(i==max) cat("|\n")
if(i%%50==0) cat("\n ")
flush.console()
obj$P.chisq
}
par(mfrow=c(2,2))
P10<-mapply(typeIerror,1:nrep,t10,x10,MoreArgs=list(max=nrep))
## |..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................|
##
##
h10<-hist(P10,breaks=seq(0,1,by=0.05),col="grey",xlab="P-value",
main="",freq=FALSE)
mtext(text=expression('a) N'[1]*'=N'[2]*'=10'),adj=0,line=1,
cex=1)
abline(h=1,lwd=2,col="red",lty="dashed")
P50<-mapply(typeIerror,1:nrep,t50,x50,MoreArgs=list(max=nrep))
## |..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................|
##
##
h50<-hist(P50,breaks=seq(0,1,by=0.05),col="grey",main="",
xlab="P-value",freq=FALSE)
mtext(text=expression('b) N'[1]*'=N'[2]*'=50'),adj=0,line=1,
cex=1)
abline(h=1,lwd=2,col="red",lty="dashed")
P10.50<-mapply(typeIerror,1:nrep,t10.t50,x10.x50,MoreArgs=list(max=nrep))
## |..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................|
##
##
h10.50<-hist(P10.50,breaks=seq(0,1,by=0.05),col="grey",main="",freq=FALSE,
xlab="P-value")
mtext(text=expression('c) N'[1]*'=10, N'[2]*'=50'),adj=0,line=1,
cex=1)
abline(h=1,lwd=2,col="red",lty="dashed")
P30<-mapply(typeIerror,1:nrep,t30,x30,MoreArgs=list(max=nrep))
## |..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................|
##
##
h30<-hist(P30,breaks=seq(0,1,by=0.05),col="grey",freq=FALSE,main="",
xlab="P-value")
mtext(text=expression('d) N'[1]*'=N'[2]*'=N'[3]*'=30'),
adj=0,line=1,cex=1)
abline(h=1,lwd=2,col="red",lty="dashed")
typeI<-matrix(sapply(list(P10,P50,P10.50,P30),function(x,alpha)
mean(x<=alpha),alpha=0.05),4,1,dimnames=list(
c("N1=N2=10","N1=N2=50","N1=10,N2=50","N1=N2=N3=30"),
"type I error"))
typeI
## type I error
## N1=N2=10 0.065
## N1=N2=50 0.052
## N1=10,N2=50 0.056
## N1=N2=N3=30 0.063
If the method is working properly, under the null we should see a
relatively uniform density on the interval [0,1]. We should also see a
type I error rate close to the nominal level. In this case, our type I error
rate is slightly elevated for the smallest tree sizes - but for even our
relatively modestly sized trees of panels b) & c) there is relatively
little evidence for elevated type I error over the nominal level in this
test.
That's it.