Thursday, February 2, 2017

Type I error & power of method to compare rates among trees

I just added the new function I posted last night to the phytools package.

This function takes as input two or more trees in an object of class "multiPhylo" and two or more continuous trait vectors in a list and proceeds to fit two different models: one in which the rate is constant across all of the trees; and a second, more parameter rich, model in which the rates are permitted to be different among trees. The method then also compares these two fitted models using a likelihood-ratio test.

As I mentioned yesterday, this method is basically equivalent to the censored approach described by O'Meara et al. (2006) and should be closely related to the method of Adams (2012).

In the following, I thought I would explore the statistical properties and power of the method for the case of two trees each with 50 taxa.

First type I error:

## we'll use the following custom function
typeIerror<-function(i,N1,N2,max){
    t1<-pbtree(n=N1)
    t2<-pbtree(n=N2)
    x<-fastBM(t1)
    y<-fastBM(t2)
    obj<-ratebytree(c(t1,t2),list(x,y))
    if(i==1) cat("|.") else cat(".")
    if(i==max) cat("|")
    if(i%%50==0) cat("\n ")
    flush.console()
    obj$P.chisq
}
library(phytools)
packageVersion("phytools")
## [1] '0.5.72'
nrep<-500
P<-sapply(1:nrep,typeIerror,N1=50,N2=50,max=nrep)
## |..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................|
## 
h<-hist(P,breaks=seq(0,1,by=0.1),col="grey",
    main="P (under null hypothesis)",freq=FALSE)
abline(h=1,lwd=2,col="red",lty="dashed")

plot of chunk unnamed-chunk-1

Note that this density should be more or less uniform on the interval 0,1 (which is just about what we see), and the type I error rate should be around 0.05:

mean(P<=0.05)
## [1] 0.046

Now let's investigate power for various differences in rate between trees (σ2[2]/σ2[1]):

## we'll use the custom function as follows:
power<-function(sig2,N1,N2){
    t1<-pbtree(n=N1)
    t2<-pbtree(n=N2)
    x<-fastBM(t1)
    y<-fastBM(t2,sig2=sig2)
    obj<-ratebytree(c(t1,t2),list(x,y))
    obj$P.chisq
}
foo<-function(sig2,n){
    p<-setNames(replicate(n,power(sig2,50,50)),
        paste("rep",1:n))
    cat(paste("Done sig2[2]/sig2[1] =",sig2,"\n"))
    flush.console()
    p
}
sig2<-seq(1,4,by=0.2)
P.power<-sapply(sig2,foo,n=100)
## Done sig2[2]/sig2[1] = 1 
## Done sig2[2]/sig2[1] = 1.2 
## Done sig2[2]/sig2[1] = 1.4 
## Done sig2[2]/sig2[1] = 1.6 
## Done sig2[2]/sig2[1] = 1.8 
## Done sig2[2]/sig2[1] = 2 
## Done sig2[2]/sig2[1] = 2.2 
## Done sig2[2]/sig2[1] = 2.4 
## Done sig2[2]/sig2[1] = 2.6 
## Done sig2[2]/sig2[1] = 2.8 
## Done sig2[2]/sig2[1] = 3 
## Done sig2[2]/sig2[1] = 3.2 
## Done sig2[2]/sig2[1] = 3.4 
## Done sig2[2]/sig2[1] = 3.6 
## Done sig2[2]/sig2[1] = 3.8 
## Done sig2[2]/sig2[1] = 4
colnames(P.power)<-paste("s2[2]=",sig2,sep="")
plot(sig2,apply(P.power,2,function(x) mean(x<=0.05)),type="b",
    xlab=expression(sigma[2]^2/sigma[1]^2),ylab="Power",
    ylim=c(0,1),main="Power to detect a difference in rate",
    lwd=2)
abline(h=0.05,lty="dashed")
abline(h=1,lty="dashed")
text(x=4,y=0.05,"0.05",pos=3)
text(x=1,y=1,"1.00",pos=1)

plot of chunk unnamed-chunk-3

Again, we hope that the power to detect a difference in rate should increase with σ2212 - that is, the difference in rate between trees. Once again, this is more or less what we see.

Cool.

No comments:

Post a Comment