I just added the new function I posted last night to the phytools package.
This
function
takes as input two or more trees in an object of class
"multiPhylo"
and two or more continuous trait vectors in a list
and proceeds to fit two different models: one in which the rate is constant
across all of the trees; and a second, more parameter rich, model in which
the rates are permitted to be different among trees. The method then also
compares these two fitted models using a likelihood-ratio test.
As I mentioned yesterday, this method is basically equivalent to the censored approach described by O'Meara et al. (2006) and should be closely related to the method of Adams (2012).
In the following, I thought I would explore the statistical properties and power of the method for the case of two trees each with 50 taxa.
First type I error:
## we'll use the following custom function
typeIerror<-function(i,N1,N2,max){
t1<-pbtree(n=N1)
t2<-pbtree(n=N2)
x<-fastBM(t1)
y<-fastBM(t2)
obj<-ratebytree(c(t1,t2),list(x,y))
if(i==1) cat("|.") else cat(".")
if(i==max) cat("|")
if(i%%50==0) cat("\n ")
flush.console()
obj$P.chisq
}
library(phytools)
packageVersion("phytools")
## [1] '0.5.72'
nrep<-500
P<-sapply(1:nrep,typeIerror,N1=50,N2=50,max=nrep)
## |..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................|
##
h<-hist(P,breaks=seq(0,1,by=0.1),col="grey",
main="P (under null hypothesis)",freq=FALSE)
abline(h=1,lwd=2,col="red",lty="dashed")
Note that this density should be more or less uniform on the interval 0,1 (which is just about what we see), and the type I error rate should be around 0.05:
mean(P<=0.05)
## [1] 0.046
Now let's investigate power for various differences in rate between trees (σ2[2]/σ2[1]):
## we'll use the custom function as follows:
power<-function(sig2,N1,N2){
t1<-pbtree(n=N1)
t2<-pbtree(n=N2)
x<-fastBM(t1)
y<-fastBM(t2,sig2=sig2)
obj<-ratebytree(c(t1,t2),list(x,y))
obj$P.chisq
}
foo<-function(sig2,n){
p<-setNames(replicate(n,power(sig2,50,50)),
paste("rep",1:n))
cat(paste("Done sig2[2]/sig2[1] =",sig2,"\n"))
flush.console()
p
}
sig2<-seq(1,4,by=0.2)
P.power<-sapply(sig2,foo,n=100)
## Done sig2[2]/sig2[1] = 1
## Done sig2[2]/sig2[1] = 1.2
## Done sig2[2]/sig2[1] = 1.4
## Done sig2[2]/sig2[1] = 1.6
## Done sig2[2]/sig2[1] = 1.8
## Done sig2[2]/sig2[1] = 2
## Done sig2[2]/sig2[1] = 2.2
## Done sig2[2]/sig2[1] = 2.4
## Done sig2[2]/sig2[1] = 2.6
## Done sig2[2]/sig2[1] = 2.8
## Done sig2[2]/sig2[1] = 3
## Done sig2[2]/sig2[1] = 3.2
## Done sig2[2]/sig2[1] = 3.4
## Done sig2[2]/sig2[1] = 3.6
## Done sig2[2]/sig2[1] = 3.8
## Done sig2[2]/sig2[1] = 4
colnames(P.power)<-paste("s2[2]=",sig2,sep="")
plot(sig2,apply(P.power,2,function(x) mean(x<=0.05)),type="b",
xlab=expression(sigma[2]^2/sigma[1]^2),ylab="Power",
ylim=c(0,1),main="Power to detect a difference in rate",
lwd=2)
abline(h=0.05,lty="dashed")
abline(h=1,lty="dashed")
text(x=4,y=0.05,"0.05",pos=3)
text(x=1,y=1,"1.00",pos=1)
Again, we hope that the power to detect a difference in rate should increase with σ22/σ12 - that is, the difference in rate between trees. Once again, this is more or less what we see.
Cool.
No comments:
Post a Comment
Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.