I just added the new function I posted last night to the phytools package.

This
function
takes as input two or more trees in an object of class
`"multiPhylo"`

and two or more continuous trait vectors in a list
and proceeds to fit two different models: one in which the rate is constant
across all of the trees; and a second, more parameter rich, model in which
the rates are permitted to be different among trees. The method then also
compares these two fitted models using a likelihood-ratio test.

As I mentioned
yesterday,
this method is basically equivalent to the *censored* approach described
by O'Meara et al.
(2006) and
should be closely related to the method of Adams
(2012).

In the following, I thought I would explore the statistical properties and power of the method for the case of two trees each with 50 taxa.

First type I error:

```
## we'll use the following custom function
typeIerror<-function(i,N1,N2,max){
t1<-pbtree(n=N1)
t2<-pbtree(n=N2)
x<-fastBM(t1)
y<-fastBM(t2)
obj<-ratebytree(c(t1,t2),list(x,y))
if(i==1) cat("|.") else cat(".")
if(i==max) cat("|")
if(i%%50==0) cat("\n ")
flush.console()
obj$P.chisq
}
library(phytools)
packageVersion("phytools")
```

```
## [1] '0.5.72'
```

```
nrep<-500
P<-sapply(1:nrep,typeIerror,N1=50,N2=50,max=nrep)
```

```
## |..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................
## ..................................................|
##
```

```
h<-hist(P,breaks=seq(0,1,by=0.1),col="grey",
main="P (under null hypothesis)",freq=FALSE)
abline(h=1,lwd=2,col="red",lty="dashed")
```

Note that this density should be more or less uniform on the interval 0,1 (which is just about what we see), and the type I error rate should be around 0.05:

```
mean(P<=0.05)
```

```
## [1] 0.046
```

Now let's investigate power for various differences in rate between trees
(σ^{2}[2]/σ^{2}[1]):

```
## we'll use the custom function as follows:
power<-function(sig2,N1,N2){
t1<-pbtree(n=N1)
t2<-pbtree(n=N2)
x<-fastBM(t1)
y<-fastBM(t2,sig2=sig2)
obj<-ratebytree(c(t1,t2),list(x,y))
obj$P.chisq
}
foo<-function(sig2,n){
p<-setNames(replicate(n,power(sig2,50,50)),
paste("rep",1:n))
cat(paste("Done sig2[2]/sig2[1] =",sig2,"\n"))
flush.console()
p
}
sig2<-seq(1,4,by=0.2)
P.power<-sapply(sig2,foo,n=100)
```

```
## Done sig2[2]/sig2[1] = 1
## Done sig2[2]/sig2[1] = 1.2
## Done sig2[2]/sig2[1] = 1.4
## Done sig2[2]/sig2[1] = 1.6
## Done sig2[2]/sig2[1] = 1.8
## Done sig2[2]/sig2[1] = 2
## Done sig2[2]/sig2[1] = 2.2
## Done sig2[2]/sig2[1] = 2.4
## Done sig2[2]/sig2[1] = 2.6
## Done sig2[2]/sig2[1] = 2.8
## Done sig2[2]/sig2[1] = 3
## Done sig2[2]/sig2[1] = 3.2
## Done sig2[2]/sig2[1] = 3.4
## Done sig2[2]/sig2[1] = 3.6
## Done sig2[2]/sig2[1] = 3.8
## Done sig2[2]/sig2[1] = 4
```

```
colnames(P.power)<-paste("s2[2]=",sig2,sep="")
plot(sig2,apply(P.power,2,function(x) mean(x<=0.05)),type="b",
xlab=expression(sigma[2]^2/sigma[1]^2),ylab="Power",
ylim=c(0,1),main="Power to detect a difference in rate",
lwd=2)
abline(h=0.05,lty="dashed")
abline(h=1,lty="dashed")
text(x=4,y=0.05,"0.05",pos=3)
text(x=1,y=1,"1.00",pos=1)
```

Again, we hope that the power to detect a difference in rate should increase
with σ_{2}^{2}/σ_{1}^{2} - that
is, the difference in rate between trees. Once again, this is more or
less what we see.

Cool.

## No comments:

## Post a Comment