I recently added to the function `ratebytree`

methods to compare
the rates of diversification (speciation, extinction, or both) between
phylogenies
(1,
2,
3,
4,
5).

In the following I'm going to look at parameter estimation & statistical
power of the method. For this simulation I generated pairs of phylogenies in
which I held λ_{1} constant at 0.052, μ constant at 0.013, the
total time *T* constant at 100 (this results in an expected number of
lineages equal to 100 in tree 1), and then varied λ_{2} to be
equal to 0.045, 0.052, 0.059, 0.068, and 0.075 to result in an expected number
of extant lineages of 50, 100, 200, 500, and 1000 (respectively) in tree 2. Finally,
I fit the generating model (a variable speciation model) to each pair of trees.

Note that I repeated any simulation that resulted in a tree containing fewer than five taxa.

Here is what that looked like:

```
library(phytools)
set.seed(1)
## general
N<-c(50,100,200,500,1000)
T<-100
nd<-(log(N[2])-log(2))/T
b1<-nd/(1-0.25)
d<-b1-nd
b2<-(log(N)-log(2))/T+d
nsim<-500
## our birth-death simulator
foo<-function(b1,b2,...){
t1<-NULL
while(is.null(t1)||Ntip(t1)<5) t1<-pbtree(b=b1,...,quiet=T)
t2<-NULL
while(is.null(t2)||Ntip(t2)<5) t2<-pbtree(b=b2,...,quiet=T)
obj<-c(t1,t2)
class(obj)<-"multiPhylo"
obj
}
## some objects to use to store our results
FITS<-list()
RESULTS<-list()
BD<-matrix(NA,length(b2),7,dimnames=list(paste("sim",1:length(b2)),
c("b[1]","sd(b[1])","b[2]","sd(b[2])","d","sd(d)","power")))
for(i in 1:length(b2)){
t<-replicate(nsim,foo(b1=b1,b2=b2[i],d=d,t=T,extant.only=TRUE),
simplify=FALSE)
fits<-lapply(t,ratebytree,model="equal-extinction")
RESULTS[[i]]<-cbind(
sapply(fits,function(x) x$multi.rate.model$rates[1]),
sapply(fits,function(x) x$multi.rate.model$rates[2]),
sapply(fits,function(x) x$multi.rate.model$rates[3]),
sapply(fits,function(x) x$likelihood.ratio),
sapply(fits,function(x) x$P.chisq))
colnames(RESULTS[[i]])<-c("b[1]","b[2]","d","LR","P")
FITS[[i]]<-fits
BD[i,c(1,3,5)]<-colMeans(RESULTS[[i]][,1:3])
BD[i,c(2,4,6)]<-apply(RESULTS[[i]][,1:3],2,sd)
BD[i,7]<-mean(RESULTS[[i]][,5]<=0.05)
}
```

```
## There were 50 or more warnings (use warnings() to see the first 50)
```

```
## RESULTS
print(round(BD,5))
```

```
## b[1] sd(b[1]) b[2] sd(b[2]) d sd(d) power
## sim 1 0.05161 0.01051 0.04537 0.01217 0.01537 0.01316 0.140
## sim 2 0.05216 0.00968 0.05162 0.01075 0.01554 0.01375 0.058
## sim 3 0.05081 0.01040 0.05853 0.00838 0.01483 0.01201 0.178
## sim 4 0.05100 0.01073 0.06842 0.00786 0.01518 0.01162 0.664
## sim 5 0.05241 0.00887 0.07525 0.00618 0.01469 0.00945 0.858
```

```
par(mfrow=c(2,1))
library(RColorBrewer)
cols<-brewer.pal(3,"Accent")
xlim<-c(0.04,0.08)
ylim<-c(0,0.09)
plot(b2,BD[,3],xlim=xlim,ylim=ylim,pch=21,bg=cols[1],cex=1.25,
xlab=expression(lambda[2]),ylab="estimated")
lines(b2,b2,lwd=2,col=make.transparent(cols[1],0.5))
for(i in 1:length(b2))
lines(rep(b2[i],2),rep(BD[i,3])+c(BD[i,4],-BD[i,4]),lwd=5,
col=make.transparent(cols[1],0.5))
points(b2,BD[,3],pch=21,cex=1.25)
points(b2,BD[,1],pch=21,bg=cols[2],cex=1.25)
lines(b2,rep(b1,length(b2)),lwd=2,col=make.transparent(cols[2],0.5))
for(i in 1:length(b2))
lines(rep(b2[i],2),rep(BD[i,1])+c(BD[i,2],-BD[i,2]),lwd=5,
col=make.transparent(cols[2],0.5))
points(b2,BD[,1],pch=21,cex=1.25)
points(b2,BD[,5],pch=21,bg=cols[3],cex=1.25)
lines(b2,rep(d,length(b2)),lwd=2,col=make.transparent(cols[3],0.5))
for(i in 1:length(b2))
lines(rep(b2[i],2),rep(BD[i,5])+c(BD[i,6],-BD[i,6]),lwd=5,
col=make.transparent(cols[3],0.5))
points(b2,BD[,5],pch=21,cex=1.25)
legend(x=par()$usr[1],y=par()$usr[4],
legend=c(expression(lambda[1]),expression(lambda[2]),
expression(mu)),pch=21,pt.bg=cols[c(2,1,3)],pt.cex=1.25,bty="n")
mtext(paste(letters[1],")",sep=""),adj=-0.1,line=1)
plot(b2,BD[,7],xlim=xlim,ylim=c(0,1),pch=21,bg="grey",type="b",cex=1.25,
xlab=expression(lambda[2]),ylab="power (or Type I error)")
abline(h=0.05,lty="dashed",col="grey")
text(x=0.075,y=0.04,"0.05",pos=3,cex=0.8)
mtext(paste(letters[2],")",sep=""),adj=-0.1,line=1)
```

In panel a), the generating values of the parameters are denoted by the various color diagonal or horizontal lines, while the mean estimates (and SD across simulations) are given by the points and vertical lines, respectively.

In panel b), the power (or type I error in the case of λ_{1} =
λ_{2}) are given by the points. The horizontal dotted line is
at the nominal α level (α = 0.05).

Neat.

## No comments:

## Post a Comment

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.