χ²-distribution.

We could also compare the likelihood ratio to likelihood ratios obtained by simulating under the null hypothesis. (In fact, I believe the authors suggested this in their original paper.) I have now implemented this test in brownie.lite(). I am also presently running a mini-experiment to compare the type I error rates of the two methods. Since the likelihood ratio is only asymptotically χ²-distributed for large *n*, we might expect the simulation test to be more conservative than the χ² test. So far I have not found much evidence that it is for *n*=10 or *n*=20.
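For reference, the χ² P-value is just the upper tail of a χ² distribution, with degrees of freedom equal to the difference in the number of estimated rates (one, for a two-rate vs. one-rate comparison), evaluated at the likelihood-ratio statistic. A minimal sketch in R (the log-likelihood values here are made-up placeholders):

```r
# placeholder log-likelihoods for illustration only
logL1 <- -10.2                    # single-rate (null) model
logL2 <- -8.1                     # multi-rate (alternative) model
LR <- 2 * (logL2 - logL1)         # likelihood-ratio test statistic
# upper-tail chi-squared P-value with df = (number of rates) - 1 = 1
P.chisq <- pchisq(LR, df=1, lower.tail=FALSE)
```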

I'll also confess to a small fudge in the implementation of this method. Occasionally (about 1% of the time), brownie.lite() fails to converge during optimization. Rather than fix this, when computing the P-value by simulation I simply discarded any simulated datasets that were troublesome to optimize. I don't see how this would introduce bias, but I'm not certain. In the near future I will fix this problem in brownie.lite() (and hopefully clean up the function in general, as it is pretty messy).
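The simulation test amounts to something like the following sketch (this is not the actual internals of brownie.lite(); it assumes the fitted object exposes $logL1 and $logL.multiple, and uses try() to illustrate discarding non-converged replicates):

```r
# sketch: simulation-based P-value for the likelihood-ratio statistic
fit.obs <- brownie.lite(mtree, y)                    # fit to the observed data
LR.obs <- 2 * (fit.obs$logL.multiple - fit.obs$logL1)
nsim <- 100
LR.null <- numeric(0)
for(i in 1:nsim){
    y.sim <- fastBM(tree)                            # simulate under the one-rate null
    fit <- try(brownie.lite(mtree, y.sim), silent=TRUE)
    if(!inherits(fit, "try-error"))                  # discard non-converged replicates
        LR.null <- c(LR.null, 2 * (fit$logL.multiple - fit$logL1))
}
# fraction of statistics (observed included) at least as extreme as observed
P.sim <- mean(c(LR.obs, LR.null) >= LR.obs)
```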

If you'd like to participate in my mini-experiment, I encourage you to download the new version of brownie.lite() and try running the following code (ideally increasing nrep and varying nsp). Please post your results.

require(phytools)

require(geiger)

source("brownie.lite.R") # load new version of brownie.lite()

typeI.x2<-0; typeI.sim<-0

nrep<-100; nsp<-20

for(i in 1:nrep){

# simulate tree

tree<-drop.tip(birthdeath.tree(b=1,d=0,taxa.stop=(nsp+1)), as.character(nsp+1))

# rescale tree

tree$edge.length<-tree$edge.length/mean(diag(vcv(tree)))

# simulate rate regime

x<-sim.char(tree,model.matrix=list(matrix(c(-1,1,1,-1),2,2)), model="discrete")[,,1]

while(length(unique(x))==1)

x<-sim.char(tree,model.matrix=list(matrix(c(-1,1,1,-1), 2,2)),model="discrete")[,,1]

mtree<-make.simmap(tree,x)

# simulate data under the null

y<-fastBM(tree)

# fit the model & compute X^2 P-value

result.x2<-brownie.lite(mtree,y,test="chisq")

# fit the model & compute simulation P-value

result.sim<-brownie.lite(mtree,y,test="simulation")

# count the frequency of P<0.05

typeI.x2<-typeI.x2+(result.x2$P.chisq<0.05)/nrep

typeI.sim<-typeI.sim+(result.sim$P.sim<0.05)/nrep

# something to look at while this is running

# (should converge to 0.05)

print(paste("X^2 type I error:",typeI.x2*nrep/i, "; Simulation type I error:",typeI.sim*nrep/i))

}
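Since each replicate's rejection is a Bernoulli trial, a rough binomial confidence interval tells you how precisely nrep replicates pin down the type I error rate; this is just base R, not part of brownie.lite(), and the counts below are illustrative:

```r
# 95% CI for an estimated type I error rate:
# e.g., 6 rejections observed in 100 replicates, nominal rate 0.05
binom.test(x=6, n=100, p=0.05)$conf.int
```

With only 100 replicates the interval is wide, which is why larger nrep is worth the wait.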

Not sure if you wanted to see the log or just summaries, but I did a couple:

http://d.pr/P1p1

http://d.pr/D5ni

http://d.pr/Fh3L

Tried to do longer ones with more species but my poor laptop can't handle it... hope it helps!

nice

@Rafa - Cool, can you describe how those plots were generated?

Sure, I just created two vectors before the loop:

> TIx2=NULL; TIsim=NULL

then inside the loop I filled the vectors with the same equation that was being printed during each run:

> TIx2[i]=typeI.x2*nrep/i

> TIsim[i]=typeI.sim*nrep/i

> print(paste(i,": X^2 type I error:",typeI.x2*nrep/i, "; Simulation type I error:",typeI.sim*nrep/i))

Then, the density plots:

> plot(density(TIx2),xlab="Type I error",main="nrep = 500, nsp = 20",xlim=c(-0.01,0.15),col="blue",lwd=2)

> lines(density(TIsim),col="red",lwd=2)

> abline(v=0.05,lty=2)

> legend("topright", legend=c("X^2","Simulation"), lty=c(1,1), col=c("blue","red"))

I also sourced brownie.lite() from the link in this post, and make.simmap() from the previous post on your blog.

Hope that's the statistic you were interested in. If there's any other particular simulation scenario I can help with, just let me know. Cheers!


Just realized that the printed values are cumulative: each loop iteration adds the current run to the previous ones, so they aren't independent values, and I guess my plots are pretty meaningless! Ack. Sorry.
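For anyone wanting to redo those plots, storing each replicate's P-value directly (rather than the running average) gives independent draws; a sketch, reusing the variable names from the loop above:

```r
# store each replicate's P-value so the plotted values are independent
Px2 <- Psim <- numeric(nrep)
for(i in 1:nrep){
    # ... same tree, regime, and data simulation and model fitting as above ...
    Px2[i] <- result.x2$P.chisq
    Psim[i] <- result.sim$P.sim
}
mean(Px2 < 0.05)   # type I error estimate, chi-squared test
mean(Psim < 0.05)  # type I error estimate, simulation test
```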

In that case, the last printed values for those were:

nrep<-100; nsp=20

"100 : X^2 type I error: 0.1 ; Simulation type I error: 0.05"

nrep<-500; nsp=20

"500 : X^2 type I error: 0.062 ; Simulation type I error: 0.064"

nrep<-100; nsp=50

[1] "100 : X^2 type I error: 0.06 ; Simulation type I error: 0.05"

Hope that's more helpful! Cheers!

Hi Liam,

Do you have a sense of how fast or slow the simulated null runs? Also, how many simulated datasets does one need to obtain a well-sampled null distribution?

Thanks!!

Dan.