The MCCR test for Pybus & Harvey's (2000) γ statistic from incompletely sampled phylogenies used to be implemented in the no-longer-available-on-CRAN laser package of Dan Rabosky. The method is pretty simple, though, & since we use it in teaching I figured it would be easy enough to add to phytools.
Here is a function to do the test, as well as print
& plot
methods
for the resultant object:
mccr<-function(obj,rho=1,nsim=100,plot=TRUE){
N<-round(Ntip(obj$tree)/rho)
tt<-pbtree(n=N,nsim=nsim)
foo<-function(x,m) drop.tip(x,sample(x$tip.label,m))
tt<-lapply(tt,foo,m=N-Ntip(obj$tree))
g<-sapply(tt,function(x) ltt(x,plot=FALSE)$gamma)
P<-if(obj$gamma>median(g)) 2*mean(g>=obj$gamma) else 2*mean(g<=obj$gamma)
result<-list(gamma=obj$gamma,"P(two-tailed)"=P,null.gamma=g)
class(result)<-"mccr"
result
}
print.mccr<-function(x,digits=4,...){
cat("Object of class \"mccr\" consisting of:\n\n")
cat(paste("(1) A value for Pybus & Harvey's \"gamma\"",
" statistic of ",round(x$gamma,digits),".\n\n",sep=""))
cat(paste("(2) A two-tailed p-value from the MCCR test of ",
round(x$'P(two-tailed)',digits),".\n\n", sep = ""))
cat(paste("(3) A simulated null-distribution of gamma from ",
length(x$null.gamma)," simulations.\n\n",sep=""))
}
plot.mccr<-function(x,...){
hist(x$null.gamma,breaks=min(c(max(12,round(length(x$null.gamma)/10)),20)),
xlim=range(c(x$gamma,x$null.gamma)),
main=expression(paste("null distribution of ",
gamma)),xlab=expression(gamma),col="lightgrey")
arrows(x0=x$gamma,y0=par()$usr[4],y1=0,length=0.12,
col=make.transparent("blue",0.5),lwd=2)
text(x$gamma,0.98*par()$usr[4],
expression(paste("observed value of ",gamma)),
pos=if(x$gamma>mean(x$null.gamma)) 2 else 4)
}
Now let's see how to use it. First, I'll run it on the completely sampled tree:
## pure birth tree with complete sampling:
pb<-pbtree(n=100)
lineages<-ltt(pb)
lineages
## Object of class "ltt" containing:
##
## (1) A phylogenetic tree with 100 tips and 99 internal nodes.
##
## (2) Vectors containing the number of lineages (ltt) and branching times (times) on the tree.
##
## (3) A value for Pybus & Harvey's "gamma" statistic of -0.2688, p-value = 0.7881.
## MCCR test assuming complete sampling:
object<-mccr(lineages,rho=1,nsim=200)
object
## Object of class "mccr" consisting of:
##
## (1) A value for Pybus & Harvey's "gamma" statistic of -0.2688.
##
## (2) A two-tailed p-value from the MCCR test of 0.67.
##
## (3) A simulated null-distribution of gamma from 200 simulations.
plot(object)
A neat thing to show is that the MCCR test (based on simulation) will give a P-value that converges on the P-value from the parametric test if you run enough simulations. Let's see that:
object<-mccr(lineages,rho=1,nsim=10000)
lineages
## Object of class "ltt" containing:
##
## (1) A phylogenetic tree with 100 tips and 99 internal nodes.
##
## (2) Vectors containing the number of lineages (ltt) and branching times (times) on the tree.
##
## (3) A value for Pybus & Harvey's "gamma" statistic of -0.2688, p-value = 0.7881.
object
## Object of class "mccr" consisting of:
##
## (1) A value for Pybus & Harvey's "gamma" statistic of -0.2688.
##
## (2) A two-tailed p-value from the MCCR test of 0.7942.
##
## (3) A simulated null-distribution of gamma from 10000 simulations.
Next, we can simulate incomplete sampling, see the effect on the P-value from ltt
,
and then compare that to a P-value obtained from the MCCR test:
incomplete.pb<-drop.tip(pb,sample(pb$tip.label,round(0.5*Ntip(pb))))
lineages<-ltt(incomplete.pb,plot=FALSE)
lineages
## Object of class "ltt" containing:
##
## (1) A phylogenetic tree with 50 tips and 49 internal nodes.
##
## (2) Vectors containing the number of lineages (ltt) and branching times (times) on the tree.
##
## (3) A value for Pybus & Harvey's "gamma" statistic of -0.9876, p-value = 0.3234.
object<-mccr(lineages,rho=0.5,nsim=200)
object
## Object of class "mccr" consisting of:
##
## (1) A value for Pybus & Harvey's "gamma" statistic of -0.9876.
##
## (2) A two-tailed p-value from the MCCR test of 0.76.
##
## (3) A simulated null-distribution of gamma from 200 simulations.
Finally, for fun we can simulate a tree with a shape that should correspond to a positive
value of γ. Here, I'll do it using the phytools internal function
ebTree
. We can then subsample the tree to simulate incomplete sampling.
eb<-phytools:::ebTree(pbtree(n=200,scale=1),-1.2)
ltt(eb)
## Object of class "ltt" containing:
##
## (1) A phylogenetic tree with 200 tips and 199 internal nodes.
##
## (2) Vectors containing the number of lineages (ltt) and branching times (times) on the tree.
##
## (3) A value for Pybus & Harvey's "gamma" statistic of 2.0067, p-value = 0.0448.
incomplete.eb<-drop.tip(eb,sample(eb$tip.label,round(0.5*Ntip(eb))))
ltt(incomplete.eb)
## Object of class "ltt" containing:
##
## (1) A phylogenetic tree with 100 tips and 99 internal nodes.
##
## (2) Vectors containing the number of lineages (ltt) and branching times (times) on the tree.
##
## (3) A value for Pybus & Harvey's "gamma" statistic of -0.2614, p-value = 0.7938.
object<-mccr(ltt(incomplete.eb,plot=FALSE),rho=0.5,nsim=200)
object
## Object of class "mccr" consisting of:
##
## (1) A value for Pybus & Harvey's "gamma" statistic of -0.2614.
##
## (2) A two-tailed p-value from the MCCR test of 0.06.
##
## (3) A simulated null-distribution of gamma from 200 simulations.
plot(object)
Neat.
This is not yet in phytools, but I will add documentation & put it in a future release.
No comments:
Post a Comment
Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.