Thursday, June 11, 2015

Testing a null hypothesis of K=1.0 with measurement error in the estimation of trait means for species

In the past I've described a “parametric bootstrapping” approach for testing the null hypothesis that Blomberg's K is significantly different from 1.0 (for instance here). 1.0 is the expected value of Blomberg's K under a Brownian motion model of evolutionary change for our character. Here, I'll describe the relatively simple modification of that procedure that can be used to conduct the same hypothesis test under conditions in which we also have sampling error in the estimation of species means for our character.

Let's start by simulating some trait data with sampling error. We can do this using the phytools functions pbtree (for phylogeny simulation), fastBM (to simulate trait data), and sampleFrom, which is a simple function, normally used internally, which allows us to sample trait data under some model (for instance, with error).

library(phytools)
## simulate tree
tree<-pbtree(n=100,scale=1)
## simulate trait data
x<-fastBM(tree,sig2=1)
## add error
ve<-setNames(rchisq(n=Ntip(tree),df=1)/runif(n=Ntip(tree),min=1,max=5),
    tree$tip.label)
xe<-sampleFrom(x,ve,n=rep(1,Ntip(tree)))
plot(x,xe,xlab="true species means",
    ylab="simulated species means with sampling error")

plot of chunk unnamed-chunk-1

Now let's fit our model & generate data under the null hypothesis of K = 1.0. This is where we would normally begin with a genuine empirical dataset.

fit<-phylosig(tree,xe,se=sqrt(ve))
fit
## $K
## [1] 0.9085662
## 
## $sig2
## [1] 1.125822
## 
## $logL
## [1] -112.5553
nullX<-fastBM(tree,nsim=200,sig2=fit$sig2)
nullXe<-apply(nullX,2,sampleFrom,xvar=ve,n=rep(1,Ntip(tree)))
obj<-apply(nullXe,2,phylosig,tree=tree,se=sqrt(ve))
nullK<-sapply(obj,function(x) x[[1]])
hist(nullK,breaks=20,xlab="null distribution for K",
    main="Null distribution of K")
lines(c(fit$K,fit$K),c(0,par()$usr[4]),lty="dashed",col="red")
text(x=fit$K,y=0.985*par()$usr[4],"observed value of K",
    pos=4,offset=0.2)

plot of chunk unnamed-chunk-2

Obviously, here we are well within the null distribution for K, but we can also attach a P-value to this observation. I usually use the logarithm of K, because this equally penalizes K = 0.5 and K = 2.0.

P<-mean(abs(log(nullK))>=abs(log(fit$K))) ## two-tailed test
P
## [1] 0.675

That's it.

1 comment:

  1. Hi Liam,

    Thanks so much for providing this, and your previous post on testing against K=1. I am using this approach for the analysis in a manuscript I am working on. How should I cite you?

    ReplyDelete

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.