Tuesday, September 19, 2017

Statistical properties of posthoc method for "ratebytree" object class

Yesterday, I wrote about a new post-hoc test for ratebytree. Today, I thought I would do a few additional things with this as follows:

1) Demonstrate how the method can be used to compare parameters of an OU (or EB) process between trees.

2) Examine the type I error of the method using no or various different corrections to take into account multiple testing.

3) Finally, compare the P-values obtained via likelihood-ratio to those from the posthoc test when only two trees are compared (they should be the same or similar).

Note that I have made some small updates to ratebytree for this exercise. Details of these can be seen here.

First (1).

Here I'll use the following trees & data:

library(phytools)
packageVersion("phytools")
## [1] '0.6.27'
print(trees,details=TRUE)
## 3 phylogenetic trees
## tree 1 : 40 tips
## tree 2 : 50 tips
## tree 3 : 60 tips
par(mfrow=c(1,3))
## trait x
for(i in 1:length(trees))
    phenogram(trees[[i]],x[[i]],
        spread.cost=c(1,0),ftype="i",
        ylim=range(x),label.pos=setNames(
        seq(min(sapply(x,min)),
        max(sapply(x,max)),
        diff(range(x))/(Ntip(trees[[i]])-1)),
        names(sort(x[[i]]))),
        col=make.transparent("blue",0.5))
## Optimizing the positions of the tip labels...
## Optimizing the positions of the tip labels...
## Optimizing the positions of the tip labels...

plot of chunk unnamed-chunk-1

## trait y
for(i in 1:length(trees))
    phenogram(trees[[i]],y[[i]],
        spread.cost=c(1,0),ftype="i",
        ylim=range(y),label.pos=setNames(
        seq(min(sapply(y,min)),
        max(sapply(y,max)),
        diff(range(y))/(Ntip(trees[[i]])-1)),
        names(sort(y[[i]]))),
        col=make.transparent("blue",0.5))
## Optimizing the positions of the tip labels...
## Optimizing the positions of the tip labels...
## Optimizing the positions of the tip labels...

plot of chunk unnamed-chunk-1

In trait x, it does indeed look like tree 2 might be evolving by a different (OU?) process than the other clades, but in trait y it is harder to tell.

Let's fit our models as we did yesterday:

fitOU.x<-ratebytree(trees,x,model="OU")
fitOU.x
## ML common-regime OU model:
##          s^2  a[1]   a[2]    a[3]    alpha   k   logL
## value    1.0524  0.3973  -0.0157 1.1755  0.0448  5   -200.5537   
## SE       0.1749  0.5885  0.5979  0.7325  0.107   
## 
## ML multi-regime OU model:
##          s^2[1] s^2[2]  s^2[3]   a[1]   a[2]    a[3]     alp[1] alp[2]  alp[3]  k   logL
## value    1.1236  1.3133  1.3639  0.3985  0.0032  1.1635  0.0744  2.0296  -0.1692 9   -185.2541   
## SE       0.3389  0.5097  0.249   0.5759  0.1011  0.9402  0.1908  0.8769  0   
## 
## Likelihood ratio: 30.5992 
## P-value (based on X^2): 0 
## 
## R thinks it has found the ML solution.
posthoc(fitOU.x)
## 
## Post-hoc test for "OU" model.
## (Comparison is of estimated values of alpha.)
## 
##                   t      df      P
## tree1 vs. 2 -3.1942 81.6326 0.0020
## tree1 vs. 3  0.5792 63.1136 0.5645
## tree2 vs. 3  3.8760 65.4179 0.0002
## 
## P-values adjusted using method="none".
fitOU.y<-ratebytree(trees,y,model="OU")
fitOU.y
## ML common-regime OU model:
##          s^2  a[1]   a[2]    a[3]    alpha   k   logL
## value    0.8042  -0.3228 -0.1424 0.0602  1.5824  5   -92.6983    
## SE       0.193   0.1064  0.0958  0.0795  0.4328  
## 
## ML multi-regime OU model:
##          s^2[1] s^2[2]  s^2[3]   a[1]   a[2]    a[3]     alp[1] alp[2]  alp[3]  k   logL
## value    0.9544  0.7473  0.7745  -0.3248 -0.1411 0.0596  1.8196  1.3618  1.6621  9   -92.4482    
## SE       0.4514  0.2639  0.361   0.1045  0.1048  0.0755  0.9626  0.5738  0.8639  
## 
## Likelihood ratio: 0.5003 
## P-value (based on X^2): 0.9735 
## 
## R thinks it has found the ML solution.
posthoc(fitOU.y)
## 
## Post-hoc test for "OU" model.
## (Comparison is of estimated values of alpha.)
## 
##                   t       df      P
## tree1 vs. 2  0.8756  56.6571 0.3850
## tree1 vs. 3  0.2725  67.3268 0.7861
## tree2 vs. 3 -0.6717 102.4482 0.5033
## 
## P-values adjusted using method="none".

The posthoc test only compares the values of the fitted model parameters: α for the OU model, r for the EB/ACDC model, and so on.

Our results show that in trait x there is a difference in process between trees. Trait x seems to have evolved by an OU process with a reasonably high value of α in tree 2, but low values of α (or perhaps by a BM process, which corresponds to α = 0) on trees 1 & 3.

In trait y there is no evidence for a difference in process between trees.

This is pretty good because it closely matches the conditions that we simulated, as follows:

trees<-c(pbtree(n=40),pbtree(n=50),pbtree(n=60))
class(trees)<-"multiPhylo"
x<-list(fastBM(trees[[1]]),fastBM(trees[[2]],model="OU",alpha=2),
    fastBM(trees[[3]]))
y<-list(fastBM(trees[[1]],model="OU",alpha=1.5),
    fastBM(trees[[2]],model="OU",alpha=1.5),
    fastBM(trees[[3]],model="OU",alpha=1.5))

Next, (2), I'm going to conduct a very simple type I error test of our posthoc method & test whether the adjustment methods for multiple testing result in the correct experiment-wise type I error rate. To do this, I will simulate by Brownian evolution on four trees with a constant rate, and from each analysis I will extract four values: first, the P-value for the whole test; & second, the number of times any P-value for our posthoc test was < 0.05 using 3 different corrections - "none" (no correction), "bonferroni", and "BH" (for more details see the documentation page for stats::p.adjust.

Here I go:

## first, simulate the trees & data
nrep<-200
tt<-replicate(nrep,lapply(c(40,50,60,70),function(n) pbtree(n=n)),simplify=FALSE)
tt<-lapply(tt,function(x){ class(x)<-"multiPhylo"; x})
X<-lapply(tt,function(tt) lapply(tt,fastBM))
## now fit our models
fitBMs<-mapply(ratebytree,trees=tt,x=X,SIMPLIFY=FALSE)
none<-lapply(fitBMs,posthoc)
bonferroni<-lapply(fitBMs,posthoc,p.adjust.method="bonferroni")
BH<-lapply(fitBMs,posthoc,p.adjust.method="BH")
## get our P values or error rates
P.full<-sapply(fitBMs,function(x) x$P.chisq)
typeI.none<-mean(sapply(none,function(x) any(x$P<=0.05)))
typeI.bonferroni<-mean(sapply(bonferroni,function(x) any(x$P<=0.05)))
typeI.BH<-mean(sapply(BH,function(x) any(x$P<=0.05)))
## visualize our P-values for the full test
obj<-hist(P.full,breaks=seq(0,1,by=0.05),plot=F)
plot(obj$mids,obj$counts/sum(obj$counts),type="h",
    lwd=18,col=make.transparent("blue",0.4),lend=1,
    xlab="P-value",ylab="relative frequency",ylim=c(0,0.2))
title(main="P-values for full test from data simulated under the null")
abline(h=0.05,col=make.transparent("red",0.5),lwd=1,
    lty="dotted")
text(0.96,0.05,"0.05",pos=3)

plot of chunk unnamed-chunk-4

## visualize our experiment-wise type I error rates for each p.adjust.method
library(plotrix)
typeI<-setNames(c(typeI.none,typeI.bonferroni,typeI.BH),c("none","bonferroni","BH"))
plotCI(x=c(1,2,3),y=typeI,uiw=sqrt(typeI*(1-typeI)/nrep),pch=21,pt.bg="grey",
    cex=2,xlab="correction method",ylab="experiment-wise type I error rate",
    xaxt="n",xlim=c(0.5,3.5),ylim=c(0,2*max(typeI)),
    main="experiment-wise type I error rate from posthoc tests")
axis(1,1:3,labels=FALSE)
mtext(names(typeI),at=1:3,side=1,line=1)
abline(h=0.05,col="red",lty="dotted")
text(3.45,0.05,"0.05",pos=3)

plot of chunk unnamed-chunk-4

This should show as that no correction results in an elevated experiment-wise type I eror rate; and perhaps that "bonferroni" is excessively corrective and "BH" in between. (At least that is what the documentation of p.adjust would suggest.)

Finally, (3), let's compare the posthoc P-values to the P-values from our likelihood ratio test when we only have two trees. Like a t-test and an ANOVA for two groups - these should match. (Although, unlike the ANOVA and t-test the calculation differs - the t-values are computed from SEs obtained from the Hessian, while our likelihood ratio is computed from the difference in log-likelihood between two models - so I do not know whether or not theory informs us that they need be identical as between an ANOVA with two levels & a t-test.)

Let's investigate:

## create vectors in which to store our results
P.t<-P.lr<-vector()
tt<-replicate(nrep,lapply(rep(60,2),function(n) pbtree(n=n)),simplify=FALSE)
tt<-lapply(tt,function(x){ class(x)<-"multiPhylo"; x})
X<-lapply(tt,function(tt) lapply(tt,fastBM))
fitBMs<-mapply(ratebytree,trees=tt,x=X,SIMPLIFY=FALSE)
P.lr<-sapply(fitBMs,function(x) x$P.chisq)
P.t<-sapply(fitBMs,function(x) posthoc(x)$P[1,2])
plot(P.lr,P.t,cex=1.6,bg="grey",pch=21,xlab="P-values based on likelihood-ratio",
    ylab="P-values from posthoc t-tests")

plot of chunk unnamed-chunk-5

That's my three objectives done. Neat.

No comments:

Post a Comment