Tuesday, September 12, 2017

Type I error & power for ratebytree with data type="discrete"

Yesterday I posted about an extension of the phytools function ratebytree to fit different rates of discrete character evolution to different trees, and then to compare this model to one in which the rates of evolutionary change between states have been forced to be the same.

I thought it might be neat to do some very simple analyses of the type I error rate of the method - as well as of its power to detect a difference in rate when one has been simulated. This is something that we have done (more thoroughtly) in our manuscript describing the statistical properties of ratebytree for continuous traits.

To accomplish this, I will first need to simulate under the null & measure the fraction of times amongst my simulations for which the null is rejected (type I errors). Next, I will simulate under various differences in evolutionary rate between trees and measure the fraction of times that the null hypothesis is rejected when false (power).

Here I go. First, with type I error:

library(phytools)
packageVersion("phytools") ## check we have 0.6.25 or above
## [1] '0.6.25'
## simulate pairs of trees
trees<-replicate(200,pbtree(n=100,nsim=2),simplify=FALSE)
head(trees,n=3)
## [[1]]
## 2 phylogenetic trees
## 
## [[2]]
## 2 phylogenetic trees
## 
## [[3]]
## 2 phylogenetic trees
## now simulate pairs of datasets on each of these trees
Q<-matrix(c(-1,1,1,-1),2,2,dimnames=list(letters[1:2],
    letters[1:2]))
foo<-function(t,Q)
    list(as.factor(sim.history(t[[1]],Q,message=F)$states),
        as.factor(sim.history(t[[2]],Q,message=F)$states))
x<-lapply(trees,foo,Q=Q)
head(x,n=3)
## [[1]]
## [[1]][[1]]
##   t6  t52  t99 t100  t87  t88  t67  t68  t20  t95  t96  t81  t82  t21  t22 
##    a    a    a    a    b    b    b    a    b    a    a    a    a    b    b 
##  t15   t1   t2  t47  t48  t75  t76  t69  t70  t91  t92  t73  t74  t46  t12 
##    b    b    b    b    b    b    b    a    a    a    a    b    a    a    a 
##  t16  t17  t44  t97  t98  t63  t32  t24  t27  t77  t78   t3  t18  t19  t64 
##    b    a    a    b    b    a    a    b    a    a    a    b    a    a    b 
##  t83  t84   t8  t10  t56  t57  t42  t43  t23  t38  t39  t40  t41  t33  t71 
##    b    a    b    a    b    a    b    a    a    b    b    a    b    a    a 
##  t72  t28  t25  t13  t30  t31   t5  t61  t62  t29  t89  t90  t85  t49  t36 
##    a    a    a    a    a    a    b    b    a    a    b    a    b    a    b 
##  t37  t26  t79  t80  t53  t54  t93  t94  t86  t45  t50  t51  t14  t58  t59 
##    b    b    a    a    b    a    a    a    a    b    b    b    a    a    a 
##   t7   t4   t9  t34  t35  t11  t55  t60  t65  t66 
##    a    a    a    a    a    b    a    a    b    a 
## Levels: a b
## 
## [[1]][[2]]
##  t26  t51  t52  t16  t91  t92  t28  t24  t43  t44  t67  t99 t100  t45  t46 
##    a    a    a    a    b    b    b    b    b    b    b    b    b    b    b 
##  t76  t83  t84  t74  t75  t20  t55  t56   t5   t6  t63  t64  t48  t49  t72 
##    a    a    a    a    a    a    a    a    b    b    a    a    a    a    a 
##  t73  t25  t36  t59  t60  t40  t41  t68  t69  t54  t47   t4  t13  t17  t18 
##    a    b    a    a    b    a    a    a    a    a    a    a    b    a    a 
##  t79  t87  t88  t37  t35   t8  t29  t30  t89  t90  t42  t12   t9  t38  t39 
##    b    a    a    a    a    b    b    b    a    a    b    a    b    b    b 
##  t14  t95  t96  t11  t31  t32  t85  t86  t19  t97  t98  t82  t53  t61  t62 
##    a    b    b    b    b    a    b    b    b    a    a    a    b    b    a 
##  t10   t2  t57  t58   t7  t70  t71   t3  t33  t34  t50  t77  t78  t65  t66 
##    a    b    a    a    b    b    b    b    b    b    a    b    b    b    a 
##  t27  t80  t81   t1  t21  t93  t94  t22  t23  t15 
##    b    b    b    a    b    b    b    b    a    a 
## Levels: a b
## 
## 
## [[2]]
## [[2]][[1]]
##  t40  t97  t98  t66  t67  t53  t43  t80  t81  t27  t54  t55  t10  t71  t72 
##    b    a    a    a    a    a    a    a    a    b    b    b    a    a    a 
##  t62  t63  t41  t42  t50  t75  t85  t86  t12   t7  t60  t61  t58  t59  t19 
##    a    b    b    b    b    b    b    b    a    a    a    a    a    a    a 
##  t26  t46  t47  t35  t37  t91  t92  t24   t1  t20  t21  t38  t39  t87  t88 
##    a    a    a    a    a    a    a    b    a    a    a    b    b    a    a 
##  t28   t5   t3   t4  t73  t74   t9   t2  t36  t48  t49  t82  t83  t84  t18 
##    a    a    b    a    a    a    a    a    b    a    a    a    a    a    a 
##  t29  t30  t17  t51  t52  t64  t65  t22  t23  t11  t70  t76  t77  t78  t79 
##    a    b    b    b    b    a    a    a    b    b    a    b    a    a    a 
##  t93  t94  t14  t56  t57  t68  t69  t25  t89  t90  t34  t13  t15  t16  t31 
##    a    b    a    a    a    a    a    a    a    b    a    a    b    b    a 
##  t32  t33  t99 t100  t95  t96   t6  t44  t45   t8 
##    a    a    a    a    a    a    a    b    b    a 
## Levels: a b
## 
## [[2]][[2]]
##  t53  t54  t61  t62  t11   t8   t3  t51  t52  t88  t89  t50  t99 t100  t27 
##    a    a    a    a    b    b    b    b    b    a    a    b    b    b    b 
##  t17  t18  t80  t81  t69  t70  t55  t12   t6  t56  t57  t75  t76  t68  t82 
##    a    b    b    b    b    b    b    a    a    a    a    a    a    b    a 
##  t83  t49  t93  t94  t90  t16  t23  t59  t60  t34  t28  t29  t32  t33  t24 
##    a    b    a    a    a    b    b    a    a    a    b    b    b    b    a 
##  t47  t48  t21   t4   t2   t9  t25  t26  t58  t95  t96  t42  t43   t7  t66 
##    b    b    a    a    a    b    b    b    a    a    a    b    a    a    b 
##  t67  t77  t97  t98  t35  t36   t1  t37  t38  t22  t41  t71  t72  t46  t86 
##    a    b    a    a    b    a    a    b    a    b    b    b    a    b    b 
##  t87  t73  t74  t44  t45  t39  t40  t13  t91  t92  t14  t15  t64  t65  t63 
##    b    b    b    b    b    b    b    a    a    a    a    b    b    b    b 
##  t84  t85   t5  t10  t30  t31  t78  t79  t19  t20 
##    b    b    a    a    b    b    a    a    a    a 
## Levels: a b
## 
## 
## [[3]]
## [[3]][[1]]
##  t56  t57  t71  t72  t97  t98  t77  t78   t4   t2  t59  t60  t39  t20  t26 
##    a    a    b    a    b    b    b    b    b    a    a    a    a    b    b 
##  t27  t10  t13  t14   t9  t30  t89  t90  t67  t68  t51  t15  t36  t37  t11 
##    a    b    a    b    b    a    a    a    b    b    b    a    b    b    a 
##  t49  t75  t76  t55  t31  t44  t45   t8  t53  t54  t48  t91  t92  t40  t41 
##    b    a    a    b    b    a    b    b    b    b    b    b    b    a    b 
##  t17  t46  t47  t95  t96  t29  t25  t18  t19   t6   t1  t23  t24  t73  t74 
##    b    b    a    a    a    a    a    b    a    a    a    a    b    b    b 
##   t7  t22  t58  t79  t80  t81  t82  t16  t52  t69  t70  t61  t62  t85  t86 
##    a    b    b    b    b    b    b    b    b    a    a    b    b    a    a 
##  t83  t84  t33  t34  t99 t100  t50  t32  t28  t93  t94  t35  t38  t63  t64 
##    b    b    a    a    b    b    b    a    b    a    a    a    a    a    a 
##  t87  t88  t42  t43  t21  t12   t5   t3  t65  t66 
##    a    a    b    b    b    b    b    a    a    a 
## Levels: a b
## 
## [[3]][[2]]
##   t8   t9  t58  t63  t64  t73  t92  t93  t69  t56  t57  t25  t60  t85  t86 
##    b    b    a    a    b    b    b    a    b    b    b    b    b    b    b 
##   t7  t65  t66  t37  t38   t2  t12  t49  t54  t55  t18  t15  t16  t36  t83 
##    a    a    b    b    b    a    b    a    a    a    a    b    a    a    a 
##  t84  t81  t82  t27  t89  t90   t6  t74  t75  t59  t76  t77  t47  t48  t98 
##    a    b    a    a    a    a    b    a    a    a    b    b    b    b    b 
##  t99 t100  t70  t94  t95  t91  t79  t80  t29  t23  t78  t96  t97   t1  t21 
##    b    b    a    b    b    b    a    a    a    b    b    b    b    b    a 
##  t22  t10  t43  t44  t14  t52  t53  t19  t32  t33  t17  t28  t30  t31  t39 
##    a    b    a    b    b    b    a    b    b    a    a    a    b    a    b 
##  t40  t71  t72  t61  t62  t20  t11   t4  t50  t51   t5  t45  t46  t26  t34 
##    b    b    b    a    b    b    a    a    a    b    b    b    a    b    a 
##  t35   t3  t13  t24  t41  t42  t87  t88  t67  t68 
##    a    b    b    a    b    b    b    b    a    b 
## Levels: a b
## now fit our models
fitsER<-mapply(ratebytree,trees=trees,x=x,MoreArgs=list(model="ER"),
    SIMPLIFY=FALSE)
head(fitsER,n=3)
## [[1]]
## ML common-rate model:
##  q   k   logL
## value    0.7339  1   -125.016
## 
## Model fitting method was "optimize".
## 
## ML multi-rate model:
##  q   k   logL
## tree1    1.2666  2   -124.1744
## tree2    0.5602
## 
## Model fitting method was "nlminb".
## 
## Likelihood ratio: 1.6831 
## P-value (based on X^2): 0.1945 
## 
## 
## [[2]]
## ML common-rate model:
##  q   k   logL
## value    0.5811  1   -117.8442
## 
## Model fitting method was "optimize".
## 
## ML multi-rate model:
##  q   k   logL
## tree1    0.4412  2   -117.5178
## tree2    0.687
## 
## Model fitting method was "nlminb".
## 
## Likelihood ratio: 0.6527 
## P-value (based on X^2): 0.4191 
## 
## 
## [[3]]
## ML common-rate model:
##  q   k   logL
## value    1.0253  1   -123.4968
## 
## Model fitting method was "optimize".
## 
## ML multi-rate model:
##  q   k   logL
## tree1    0.8676  2   -123.228
## tree2    1.3832
## 
## Model fitting method was "nlminb".
## 
## Likelihood ratio: 0.5376 
## P-value (based on X^2): 0.4634
## finally, pull out the P-values for each analysis:
P<-sapply(fitsER,function(x) x$P.chisq)
obj<-hist(P,breaks=seq(0,1,by=0.05),plot=F)
plot(obj$mids,obj$counts/sum(obj$counts),type="h",
    lwd=18,col=make.transparent("blue",0.4),lend=1,
    xlab="P-value",ylab="relative frequency",ylim=c(0,0.2))
title(main="P-values for data simulated under the null")
abline(h=0.05,col=make.transparent("red",0.5),lwd=4,
    lty="dashed",lend=1)

plot of chunk unnamed-chunk-1

mean(P<=0.05)
## [1] 0.06

We can see that the type I error rate is elevated above its nominal level, although just barely in this case. This should not be especially surprising as our expectation that the likelihood-ratio has a Χ2 distribution under the null is asymptotic with N, meaning that we only expect it to hold true for relatively large sample sizes. Conversely, of course, we should expect the type I error to increase for tests conducted using smaller trees (which it does - though I have not shown that here).

Next, we can examine the power of the method to detect a difference in rate between trees if one is present. Here I will simulate a transition rate q1 for tree 1 fixed at 1.0, and vary the transition rate q2 for the second tree between 1.0 and 4.0 in intervals of 0.2. Since this is a whole bunch of simulations, I'm only going to use the first 50 sets of trees from before:

trees<-trees[1:50]
q1<-1
q2<-seq(1,4,by=0.2)
foo<-function(q2,q1,trees){
    Q1<-matrix(c(-q1,q1,q1,-q1),2,2,dimnames=list(letters[1:2],
        letters[1:2]))
    Q2<-matrix(c(-q2,q2,q2,-q2),2,2,dimnames=list(letters[1:2],
        letters[1:2]))
    x<-lapply(trees,function(t,Q1,Q2)
        list(as.factor(sim.history(t[[1]],Q1,message=F)$states),
        as.factor(sim.history(t[[2]],Q2,message=F)$states)),
        Q1=Q1,Q2=Q2)
    fits<-mapply(ratebytree,trees=trees,x=x,MoreArgs=list(model="ER"),
        SIMPLIFY=FALSE)
    sapply(fits,function(x) x$P.chisq)
}
P<-sapply(q2,foo,q1=q1,trees=trees)
power<-apply(P,2,function(x) mean(x<=0.05))
plot(power~q2,cex=1.8,bg="grey",pch=21,xlab=expression(q[2]/q[1]))

plot of chunk unnamed-chunk-2

So, power increases for an increase in the difference in rate between trees 1 & 2 - just as we'd expect. Note that for this smaller set of simulations we see a more dramatically increased type I error rate under the null hypothesis of no difference in rate (q2 / q1 = 1.0).

More later.

No comments:

Post a Comment