Monday, August 28, 2017

Confidence intervals on ancestral states under an "EB" model

A few days ago I posted on reconstructing ancestral states for internal nodes of the tree under a so-caled “EB” model of evolutionary change. Today, the original poster asked me the following:

“Muchas gracias Dr. Liam, ya calculé los estados ancestrales bajo el modelo EB y tambien logré graficarlos. Si yo quisera además calcular los intervalos de confianza para cada nodo (como lo hace la función fastAnc) podria ser posible en anc.ML?.”

(In other words, can we also compute variances &/or 95% CIs for states obtained under this model.)

Indeed we can.

One way to go about this would be to use the theoretically property of Maximum Likelihood estimation that tells us that the negative inverse curvature of the likelihood surface around the optimum should give us the variance-covariance matrix of our estimates. (This is what anc.ML does for the Brownian model.)

I have now implemented this in phytools & the update can be obtained by installing phytools from GitHub using devtools as I have shown previously.

library(phytools)
packageVersion("phytools")
## [1] '0.6.23'
tree
## 
## Phylogenetic tree with 100 tips and 99 internal nodes.
## 
## Tip labels:
##  t29, t43, t44, t68, t75, t86, ...
## 
## Rooted; includes branch lengths.
x
##          t29          t43          t44          t68          t75 
##  0.438496287  0.472141210  0.508929413  0.476404420  0.500218520 
##          t86          t87          t56          t57          t21 
##  0.442388038  0.412293060  0.325040293  0.375942650  0.420729607 
##          t32          t33          t71          t72          t18 
##  0.144507151  0.074422614  0.188952192  0.203205398  0.156567264 
##           t9          t30          t48          t49          t37 
##  0.228356432  0.201488956  0.150009612  0.084055077  0.151670548 
##          t23          t24           t7           t5          t88 
##  0.120957147  0.276016226  0.289390081  0.286052945  0.285526927 
##          t89           t1          t58          t59          t84 
##  0.269553017  0.338785228 -0.638255514 -0.631750846 -0.655941964 
##          t85          t16          t65          t66          t95 
## -0.640326625 -0.817037017 -0.685259524 -0.663680995 -0.659381902 
##          t96          t90          t79          t22          t17 
## -0.676857787 -0.669947136 -0.673772868 -0.617029153 -0.709658998 
##          t82          t83           t8          t10          t19 
## -0.696291418 -0.688863550 -0.978188152 -0.876536606 -0.744679735 
##          t20           t4          t50          t51          t67 
## -0.782603431 -0.766160613 -0.779978124 -0.763065919 -0.877586463 
##          t93          t94          t91          t92          t27 
## -0.839376372 -0.853901636 -0.636389151 -0.643581270 -0.652135509 
##          t28           t6          t42          t80          t81 
## -0.657921272 -0.964749242 -0.994739494 -0.971873925 -0.970999109 
##          t41          t52          t53          t62          t78 
## -1.047142667 -1.030997961 -1.042163333 -1.037583807 -1.022104291 
##          t99         t100          t34          t35          t45 
## -1.025954239 -1.012997986 -1.047053377 -0.987320158 -1.086046548 
##          t46          t36          t38          t39          t54 
## -1.156241374 -1.120119673 -1.075409445 -1.036747839 -0.663847989 
##          t55          t97          t98          t47          t31 
## -0.698365634 -0.405954828 -0.414290095 -0.383287275 -0.344969991 
##          t11          t13          t76          t77          t73 
## -0.370850161 -1.095050380 -1.037155643 -1.045283871 -0.952923639 
##          t74           t2           t3          t14          t63 
## -1.001848791  0.141608367  0.061594482  0.313000952  0.091721162 
##          t64          t69          t70          t40          t60 
##  0.102012849  0.068840893  0.072838839  0.060826002  0.001699522 
##          t61          t26          t25          t15          t12 
##  0.065996008  0.018760874  0.174547937  0.060346979  0.097391382
fitEB<-anc.ML(tree,x,model="EB",CI=TRUE)
fitEB
## Ancestral character estimates using anc.ML under a(n) EB model:
##       101       102       103       104       105       106       107 
## -0.149645  0.303437  0.456207  0.462999  0.483923  0.463864  0.461351 
##       108       109       110       111       112       113       114 
##  0.441226  0.269318  0.254532  0.256832  0.341497  0.349491  0.134272 
##       115       116       117       118       119       120       121 
##  0.243730  0.188746  0.195718  0.237630  0.205890  0.170694  0.146584 
##       122       123       124       125       126       127       128 
##  0.128694  0.246328  0.247085  0.241900  0.216083  0.277122 -0.169551 
##       129       130       131       132       133       134       135 
## -0.682232 -0.681542 -0.688835 -0.691968 -0.647491 -0.639588 -0.648064 
##       136       137       138       139       140       141       142 
## -0.671030 -0.674188 -0.669582 -0.670976 -0.669138 -0.668707 -0.680070 
##       143       144       145       146       147       148       149 
## -0.786477 -0.812172 -0.799949 -0.794589 -0.800530 -0.764398 -0.694839 
##       150       151       152       153       154       155       156 
## -0.807828 -0.807780 -0.792406 -0.773237 -0.812107 -0.785678 -0.851186 
##       157       158       159       160       161       162       163 
## -0.847010 -0.688502 -0.640400 -0.664724 -0.917186 -0.977880 -0.972587 
##       164       165       166       167       168       169       170 
## -0.811511 -1.049309 -1.042513 -1.039931 -1.054948 -1.054235 -1.032098 
##       171       172       173       174       175       176       177 
## -1.025175 -1.019691 -1.054352 -1.043612 -1.100283 -1.067161 -1.064348 
##       178       179       180       181       182       183       184 
## -0.683323 -0.392338 -0.378467 -0.391821 -0.409841 -1.036606 -1.013423 
##       185       186       187       188       189       190       191 
## -1.037257 -0.983257  0.044576  0.095546  0.121583  0.166058  0.101472 
##       192       193       194       195       196       197       198 
##  0.106063  0.072185  0.094356  0.088166  0.084205  0.070487  0.052460 
##       199 
##  0.039961 
## 
## Lower & upper 95% CIs:
##         lower     upper
## 101 -0.484549  0.185260
## 102  0.150788  0.456086
## 103  0.395288  0.517126
## 104  0.416758  0.509240
## 105  0.452711  0.515135
## 106  0.444131  0.483597
## 107  0.444389  0.478314
## 108  0.427067  0.455385
## 109  0.172091  0.366544
## 110  0.169963  0.339102
## 111  0.185330  0.328334
## 112  0.285735  0.397259
## 113  0.322108  0.376874
## 114  0.089298  0.179247
## 115  0.162741  0.324718
## 116  0.123418  0.254074
## 117  0.174228  0.217207
## 118  0.172010  0.303250
## 119  0.143025  0.268756
## 120  0.126576  0.214812
## 121  0.112322  0.180846
## 122  0.100888  0.156500
## 123  0.185768  0.306887
## 124  0.186754  0.307416
## 125  0.181730  0.302070
## 126  0.162583  0.269584
## 127  0.261368  0.292875
## 128 -0.495176  0.156074
## 129 -0.910645 -0.453818
## 130 -0.844286 -0.518798
## 131 -0.740054 -0.637615
## 132 -0.742939 -0.640997
## 133 -0.679460 -0.615522
## 134 -0.664303 -0.614873
## 135 -0.664110 -0.632018
## 136 -0.715139 -0.626921
## 137 -0.697979 -0.650397
## 138 -0.713671 -0.625492
## 139 -0.689660 -0.652293
## 140 -0.678525 -0.659750
## 141 -0.676891 -0.660522
## 142 -0.826145 -0.533995
## 143 -0.902477 -0.670476
## 144 -0.890940 -0.733404
## 145 -0.874341 -0.725557
## 146 -0.867521 -0.721656
## 147 -0.849000 -0.752061
## 148 -0.817266 -0.711531
## 149 -0.712354 -0.677324
## 150 -0.855059 -0.760597
## 151 -0.855107 -0.760453
## 152 -0.844016 -0.740795
## 153 -0.849995 -0.696478
## 154 -0.847427 -0.776787
## 155 -0.812435 -0.758921
## 156 -0.873687 -0.828686
## 157 -0.856265 -0.837755
## 158 -0.756244 -0.620759
## 159 -0.651030 -0.629770
## 160 -0.714983 -0.614465
## 161 -0.996478 -0.837893
## 162 -1.009814 -0.945947
## 163 -0.990476 -0.954697
## 164 -0.928385 -0.694638
## 165 -1.084168 -1.014449
## 166 -1.070385 -1.014641
## 167 -1.064767 -1.015095
## 168 -1.083221 -1.026674
## 169 -1.081097 -1.027373
## 170 -1.055198 -1.008998
## 171 -1.043372 -1.006979
## 172 -1.025135 -1.014247
## 173 -1.081238 -1.027466
## 174 -1.073893 -1.013330
## 175 -1.128991 -1.071575
## 176 -1.096649 -1.037674
## 177 -1.094056 -1.034639
## 178 -0.712126 -0.654520
## 179 -0.463166 -0.321509
## 180 -0.423042 -0.333892
## 181 -0.422389 -0.361253
## 182 -0.415565 -0.404117
## 183 -1.106863 -0.966350
## 184 -1.050163 -0.976684
## 185 -1.056929 -1.017585
## 186 -1.003989 -0.962524
## 187 -0.205126  0.294277
## 188 -0.025216  0.216309
## 189  0.055321  0.187845
## 190  0.104204  0.227912
## 191  0.075894  0.127050
## 192  0.047246  0.164880
## 193  0.048985  0.095385
## 194  0.042760  0.145952
## 195  0.039358  0.136973
## 196  0.041458  0.126953
## 197  0.029344  0.111630
## 198  0.019881  0.085039
## 199  0.015198  0.064724
## 
## Fitted model parameters & likelihood:
##   sigma^2         r   logLik
##  0.449798 -4.169326 423.7552
## 
## R thinks it has found the ML solution.

Unfortunately, although estimating the ancestral states with this method is reasonably fast, computing the Hessian matrix is most definitely not as it involves numerically differentiating a very high dimensional function.

Using the curvature of the likelihood surface (or the 'Hessian' matrix) is known to be overly liberal for small sample sizes. This means that estimated variances or confidence intervals obtained in this way will tend to be too small or too narrow.

In this particular case, we can test this as we know the simulated ancestral states for our data.

First, let's plot them:

library(plotrix)
plotCI(a,fitEB$ace,pch=21,pt.bg="grey",cex=1.5,ui=fitEB$CI95[,2],li=fitEB$CI95,
    xlab="true ancestral states",
    ylab="estimates states (& CIs) under EB model")
lines(range(x),range(x),lwd=1,col="red",lty="dashed")

plot of chunk unnamed-chunk-4

We can also compute which fraction of the true ancestral states are found within the 95% CIs of the corresponding nodes. This fraction should be more or less 0.95, no?

mean(((a>=fitEB$CI95[,1])+(a<=fitEB$CI95[,2]))==2)
## [1] 0.7373737

(Note that this operation takes advantage of the fact that, in R, mathematical operations on logical values treats them as 0s and 1s.)

We see that this quantity is far smaller than we'd like it to be - suggesting, just as we feared, that CIs based on the Hessian will be too small. (Note that I believe this is also true of ace in ape as described here).

An alternative solution that I believe may be valid would be to use our fitted value of r to transform the tree and then send this transformed tree to fastAnc. fastAnc does not use the Hessian matrix to compute CIs, but an analytical solution given by Rohlf (2001).

Here's what that might look like:

obj<-fastAnc(phytools:::ebTree(tree,r=fitEB$r),x=x,CI=TRUE)
mean(((a>=obj$CI95[,1])+(a<=obj$CI95[,2]))==2)
## [1] 0.9090909

We would need to undertake an experiment to figure out if this method gives us the right SEs and CIs, but they seem closer than those obtained from the Hessian.

No comments:

Post a Comment