Saturday, August 26, 2017

Reconstructing ancestral states under an early-burst ("EB") trait evolution model

In response to a user request, I just pushed an update to anc.ML to permit the user to fit a so-called "EB" model of evolution.

The EB model is described as follows in the documentation of geiger's fitContinuous function:

“EB is the Early-burst model (Harmon et al. 2010) and also called the ACDC model (accelerating-decelerating; Blomberg et al. 2003). Set by the a rate parameter, EB fits a model where the rate of evolution increases or decreases exponentially through time, under the model r[t] = r[0] * exp(a * t), where r[0] is the initial rate, a is the rate change parameter, and t is time.”
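To build a little intuition for the rate-change parameter a, here is a quick numerical sketch of that rate trajectory (the values of r[0] and a below are hypothetical, chosen purely for illustration):

```r
## early-burst rate trajectory: r(t) = r(0) * exp(a * t)
## r0 and a are hypothetical values for illustration only
r0 <- 1                      # initial rate at the root
a  <- -4                     # a < 0: rate decelerates through time
t  <- seq(0, 1, by = 0.01)   # time from the root (0) to the tips (1)
r  <- r0 * exp(a * t)
r[1]          # rate at the root: 1
r[length(r)]  # rate at the tips: exp(-4), about 0.018
```

With a < 0 (as in the simulations below), most of the trait variation accumulates early in the tree's history; with a > 0, the rate instead accelerates toward the present.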

My very knowledgeable colleagues Luke Harmon and Josef Uyeda both assured me that an EB transformation of the tree would give me the correct variance-covariance matrix for both the internal nodes and the tips of the tree. (This is not true, evidently, for the OU model.) Consequently, my initial plan was just to simultaneously optimize σ2, r, and the N - 1 ancestral states (for a fully bifurcating tree with N tips) by brute force using one of the built-in optimizers in R, such as optim.

This works - but even for a modest-sized tree it results in a very high-dimensional optimization problem & thus quite a long computation time.

Just as I was going to bed the other night, however, it occurred to me that this could be done by numerically optimizing only σ2 and r, and then running the re-rooting algorithm (as implemented in the phytools function fastAnc) on a tree transformed by the value of r for that iteration of the optimization. Note that it would not work to first compute the ML values of σ2 and r using an external function, such as fitContinuous, and then transform the tree by the optimized value of r, because simultaneous optimization of σ2 and r with the ancestral states results in different MLEs of those parameters! (We also see this for the BM and OU models, although in the former case it doesn't matter.)
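In other words, conditional on a candidate value of r, the EB ancestral states are just the BM states computed on the r-transformed tree. Here is a minimal sketch of that inner step (not the actual anc.ML internals; phytools:::ebTree is the internal tree-transformation function also used in the simulations below):

```r
library(phytools)

## given a candidate rate-change parameter r, compute the conditional
## ML ancestral states by running the re-rooting algorithm (fastAnc)
## on the EB-transformed tree; this is the step that gets repeated
## inside the numerical optimization of sigma^2 and r
ancEB <- function(tree, x, r) {
    fastAnc(phytools:::ebTree(tree, r), x)
}
```

anc.ML then wraps a step like this inside the optimizer, so that σ2 and r are optimized jointly with the ancestral states they imply.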

Here's a quick demo using some data simulated under a scenario of decreasing rate of evolution through time.

For this example, the tip data are in the vector x, while the normally unobserved internal node values are contained in the vector a.

library(phytools)
packageVersion("phytools")
## [1] '0.6.21'
tree
## 
## Phylogenetic tree with 100 tips and 99 internal nodes.
## 
## Tip labels:
##  t29, t43, t44, t68, t75, t86, ...
## 
## Rooted; includes branch lengths.
x
##          t29          t43          t44          t68          t75 
##  0.438496287  0.472141210  0.508929413  0.476404420  0.500218520 
##          t86          t87          t56          t57          t21 
##  0.442388038  0.412293060  0.325040293  0.375942650  0.420729607 
##          t32          t33          t71          t72          t18 
##  0.144507151  0.074422614  0.188952192  0.203205398  0.156567264 
##           t9          t30          t48          t49          t37 
##  0.228356432  0.201488956  0.150009612  0.084055077  0.151670548 
##          t23          t24           t7           t5          t88 
##  0.120957147  0.276016226  0.289390081  0.286052945  0.285526927 
##          t89           t1          t58          t59          t84 
##  0.269553017  0.338785228 -0.638255514 -0.631750846 -0.655941964 
##          t85          t16          t65          t66          t95 
## -0.640326625 -0.817037017 -0.685259524 -0.663680995 -0.659381902 
##          t96          t90          t79          t22          t17 
## -0.676857787 -0.669947136 -0.673772868 -0.617029153 -0.709658998 
##          t82          t83           t8          t10          t19 
## -0.696291418 -0.688863550 -0.978188152 -0.876536606 -0.744679735 
##          t20           t4          t50          t51          t67 
## -0.782603431 -0.766160613 -0.779978124 -0.763065919 -0.877586463 
##          t93          t94          t91          t92          t27 
## -0.839376372 -0.853901636 -0.636389151 -0.643581270 -0.652135509 
##          t28           t6          t42          t80          t81 
## -0.657921272 -0.964749242 -0.994739494 -0.971873925 -0.970999109 
##          t41          t52          t53          t62          t78 
## -1.047142667 -1.030997961 -1.042163333 -1.037583807 -1.022104291 
##          t99         t100          t34          t35          t45 
## -1.025954239 -1.012997986 -1.047053377 -0.987320158 -1.086046548 
##          t46          t36          t38          t39          t54 
## -1.156241374 -1.120119673 -1.075409445 -1.036747839 -0.663847989 
##          t55          t97          t98          t47          t31 
## -0.698365634 -0.405954828 -0.414290095 -0.383287275 -0.344969991 
##          t11          t13          t76          t77          t73 
## -0.370850161 -1.095050380 -1.037155643 -1.045283871 -0.952923639 
##          t74           t2           t3          t14          t63 
## -1.001848791  0.141608367  0.061594482  0.313000952  0.091721162 
##          t64          t69          t70          t40          t60 
##  0.102012849  0.068840893  0.072838839  0.060826002  0.001699522 
##          t61          t26          t25          t15          t12 
##  0.065996008  0.018760874  0.174547937  0.060346979  0.097391382
a
##         101         102         103         104         105         106 
##  0.00000000  0.56474286  0.56260547  0.49095249  0.51083929  0.48733558 
##         107         108         109         110         111         112 
##  0.48793788  0.45580093  0.33891569  0.34342841  0.20012754  0.30850154 
##         113         114         115         116         117         118 
##  0.31122528  0.08825530  0.31467638  0.22930927  0.18991152  0.26126403 
##         119         120         121         122         123         124 
##  0.21824665  0.21867299  0.18964662  0.12701860  0.28838737  0.29755425 
##         125         126         127         128         129         130 
##  0.32514997  0.24951858  0.28827983 -0.03014248 -0.51204649 -0.79961205 
##         131         132         133         134         135         136 
## -0.64849732 -0.64521521 -0.65938876 -0.67442981 -0.63613255 -0.71069622 
##         137         138         139         140         141         142 
## -0.67798100 -0.69087899 -0.66277186 -0.66267207 -0.66167668 -0.90598910 
##         143         144         145         146         147         148 
## -0.90334119 -0.81676385 -0.82997390 -0.85796450 -0.81789610 -0.72827860 
##         149         150         151         152         153         154 
## -0.65730144 -0.81978508 -0.81675215 -0.80007763 -0.79561166 -0.84058387 
##         155         156         157         158         159         160 
## -0.82499333 -0.88804442 -0.85468833 -0.76811047 -0.63836170 -0.63387183 
##         161         162         163         164         165         166 
## -0.86795308 -0.99065017 -0.97028343 -0.90414426 -1.09017652 -1.06623090 
##         167         168         169         170         171         172 
## -1.05896253 -1.06176918 -1.04887686 -1.01026847 -0.99830320 -1.01583314 
##         173         174         175         176         177         178 
## -1.04745049 -1.00458726 -1.10059441 -1.06715470 -1.07423575 -0.65413549 
##         179         180         181         182         183         184 
## -0.33893314 -0.31792921 -0.39497223 -0.41571693 -0.99822702 -0.97216554 
##         185         186         187         188         189         190 
## -1.02811719 -0.96863861  0.01942461  0.14562664  0.07154062  0.14167278 
##         191         192         193         194         195         196 
##  0.06133440  0.06645392  0.09498134  0.10166254  0.07414169  0.11281462 
##         197         198         199 
##  0.08205368  0.07092746  0.04485251
phenogram(tree,c(x,a),ftype="off",col="lightblue")
title(main="Simulated evolution under the EB model")

(figure: phenogram of the simulated tree & data, "Simulated evolution under the EB model")

fitEB<-anc.ML(tree,x,model="EB")
fitEB
## Ancestral character estimates using anc.ML under a(n) EB model:
##       101       102       103       104       105       106       107 
## -0.149645  0.303437  0.456207  0.462999  0.483923  0.463864  0.461351 
##       108       109       110       111       112       113       114 
##  0.441226  0.269318  0.254532  0.256832  0.341497  0.349491  0.134272 
##       115       116       117       118       119       120       121 
##  0.243730  0.188746  0.195718  0.237630  0.205890  0.170694  0.146584 
##       122       123       124       125       126       127       128 
##  0.128694  0.246328  0.247085  0.241900  0.216083  0.277122 -0.169551 
##       129       130       131       132       133       134       135 
## -0.682232 -0.681542 -0.688835 -0.691968 -0.647491 -0.639588 -0.648064 
##       136       137       138       139       140       141       142 
## -0.671030 -0.674188 -0.669582 -0.670976 -0.669138 -0.668707 -0.680070 
##       143       144       145       146       147       148       149 
## -0.786477 -0.812172 -0.799949 -0.794589 -0.800530 -0.764398 -0.694839 
##       150       151       152       153       154       155       156 
## -0.807828 -0.807780 -0.792406 -0.773237 -0.812107 -0.785678 -0.851186 
##       157       158       159       160       161       162       163 
## -0.847010 -0.688502 -0.640400 -0.664724 -0.917186 -0.977880 -0.972587 
##       164       165       166       167       168       169       170 
## -0.811511 -1.049309 -1.042513 -1.039931 -1.054948 -1.054235 -1.032098 
##       171       172       173       174       175       176       177 
## -1.025175 -1.019691 -1.054352 -1.043612 -1.100283 -1.067161 -1.064348 
##       178       179       180       181       182       183       184 
## -0.683323 -0.392338 -0.378467 -0.391821 -0.409841 -1.036606 -1.013423 
##       185       186       187       188       189       190       191 
## -1.037257 -0.983257  0.044576  0.095546  0.121583  0.166058  0.101472 
##       192       193       194       195       196       197       198 
##  0.106063  0.072185  0.094356  0.088166  0.084205  0.070487  0.052460 
##       199 
##  0.039961 
## 
## Fitted model parameters & likelihood:
##   sigma^2         r   logLik
##  0.449798 -4.169326 423.7552
## 
## R thinks it has found the ML solution.
plot(a,fitEB$ace,pch=21,bg="grey",cex=1.5,xlab="true values",
    ylab="fitted values (EB)")
lines(range(x),range(x),lwd=2,lty="dotted")
abline(h=0,lwd=1,lty="dotted",col="grey")
abline(v=0,lwd=1,lty="dotted",col="grey")

(figure: true vs. fitted ancestral states under the EB model)

We can compare these fitted values to those obtained assuming a BM model of evolution, as is most commonly done:

fitBM<-fastAnc(tree,x,CI=TRUE)
fitBM
## Ancestral character estimates using fastAnc:
##       101       102       103       104       105       106       107 
## -0.193765  0.186036  0.416272  0.442522  0.475950  0.460722  0.459450 
##       108       109       110       111       112       113       114 
##  0.440956  0.220870  0.222760  0.244864  0.318102  0.345322  0.146624 
##       115       116       117       118       119       120       121 
##  0.221311  0.192335  0.195784  0.226760  0.206323  0.174639  0.151006 
##       122       123       124       125       126       127       128 
##  0.132202  0.234623  0.235591  0.235146  0.218238  0.276331 -0.201955 
##       129       130       131       132       133       134       135 
## -0.519252 -0.601429 -0.677233 -0.679906 -0.648315 -0.640457 -0.648158 
##       136       137       138       139       140       141       142 
## -0.668130 -0.673712 -0.667233 -0.670759 -0.669119 -0.668701 -0.624943 
##       143       144       145       146       147       148       149 
## -0.722262 -0.779573 -0.776116 -0.781598 -0.793646 -0.770032 -0.696552 
##       150       151       152       153       154       155       156 
## -0.798796 -0.798906 -0.790307 -0.763582 -0.808358 -0.786171 -0.848426 
##       157       158       159       160       161       162       163 
## -0.846797 -0.704969 -0.641004 -0.675769 -0.867234 -0.968017 -0.970744 
##       164       165       166       167       168       169       170 
## -0.745804 -1.033322 -1.038690 -1.037851 -1.044465 -1.046229 -1.031451 
##       171       172       173       174       175       176       177 
## -1.025186 -1.019704 -1.046420 -1.039453 -1.093283 -1.059456 -1.058732 
##       178       179       180       181       182       183       184 
## -0.684756 -0.425527 -0.392179 -0.395277 -0.409858 -0.962708 -0.998106 
##       185       186       187       188       189       190       191 
## -1.033858 -0.981408 -0.067529  0.038991  0.096452  0.138378  0.101324 
##       192       193       194       195       196       197       198 
##  0.091463  0.072296  0.087245  0.084025  0.081671  0.071502  0.054412 
##       199 
##  0.041498 
## 
## Lower & upper 95% CIs:
##         lower     upper
## 101 -0.353689 -0.033841
## 102  0.049784  0.322288
## 103  0.309548  0.522996
## 104  0.351400  0.533644
## 105  0.406334  0.545565
## 106  0.414670  0.506775
## 107  0.419322  0.499579
## 108  0.406877  0.475034
## 109  0.112077  0.329662
## 110  0.122333  0.323186
## 111  0.135702  0.354027
## 112  0.216076  0.420129
## 113  0.282627  0.408017
## 114  0.054781  0.238466
## 115  0.122585  0.320036
## 116  0.079042  0.305628
## 117  0.145089  0.246480
## 118  0.135600  0.317919
## 119  0.105118  0.307528
## 120  0.088770  0.260508
## 121  0.078376  0.223635
## 122  0.069538  0.194867
## 123  0.144605  0.324641
## 124  0.145006  0.326176
## 125  0.137912  0.332381
## 126  0.119169  0.317307
## 127  0.238390  0.314272
## 128 -0.359918 -0.043992
## 129 -0.662893 -0.375611
## 130 -0.727801 -0.475058
## 131 -0.766331 -0.588134
## 132 -0.769390 -0.590422
## 133 -0.717511 -0.579119
## 134 -0.697244 -0.583671
## 135 -0.686688 -0.609627
## 136 -0.750929 -0.585330
## 137 -0.729074 -0.618351
## 138 -0.750354 -0.584111
## 139 -0.715003 -0.626515
## 140 -0.692022 -0.646216
## 141 -0.688743 -0.648658
## 142 -0.746592 -0.503293
## 143 -0.834407 -0.610116
## 144 -0.874649 -0.684496
## 145 -0.868861 -0.683371
## 146 -0.879846 -0.683350
## 147 -0.874696 -0.712595
## 148 -0.861123 -0.678942
## 149 -0.738451 -0.654653
## 150 -0.879227 -0.718366
## 151 -0.879632 -0.718181
## 152 -0.880657 -0.699956
## 153 -0.863390 -0.663774
## 154 -0.883112 -0.733604
## 155 -0.847027 -0.725316
## 156 -0.900784 -0.796068
## 157 -0.869452 -0.824141
## 158 -0.812936 -0.597001
## 159 -0.666963 -0.615045
## 160 -0.773673 -0.577865
## 161 -0.983550 -0.750918
## 162 -1.038762 -0.897272
## 163 -1.013439 -0.928050
## 164 -0.865613 -0.625995
## 165 -1.100397 -0.966246
## 166 -1.100045 -0.977336
## 167 -1.094351 -0.981351
## 168 -1.101313 -0.987617
## 169 -1.101252 -0.991205
## 170 -1.084252 -0.978651
## 171 -1.068248 -0.982123
## 172 -1.033112 -1.006296
## 173 -1.101520 -0.991320
## 174 -1.102232 -0.976674
## 175 -1.156728 -1.029838
## 176 -1.121073 -0.997840
## 177 -1.122105 -0.995359
## 178 -0.750646 -0.618867
## 179 -0.543691 -0.307363
## 180 -0.481934 -0.302425
## 181 -0.463694 -0.326860
## 182 -0.423952 -0.395763
## 183 -1.084062 -0.841354
## 184 -1.076519 -0.919694
## 185 -1.080485 -0.987230
## 186 -1.030321 -0.932496
## 187 -0.233243  0.098185
## 188 -0.120731  0.198713
## 189 -0.003465  0.196369
## 190  0.033319  0.243437
## 191  0.042163  0.160486
## 192 -0.001864  0.184790
## 193  0.018033  0.126560
## 194  0.000148  0.174341
## 195 -0.000961  0.169010
## 196  0.001453  0.161889
## 197 -0.007817  0.150821
## 198 -0.015527  0.124351
## 199 -0.015398  0.098394
plot(a,fitBM$ace,pch=21,bg="grey",cex=1.5,xlab="true values",
    ylab="fitted values (BM)")
lines(range(x),range(x),lwd=2,lty="dotted")
abline(h=0,lwd=1,lty="dotted",col="grey")
abline(v=0,lwd=1,lty="dotted",col="grey")

(figure: true vs. fitted ancestral states under the BM model)

There doesn't seem to be much of a difference; however, if we compute the sum of squared differences between the true & estimated states, we should generally find it to be smaller under the EB model:

sum((fitEB$ace-a)^2)
## [1] 0.343143
sum((fitBM$ace-a)^2)
## [1] 0.559287

Let's repeat this a bunch of times to verify that this is most often the case:

nrep<-100
ntax<-100
SSdiff<-rep(NA,nrep)
for(i in 1:nrep){
    ## simulate tree
    t<-pbtree(n=ntax,scale=1)
    ## simulate data, retaining the (normally unobserved) node states
    y<-fastBM(phytools:::ebTree(t,-4),internal=TRUE)
    anc<-y[1:t$Nnode+Ntip(t)]
    y<-y[1:Ntip(t)]
    fitEB<-anc.ML(t,y,model="EB")
    fitBM<-fastAnc(t,y,CI=TRUE)
    SSdiff[i]<-sum((fitBM$ace-anc)^2)-sum((fitEB$ace-anc)^2)
}
hist(SSdiff,col="grey",main=
    "Accuracy of BM - EB (when data simulated under EB)",
    xlab="SS(BM) - SS(EB)",breaks=seq(min(SSdiff),max(SSdiff),
    by=diff(range(SSdiff))/12))

(figure: histogram of SS(BM) - SS(EB) across the simulated replicates)

This shows that EB is more accurate than BM when the data have been simulated under an EB model.

Tree & data for the leading example were simulated as follows:

tree<-pbtree(n=100,scale=1)
x<-fastBM(phytools:::ebTree(tree,-4),internal=TRUE)
a<-x[1:tree$Nnode+Ntip(tree)]
x<-x[1:Ntip(tree)]
