Yesterday I posted on estimating confidence intervals for \(\sigma^2\) under the discrete approximation of Boucher & Démery (2016).
I’ve now added this as a feature to phytools::bounded_bm
. Here, I’m going to examine whether this approach gives us reasonably accurate confidence intervals when evolution genuinely does transpire via bounded Brownian motion evolution.
To begin with, let’s simulate unbounded Brownian evolution and show how we can use bounded_bm
to estimate a confidence interval around it’s estimate of \(\sigma^2\) using the discrete approximation, and then compare that to the same interval estimated using geiger::fitContinuous
. (This simply reiterates what I did yesterday.)
## load phytools
library(phytools)
## Loading required package: ape
## Loading required package: maps
packageVersion("phytools")
## [1] '2.2.9'
## simulate a tree & some data under unbounded
## Brownian motion
tree<-pbtree(n=40,scale=1)
x<-fastBM(tree,sig2=1.25)
## fit model using geiger::fitContinuous
g_fit<-geiger::fitContinuous(tree,x,
control=list(hessian=TRUE,CI=0.95))
g_fit
## GEIGER-fitted comparative model of continuous data
## fitted 'BM' model parameters:
## sigsq = 0.976323
## z0 = -0.139914
##
## model summary:
## log-likelihood = -29.235344
## AIC = 62.470688
## AICc = 62.795012
## free parameters = 2
##
## Convergence diagnostics:
## optimization iterations = 100
## failed iterations = 0
## number of iterations with same best fit = 100
## frequency of best fit = 1.000
##
## object summary:
## 'lik' -- likelihood function
## 'bnd' -- bounds for likelihood search
## 'res' -- optimization iteration summary
## 'opt' -- maximum likelihood parameter estimates
## confidence interval on sigma-squared
g_fit$opt$CI
## sigsq
## lb 0.6298818
## ub 1.5133100
## see that this is the same using bounded_bm
fit_bm<-bounded_bm(tree,x,parallel=TRUE,levs=400,
CI=0.95)
fit_bm
## Object of class "bounded_bm" based on
## a discretization with k = 400 levels.
##
## Unwrapped (i.e., bounded) model
##
## Set or estimated bounds: [ -2.913579 , 3.316537 ]
##
## Fitted model parameters:
## sigsq: 0.974855 [ 0.6288 , 1.5113 ]
## x0: -0.13339
##
## Log-likelihood: -29.202885
##
## R thinks it has found the ML solution.
This shows us that our confidence intervals are accurate (or, at least, equivalent to what we’d obtain for geiger::fitContinuous
) for unbounded Brownian evolution using the discrete approximation.
Now let’s proceed to a very modest simulation.
I’m going to generate data with constant \([0, 1]\) bounds, and randomly sampled \(\sigma^2\) values sampled from a \(\chi^2\) distribution with \(d.f.=2\). This sampling distribution should generating some very small and some large values of \(\sigma^2\).
I predict that \(\sigma^2\) will be estimated unbiasedly by geiger::fitContinuous
when low, but underestimated as \(\sigma^2\) increases such that the reflective boundaries of the space are reached.
We should also see that the confidence limits around \(\sigma^2\) often fail to include the generating value for higher values of \(\sigma^2\) if unbounded Brownian motion evolution is assumed.
nrep<-40 ## number of simulations
gen_sigsq<-rchisq(n=nrep,df=2)
range(gen_sigsq)
## [1] 0.1224179 7.7297073
RESULTS<-matrix(NA,nrep,7,
dimnames=list(1:nrep,
c("gen","est_g","lower_g","upper_g",
"est_bbm","lower_bbm","upper_bbm")))
bbm_fits<-list()
## simulation test
ntaxa<-40
for(i in 1:nrep){
x<-fastBM(tree<-pbtree(n=ntaxa,scale=1),a=0.5,
sig2=gen_sigsq[i],bounds=c(0,1))
bbm_fits[[i]]<-bounded_bm(tree,x,parallel=TRUE,
lims=c(0,1),CI=TRUE)
g_fit<-geiger::fitContinuous(tree,x,
control=list(hessian=TRUE,CI=0.95))
RESULTS[i,]<-c(gen_sigsq[i],
g_fit$opt$sigsq,g_fit$opt$CI[,1],
bbm_fits[[i]]$sigsq,bbm_fits[[i]]$CI)
}
RESULTS
## gen est_g lower_g upper_g est_bbm lower_bbm upper_bbm
## 1 1.091 0.264 0.170 0.409 0.606 0.241 1.525
## 2 0.290 0.161 0.104 0.249 0.251 0.152 0.414
## 3 0.728 0.329 0.212 0.511 0.582 0.315 1.074
## 4 0.705 0.337 0.217 0.522 0.751 0.328 1.721
## 5 0.473 0.246 0.159 0.382 0.666 0.276 1.606
## 6 1.436 0.498 0.322 0.773 1.680 0.628 4.491
## 7 4.548 0.443 0.285 0.686 2.160 0.557 8.378
## 8 0.314 0.173 0.111 0.268 0.323 0.189 0.553
## 9 1.276 0.353 0.228 0.547 0.943 0.349 2.552
## 10 1.827 0.583 0.376 0.903 1.909 0.771 4.728
## 11 4.122 0.467 0.301 0.724 2.130 0.585 7.754
## 12 5.458 0.429 0.277 0.664 1.827 0.586 5.694
## 13 1.643 0.276 0.178 0.427 0.823 0.370 1.829
## 14 0.762 0.302 0.195 0.469 0.681 0.337 1.379
## 15 0.671 0.287 0.185 0.444 0.533 0.286 0.994
## 16 0.760 0.471 0.304 0.729 1.292 0.484 3.445
## 17 0.686 0.271 0.175 0.421 0.912 0.388 2.143
## 18 0.204 0.162 0.104 0.250 0.238 0.130 0.434
## 19 1.911 0.397 0.256 0.616 1.532 0.504 4.656
## 20 2.172 0.536 0.346 0.830 2.850 0.675 12.031
## 21 3.645 0.506 0.326 0.784 4.598 0.118 179.075
## 22 0.850 0.277 0.179 0.430 0.628 0.294 1.340
## 23 1.405 0.505 0.326 0.782 1.004 0.490 2.058
## 24 2.521 0.284 0.183 0.440 0.880 0.241 3.214
## 25 0.368 0.190 0.123 0.295 0.411 0.178 0.947
## 26 2.464 0.770 0.497 1.194 4.985 1.115 22.280
## 27 0.523 0.361 0.233 0.560 0.505 0.305 0.837
## 28 1.579 0.310 0.200 0.481 0.848 0.329 2.182
## 29 1.475 0.320 0.207 0.496 1.140 0.462 2.813
## 30 1.414 0.518 0.334 0.802 2.378 0.686 8.244
## 31 7.730 0.523 0.337 0.810 4.273 0.932 19.603
## 32 0.225 0.140 0.091 0.218 0.202 0.118 0.345
## 33 0.290 0.171 0.110 0.265 0.303 0.168 0.544
## 34 3.125 0.815 0.525 1.263 7.440 1.311 42.228
## 35 0.417 0.242 0.156 0.375 0.654 0.252 1.693
## 36 3.487 0.712 0.459 1.104 2.878 0.835 9.917
## 37 1.321 0.454 0.293 0.703 1.615 0.665 3.921
## 38 0.122 0.137 0.088 0.212 0.304 0.125 0.737
## 39 1.242 0.373 0.241 0.578 1.283 0.450 3.656
## 40 6.146 0.672 0.434 1.042 5.577 0.889 35.006
## graph the results
par(mar=c(5.1,5.6,2.6,1.1),mfrow=c(1,2))
options(scipen=5)
plot(RESULTS[,1],RESULTS[,2],pch=21,bg="grey",las=1,
cex.axis=0.8,
xlab=expression(paste("generating ",sigma^2," (BBM)")),
ylab="",cex=1.5,xlim=range(RESULTS[,1]),
ylim=range(RESULTS),log="xy")
mtext(expression(paste("estimated ",sigma^2," (BM)")),2,
line=4)
for(i in 1:nrow(RESULTS)) segments(x0=RESULTS[i,1],
y0=RESULTS[i,3],y1=RESULTS[i,4])
abline(a=0,b=1,lwd=2,col=make.transparent("black",0.25))
grid()
legend("topleft",c("95% CI","1:1 line"),lwd=c(1,2),
col=c("black",make.transparent("black",0.25)),cex=0.8)
mtext("a) generating vs. estimated under BM (with 95% CIs)",
line=1,cex=0.9,adj=-0.5)
plot(RESULTS[,1],RESULTS[,5],pch=21,bg="grey",las=1,
cex.axis=0.8,
xlab=expression(paste("generating ",sigma^2," (BBM)")),
ylab="",cex=1.5,xlim=range(RESULTS[,1]),
ylim=range(RESULTS),log="xy")
mtext(expression(paste("estimated ",sigma^2," (BBM)")),2,
line=4)
for(i in 1:nrow(RESULTS)) segments(x0=RESULTS[i,1],
y0=RESULTS[i,6],y1=RESULTS[i,7])
abline(a=0,b=1,lwd=2,col=make.transparent("black",0.25))
grid()
mtext("b) generating vs. estimated under BBM (with 95% CIs)",
line=1,cex=0.9,adj=-0.5)
Hopefully our figure reveals that our bounded model allows us to estimate \(\sigma^2\) unbiasedly across its range, and that our 95% confidence limits tend to include the generating value in around 19 of every 20 cases.
That’s it folks.
No comments:
Post a Comment
Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.