As part of other related work, I put together a bit of code to explore parameter estimation under the bounded Brownian motion (BBM) model of Boucher & Démery (2016). This might be of interest to readers of this blog, so I thought I’d try posting it here.
TLDR: we can estimate \(\sigma^2\) pretty closely using bounded_bm
, but estimating \(x_0\) is virtually useless for any condition in which the bounds are reached multiple times during the simulation.
First, let’s start off by loading phytools.
library(phytools)
packageVersion("phytools")
## [1] '2.1.14'
Now, before I proceed, in my simulations I’ll use multiple values of \(\sigma^2\) and \(x_0\) for a fixed total tree length, but they’ll have a mean value of \(\sigma^2 = 1.0\) and \(x_0 = 0.0\), with constant bounds \([-1, 1]\).
To see that this will typically result in the bounde being reached multiple times, let’s use my illustrative simulation code from an earlier post. This shows us that simulation under these parameters is very bounded indeed.
tree<-pbtree(n=100,scale=1)
h<-max(nodeHeights(tree))
mt<-make.era.map(tree,seq(0,h,length.out=100))
st<-map.to.singleton(mt)
y<-fastBM(st,sig2=1,a=0,internal=TRUE,bounds=c(-1,1))
par(mar=c(5.1,4.1,1.1,1.1))
phenogram(st,y,ftype="off",fsize=0.4,
mar=c(5.1,4.1,1.1,1.1),lwd=5,las=1,cex.axis=0.8,
spread.cost=c(1,0),ylim=c(-1.1,1.1))
grid()
phenogram(st,y,ftype="off",add=TRUE,lwd=3,
color="white")
phenogram(st,y,ftype="off",add=TRUE,lwd=1,
color="blue")
segments(x0=c(0,0),y0=c(-1,1),x1=c(h,h),y1=c(-1,1),
lwd=6,col=palette()[2],lend=2)
segments(x0=c(0,0),y0=c(-1,1),x1=c(h,h),y1=c(-1,1),
lwd=2,lty="dashed",lend=2)
Now let’s proceed to our study.
To begin, I’ll set the number of simulations (100) and create a matrix for our results.
nsim<-100
results<-matrix(NA,nsim,7,dimnames=list(1:nsim,c("true_sig2",
"true_x0","sig2","x0","logL","bm_sig2","bm_x0")))
Next, I’ll use a for
loop to iterate nsim
simulations and model-fits.
for(i in 1:nsim){
## simulate
tree<-pbtree(n=100,scale=1)
x<-fastBM(tree,sig2=sig2<-runif(n=1,0,2),
a=x0<-runif(n=1,-1,1),
bounds=c(-1,1))
## fit using phytools::bounded_bm
bounded.fit<-bounded_bm(tree,x,lims=c(-1,1),
root="mle",lik.func="eigen",parallel=TRUE,levs=200)
## fit standard BM
bm.fit<-geiger::fitContinuous(tree,x)
## store result
results[i,]<-c(sig2,x0,bounded.fit$sigsq,
bounded.fit$x0,logLik(bounded.fit),
bm.fit$opt$sigsq,bm.fit$opt$z0)
}
Finally, let’s graph our simulated & estimated \(\sigma^2\), along with our simulated and estimated \(x_0\).
par(mfrow=c(1,2),mar=c(5.1,5.1,2.1,2.1))
plot(results[,"true_sig2"],results[,"sig2"],
xlab=expression(paste("generating ",sigma^2)),
ylab=expression(paste("estimated ",sigma^2," (bounded_bm)")),
bty="n",pch=21,bg="grey",las=1,cex.axis=0.8,cex=1.2)
grid()
legend("topleft","1:1 line",lwd=1,bty="n")
lines(range(par()$usr),range(par()$usr))
plot(results[,"true_x0"],results[,"x0"],
xlab=expression(paste("generating ",x[0])),
ylab=expression(paste("estimated ",x[0]," (bounded_bm)")),
bty="n",pch=21,bg="grey",las=1,cex.axis=0.8,cex=1.2)
grid()
lines(range(par()$usr),range(par()$usr))
This shows that we can estimate the parameters of the process – particularly \(\sigma^2\). (TBF, it’s fairly unsurprising that \(x_0\) is so much more difficult to get, since the equilibrium distribution under BBM is uniform on the interval defined by the bounds!)
Why don’t we also compare this to estimates of \(\sigma^2\) and \(x_0\) from geiger::fitContinuous
assuming unbounded Brownian motion, even though this wasn’t the generating process.
par(mfrow=c(1,2),mar=c(5.1,5.1,2.1,2.1))
plot(results[,"true_sig2"],results[,"bm_sig2"],
xlab=expression(paste("generating ",sigma^2)),
ylab=expression(paste("estimated ",sigma^2," (fitContinuous)")),
bty="n",pch=21,bg="grey",las=1,cex.axis=0.8,cex=1.2)
grid()
legend("topleft","1:1 line",lwd=1,bty="n")
lines(range(par()$usr),range(par()$usr))
plot(results[,"true_x0"],results[,"bm_x0"],
xlab=expression(paste("generating ",x[0])),
ylab=expression(paste("estimated ",x[0]," (fitContinuous)")),
bty="n",pch=21,bg="grey",las=1,cex.axis=0.8,cex=1.2)
grid()
lines(range(par()$usr),range(par()$usr))
We see that \(\sigma^2\) is systematically underestimated by fitContinuous
– increasingly for larger \(\sigma^2\), but unbiasedly estimated by bounded_bm
.
Cool!
That’s all for now, folks.
No comments:
Post a Comment
Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.