Sunday, February 18, 2024

Parameter estimation under bounded Brownian motion using bounded_bm in phytools

As part of other related work, I put together a bit of code to explore parameter estimation under the bounded Brownian motion (BBM) model of Boucher & Démery (2016). This might be of interest to readers of this blog, so I thought I’d try posting it here.

TLDR: we can estimate \(\sigma^2\) pretty closely using bounded_bm, but estimating \(x_0\) is virtually useless for any condition in which the bounds are reached multiple times during the simulation.

First, let’s start off by loading phytools.

library(phytools)
packageVersion("phytools")
## [1] '2.1.14'

Now, before I proceed, in my simulations I’ll use multiple values of \(\sigma^2\) and \(x_0\) for a fixed total tree length, but they’ll have a mean value of \(\sigma^2 = 1.0\) and \(x_0 = 0.0\), with constant bounds \([-1, 1]\).

To see that this will typically result in the bounde being reached multiple times, let’s use my illustrative simulation code from an earlier post. This shows us that simulation under these parameters is very bounded indeed.

tree<-pbtree(n=100,scale=1)
h<-max(nodeHeights(tree))
mt<-make.era.map(tree,seq(0,h,length.out=100))
st<-map.to.singleton(mt)
y<-fastBM(st,sig2=1,a=0,internal=TRUE,bounds=c(-1,1))
par(mar=c(5.1,4.1,1.1,1.1))
phenogram(st,y,ftype="off",fsize=0.4,
  mar=c(5.1,4.1,1.1,1.1),lwd=5,las=1,cex.axis=0.8,
  spread.cost=c(1,0),ylim=c(-1.1,1.1))
grid()
phenogram(st,y,ftype="off",add=TRUE,lwd=3,
  color="white")
phenogram(st,y,ftype="off",add=TRUE,lwd=1,
  color="blue")
segments(x0=c(0,0),y0=c(-1,1),x1=c(h,h),y1=c(-1,1),
  lwd=6,col=palette()[2],lend=2)
segments(x0=c(0,0),y0=c(-1,1),x1=c(h,h),y1=c(-1,1),
  lwd=2,lty="dashed",lend=2)

plot of chunk unnamed-chunk-4

Now let’s proceed to our study.

To begin, I’ll set the number of simulations (100) and create a matrix for our results.

nsim<-100
results<-matrix(NA,nsim,7,dimnames=list(1:nsim,c("true_sig2",
  "true_x0","sig2","x0","logL","bm_sig2","bm_x0")))

Next, I’ll use a for loop to iterate nsim simulations and model-fits.

for(i in 1:nsim){
  ## simulate
  tree<-pbtree(n=100,scale=1)
  x<-fastBM(tree,sig2=sig2<-runif(n=1,0,2),
    a=x0<-runif(n=1,-1,1),
    bounds=c(-1,1))
  ## fit using phytools::bounded_bm
  bounded.fit<-bounded_bm(tree,x,lims=c(-1,1),
    root="mle",lik.func="eigen",parallel=TRUE,levs=200)
  ## fit standard BM
  bm.fit<-geiger::fitContinuous(tree,x)
  ## store result
  results[i,]<-c(sig2,x0,bounded.fit$sigsq,
    bounded.fit$x0,logLik(bounded.fit),
    bm.fit$opt$sigsq,bm.fit$opt$z0)
}

Finally, let’s graph our simulated & estimated \(\sigma^2\), along with our simulated and estimated \(x_0\).

par(mfrow=c(1,2),mar=c(5.1,5.1,2.1,2.1))
plot(results[,"true_sig2"],results[,"sig2"],
  xlab=expression(paste("generating ",sigma^2)),
  ylab=expression(paste("estimated ",sigma^2," (bounded_bm)")),
  bty="n",pch=21,bg="grey",las=1,cex.axis=0.8,cex=1.2)
grid()
legend("topleft","1:1 line",lwd=1,bty="n")
lines(range(par()$usr),range(par()$usr))
plot(results[,"true_x0"],results[,"x0"],
  xlab=expression(paste("generating ",x[0])),
  ylab=expression(paste("estimated ",x[0]," (bounded_bm)")),
  bty="n",pch=21,bg="grey",las=1,cex.axis=0.8,cex=1.2)
grid()
lines(range(par()$usr),range(par()$usr))

plot of chunk unnamed-chunk-7

This shows that we can estimate the parameters of the process – particularly \(\sigma^2\). (TBF, it’s fairly unsurprising that \(x_0\) is so much more difficult to get, since the equilibrium distribution under BBM is uniform on the interval defined by the bounds!)

Why don’t we also compare this to estimates of \(\sigma^2\) and \(x_0\) from geiger::fitContinuous assuming unbounded Brownian motion, even though this wasn’t the generating process.

par(mfrow=c(1,2),mar=c(5.1,5.1,2.1,2.1))
plot(results[,"true_sig2"],results[,"bm_sig2"],
  xlab=expression(paste("generating ",sigma^2)),
  ylab=expression(paste("estimated ",sigma^2," (fitContinuous)")),
  bty="n",pch=21,bg="grey",las=1,cex.axis=0.8,cex=1.2)
grid()
legend("topleft","1:1 line",lwd=1,bty="n")
lines(range(par()$usr),range(par()$usr))
plot(results[,"true_x0"],results[,"bm_x0"],
  xlab=expression(paste("generating ",x[0])),
  ylab=expression(paste("estimated ",x[0]," (fitContinuous)")),
  bty="n",pch=21,bg="grey",las=1,cex.axis=0.8,cex=1.2)
grid()
lines(range(par()$usr),range(par()$usr))

plot of chunk unnamed-chunk-8

We see that \(\sigma^2\) is systematically underestimated by fitContinuous – increasingly for larger \(\sigma^2\), but unbiasedly estimated by bounded_bm.

Cool!

That’s all for now, folks.

No comments:

Post a Comment

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.