Sunday, February 18, 2024

Parameter estimation under bounded Brownian motion using bounded_bm in phytools

As part of other related work, I put together a bit of code to explore parameter estimation under the bounded Brownian motion (BBM) model of Boucher & Démery (2016). This might be of interest to readers of this blog, so I thought I’d try posting it here.

TLDR: we can estimate σ2\sigma^2 pretty closely using bounded_bm, but estimating x0x_0 is virtually useless for any condition in which the bounds are reached multiple times during the simulation.

First, let’s start off by loading phytools.

library(phytools)
packageVersion("phytools")
## [1] '2.1.14'

Now, before I proceed, in my simulations I’ll use multiple values of σ2\sigma^2 and x0x_0 for a fixed total tree length, but they’ll have a mean value of σ2=1.0\sigma^2 = 1.0 and x0=0.0x_0 = 0.0, with constant bounds [1,1][-1, 1].

To see that this will typically result in the bounde being reached multiple times, let’s use my illustrative simulation code from an earlier post. This shows us that simulation under these parameters is very bounded indeed.

tree<-pbtree(n=100,scale=1)
h<-max(nodeHeights(tree))
mt<-make.era.map(tree,seq(0,h,length.out=100))
st<-map.to.singleton(mt)
y<-fastBM(st,sig2=1,a=0,internal=TRUE,bounds=c(-1,1))
par(mar=c(5.1,4.1,1.1,1.1))
phenogram(st,y,ftype="off",fsize=0.4,
  mar=c(5.1,4.1,1.1,1.1),lwd=5,las=1,cex.axis=0.8,
  spread.cost=c(1,0),ylim=c(-1.1,1.1))
grid()
phenogram(st,y,ftype="off",add=TRUE,lwd=3,
  color="white")
phenogram(st,y,ftype="off",add=TRUE,lwd=1,
  color="blue")
segments(x0=c(0,0),y0=c(-1,1),x1=c(h,h),y1=c(-1,1),
  lwd=6,col=palette()[2],lend=2)
segments(x0=c(0,0),y0=c(-1,1),x1=c(h,h),y1=c(-1,1),
  lwd=2,lty="dashed",lend=2)

plot of chunk unnamed-chunk-4

Now let’s proceed to our study.

To begin, I’ll set the number of simulations (100) and create a matrix for our results.

nsim<-100
results<-matrix(NA,nsim,7,dimnames=list(1:nsim,c("true_sig2",
  "true_x0","sig2","x0","logL","bm_sig2","bm_x0")))

Next, I’ll use a for loop to iterate nsim simulations and model-fits.

for(i in 1:nsim){
  ## simulate
  tree<-pbtree(n=100,scale=1)
  x<-fastBM(tree,sig2=sig2<-runif(n=1,0,2),
    a=x0<-runif(n=1,-1,1),
    bounds=c(-1,1))
  ## fit using phytools::bounded_bm
  bounded.fit<-bounded_bm(tree,x,lims=c(-1,1),
    root="mle",lik.func="eigen",parallel=TRUE,levs=200)
  ## fit standard BM
  bm.fit<-geiger::fitContinuous(tree,x)
  ## store result
  results[i,]<-c(sig2,x0,bounded.fit$sigsq,
    bounded.fit$x0,logLik(bounded.fit),
    bm.fit$opt$sigsq,bm.fit$opt$z0)
}

Finally, let’s graph our simulated & estimated σ2\sigma^2, along with our simulated and estimated x0x_0.

par(mfrow=c(1,2),mar=c(5.1,5.1,2.1,2.1))
plot(results[,"true_sig2"],results[,"sig2"],
  xlab=expression(paste("generating ",sigma^2)),
  ylab=expression(paste("estimated ",sigma^2," (bounded_bm)")),
  bty="n",pch=21,bg="grey",las=1,cex.axis=0.8,cex=1.2)
grid()
legend("topleft","1:1 line",lwd=1,bty="n")
lines(range(par()$usr),range(par()$usr))
plot(results[,"true_x0"],results[,"x0"],
  xlab=expression(paste("generating ",x[0])),
  ylab=expression(paste("estimated ",x[0]," (bounded_bm)")),
  bty="n",pch=21,bg="grey",las=1,cex.axis=0.8,cex=1.2)
grid()
lines(range(par()$usr),range(par()$usr))

plot of chunk unnamed-chunk-7

This shows that we can estimate the parameters of the process – particularly σ2\sigma^2. (TBF, it’s fairly unsurprising that x0x_0 is so much more difficult to get, since the equilibrium distribution under BBM is uniform on the interval defined by the bounds!)

Why don’t we also compare this to estimates of σ2\sigma^2 and x0x_0 from geiger::fitContinuous assuming unbounded Brownian motion, even though this wasn’t the generating process.

par(mfrow=c(1,2),mar=c(5.1,5.1,2.1,2.1))
plot(results[,"true_sig2"],results[,"bm_sig2"],
  xlab=expression(paste("generating ",sigma^2)),
  ylab=expression(paste("estimated ",sigma^2," (fitContinuous)")),
  bty="n",pch=21,bg="grey",las=1,cex.axis=0.8,cex=1.2)
grid()
legend("topleft","1:1 line",lwd=1,bty="n")
lines(range(par()$usr),range(par()$usr))
plot(results[,"true_x0"],results[,"bm_x0"],
  xlab=expression(paste("generating ",x[0])),
  ylab=expression(paste("estimated ",x[0]," (fitContinuous)")),
  bty="n",pch=21,bg="grey",las=1,cex.axis=0.8,cex=1.2)
grid()
lines(range(par()$usr),range(par()$usr))

plot of chunk unnamed-chunk-8

We see that σ2\sigma^2 is systematically underestimated by fitContinuous – increasingly for larger σ2\sigma^2, but unbiasedly estimated by bounded_bm.

Cool!

That’s all for now, folks.

No comments:

Post a Comment

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.