Sunday, February 18, 2024

Parameter estimation under bounded Brownian motion using bounded_bm in phytools

As part of other related work, I put together a bit of code to explore parameter estimation under the bounded Brownian motion (BBM) model of Boucher & DÃ©mery (2016). This might be of interest to readers of this blog, so I thought I’d try posting it here.

TLDR: we can estimate $$\sigma^2$$ pretty closely using bounded_bm, but estimating $$x_0$$ is virtually useless for any condition in which the bounds are reached multiple times during the simulation.

library(phytools)
packageVersion("phytools")

## [1] '2.1.14'


Now, before I proceed, in my simulations I’ll use multiple values of $$\sigma^2$$ and $$x_0$$ for a fixed total tree length, but they’ll have a mean value of $$\sigma^2 = 1.0$$ and $$x_0 = 0.0$$, with constant bounds $$[-1, 1]$$.

To see that this will typically result in the bounde being reached multiple times, let’s use my illustrative simulation code from an earlier post. This shows us that simulation under these parameters is very bounded indeed.

tree<-pbtree(n=100,scale=1)
h<-max(nodeHeights(tree))
mt<-make.era.map(tree,seq(0,h,length.out=100))
st<-map.to.singleton(mt)
y<-fastBM(st,sig2=1,a=0,internal=TRUE,bounds=c(-1,1))
par(mar=c(5.1,4.1,1.1,1.1))
phenogram(st,y,ftype="off",fsize=0.4,
mar=c(5.1,4.1,1.1,1.1),lwd=5,las=1,cex.axis=0.8,
grid()
color="white")
color="blue")
segments(x0=c(0,0),y0=c(-1,1),x1=c(h,h),y1=c(-1,1),
lwd=6,col=palette()[2],lend=2)
segments(x0=c(0,0),y0=c(-1,1),x1=c(h,h),y1=c(-1,1),
lwd=2,lty="dashed",lend=2)


Now let’s proceed to our study.

To begin, I’ll set the number of simulations (100) and create a matrix for our results.

nsim<-100
results<-matrix(NA,nsim,7,dimnames=list(1:nsim,c("true_sig2",
"true_x0","sig2","x0","logL","bm_sig2","bm_x0")))


Next, I’ll use a for loop to iterate nsim simulations and model-fits.

for(i in 1:nsim){
## simulate
tree<-pbtree(n=100,scale=1)
x<-fastBM(tree,sig2=sig2<-runif(n=1,0,2),
a=x0<-runif(n=1,-1,1),
bounds=c(-1,1))
## fit using phytools::bounded_bm
bounded.fit<-bounded_bm(tree,x,lims=c(-1,1),
root="mle",lik.func="eigen",parallel=TRUE,levs=200)
## fit standard BM
bm.fit<-geiger::fitContinuous(tree,x)
## store result
results[i,]<-c(sig2,x0,bounded.fit$sigsq, bounded.fit$x0,logLik(bounded.fit),
bm.fit$opt$sigsq,bm.fit$opt$z0)
}


Finally, let’s graph our simulated & estimated $$\sigma^2$$, along with our simulated and estimated $$x_0$$.

par(mfrow=c(1,2),mar=c(5.1,5.1,2.1,2.1))
plot(results[,"true_sig2"],results[,"sig2"],
xlab=expression(paste("generating ",sigma^2)),
ylab=expression(paste("estimated ",sigma^2," (bounded_bm)")),
bty="n",pch=21,bg="grey",las=1,cex.axis=0.8,cex=1.2)
grid()
legend("topleft","1:1 line",lwd=1,bty="n")
lines(range(par()$usr),range(par()$usr))
plot(results[,"true_x0"],results[,"x0"],
xlab=expression(paste("generating ",x[0])),
ylab=expression(paste("estimated ",x[0]," (bounded_bm)")),
bty="n",pch=21,bg="grey",las=1,cex.axis=0.8,cex=1.2)
grid()
lines(range(par()$usr),range(par()$usr))


This shows that we can estimate the parameters of the process – particularly $$\sigma^2$$. (TBF, it’s fairly unsurprising that $$x_0$$ is so much more difficult to get, since the equilibrium distribution under BBM is uniform on the interval defined by the bounds!)

Why don’t we also compare this to estimates of $$\sigma^2$$ and $$x_0$$ from geiger::fitContinuous assuming unbounded Brownian motion, even though this wasn’t the generating process.

par(mfrow=c(1,2),mar=c(5.1,5.1,2.1,2.1))
plot(results[,"true_sig2"],results[,"bm_sig2"],
xlab=expression(paste("generating ",sigma^2)),
ylab=expression(paste("estimated ",sigma^2," (fitContinuous)")),
bty="n",pch=21,bg="grey",las=1,cex.axis=0.8,cex=1.2)
grid()
legend("topleft","1:1 line",lwd=1,bty="n")
lines(range(par()$usr),range(par()$usr))
plot(results[,"true_x0"],results[,"bm_x0"],
xlab=expression(paste("generating ",x[0])),
ylab=expression(paste("estimated ",x[0]," (fitContinuous)")),
bty="n",pch=21,bg="grey",las=1,cex.axis=0.8,cex=1.2)
grid()
lines(range(par()$usr),range(par()$usr))


We see that $$\sigma^2$$ is systematically underestimated by fitContinuous – increasingly for larger $$\sigma^2$$, but unbiasedly estimated by bounded_bm.

Cool!

That’s all for now, folks.