## Tuesday, June 4, 2024

### A small post about estimating variances and confidence intervals for parameter estimates obtained using the discrete approximation of Boucher & Démery (2016)

As part of a project that I’ve become involved in uses the discrete approximation of Boucher & Démery (2016) from the most important phylogenetic comparative methods paper you’ve probably never read.

One reviewer comment asked if we could provide variances for the parameter estimates in the model.

I first thought, well, we might be able to do this in principle using the negative inverse of the second derivative (curvature) of the likelihood surface, but that seems hard. It turns out to be not too hard at all!

Below is a quick demo using numDeriv::hessian.

I’m going to illustrate it using an unbounded model only so that we might compare our estimated variance to that obtained from the true probability density (as opposed to our discrete approximation) – but exactly the same technique might be used for models (such as the bounded model) for which that density is not available.

## load phytools
library(phytools)
## simulate some data under unbounded Brownian motion
tree<-pbtree(n=40,scale=1)
x<-fastBM(tree,sig2=0.5)


Let’s start by fitting an unbounded model to these data using Boucher & Démery’s (2016) discrete approximation. This is the default for phytools::bounded_bm.

fit_bm<-bounded_bm(tree,x,parallel=TRUE,levs=400)
fit_bm

## Object of class "bounded_bm" based on
##  	a discretization with k = 400 levels.
##
## Unwrapped (i.e., bounded) model
##
## Set or estimated bounds: [ -1.782633 , 3.004117 ]
##
## Fitted model parameters:
##   sigsq: 0.483746
##      x0: 0.509023
##
## Log-likelihood: -24.336981
##
## R thinks it has found the ML solution.


We can see that this provides a highly numerically similar solution and log-likelihood to geiger::fitContinuous.

fit_geiger<-geiger::fitContinuous(tree,x)
fit_geiger

## GEIGER-fitted comparative model of continuous data
##  fitted 'BM' model parameters:
## 	sigsq = 0.484429
## 	z0 = 0.507723
##
##  model summary:
## 	log-likelihood = -24.364367
## 	AIC = 52.728733
## 	AICc = 53.053058
## 	free parameters = 2
##
## Convergence diagnostics:
## 	optimization iterations = 100
## 	failed iterations = 0
## 	number of iterations with same best fit = 100
## 	frequency of best fit = 1.000
##
##  object summary:
## 	'lik' -- likelihood function
## 	'bnd' -- bounds for likelihood search
## 	'res' -- optimization iteration summary
## 	'opt' -- maximum likelihood parameter estimates


(This similarity would increase if we increased the number of bins used for the discrete approximation, but here levs=400 should suffice.)

Next, to get the curvature of the likelihood surface at the optimum value of $$\sigma^2$$ we’ll use numDeriv::hessian as follows. (Note that this takes a while for a likelihood function from bounded_bm because just computing the likelihood takes a lot more time than with geiger::fitContinuous, and in numeric differentiation we have to evaluate it a bunch of times!)

I’m going to compute the curvature of the likelihood surface as a function of $$\log(\sigma^2)$$ (following what’s done in geiger) rather than as a function of $$\sigma^2$$ itself.

lik<-function(ln_sigsq) fit_bm$lik(exp(ln_sigsq)) h_bm<-numDeriv::hessian(lik,log(fit_bm$sigsq))
h_bm

##           [,1]
## [1,] -19.98771


Now, according to the theory of likelihoods, the variance of our parameter estimate can be obtained by calculating the negative inverse of this value as follows.

var_bm<--1/h_bm
var_bm

##            [,1]
## [1,] 0.05003074


To now get confidence limits on $$\log(\sigma^2)$$, we might assume that $$\log(\sigma^2)$$ has an approximately normal sampling distribution. (The sampling distribution of $$\sigma^2$$ is expected to be $$\chi^2$$ shaped with degrees of freedom equal to $$N - 1$$ for $$N$$ tips, but this should be close to normal on a log scale for larger $$N$$.)

ci_bm<-exp(qnorm(c(0.025,0.975),
mean=log(fit_bm$sigsq),sd=sqrt(var_bm))) ci_bm  ## [1] 0.3120502 0.7499120  Now let’s do the same thing for our fit_geiger object. (Don’t worry. This is much faster.) lik<-function(ln_sigsq) fit_geiger$lik(exp(ln_sigsq))
h_geiger<-numDeriv::hessian(lik,
log(fit_geiger$opt$sigsq))
h_geiger

##      [,1]
## [1,]  -20

var_geiger<--1/h_geiger
var_geiger

##      [,1]
## [1,] 0.05

ci_geiger<-exp(qnorm(c(0.025,0.975),
mean=log(fit_geiger$opt$sigsq),
sd=sqrt(var_geiger)))
ci_geiger

## [1] 0.3125330 0.7508701


We could’ve gotten exactly the same values from geiger::fitContinuous just by running the following.

obj<-geiger::fitContinuous(tree,x,
control=list(method="L-BFGS-B",hessian=TRUE,
CI=0.95))
obj$opt$CI

##        sigsq
## lb 0.3125330
## ub 0.7508701


To wrap things up, let’s do a teensie simulation study in which we ask how often the true value of $$\sigma^2$$ falls within the 95% confidence interval for $$\sigma^2$$, based on the Hessian. On average, this should be 19 times out of every 20 (though, of course, in only 20 simulations this could vary up or down from that expectation).

nrep<-20
gen_sigsq<-rchisq(n=nrep,df=1)
gen_sigsq

##  [1] 1.5627963083 0.0463230735 0.7692314741 3.4762362046 3.5261234128 0.3262390983 0.1852434442
##  [8] 0.0216669255 0.0830091437 0.0223987080 1.6521276627 0.1442518552 0.2406248602 0.0006792875
## [15] 0.0119822476 0.0188229067 0.0411828621 2.5975374104 1.2529566293 0.0719550470

RESULTS<-matrix(NA,nrep,4,
dimnames=list(1:nrep,
c("gen","est","lower","upper")))
fits<-list()

ci<-function(fit){
lik<-function(ln_sigsq) fit$lik(exp(ln_sigsq)) h<-numDeriv::hessian(lik,log(fit$sigsq))
sd<-sqrt(-1/h)
exp(qnorm(c(0.025,0.975),
mean=log(fit$sigsq),sd=sd)) }  for(i in 1:nrep){ x<-fastBM(tree<-pbtree(n=40,scale=1), sig2=gen_sigsq[i]) fits[[i]]<-bounded_bm(tree,x,parallel=TRUE) RESULTS[i,]<-c(gen_sigsq[i],fits[[i]]$sigsq,
ci(fits[[i]]))
}
RESULTS

##             gen          est        lower       upper
## 1  1.5627963083 2.0996044087 1.3385393905 3.293394804
## 2  0.0463230735 0.0356382482 0.0228890561 0.055488733
## 3  0.7692314741 0.5196674140 0.3350772821 0.805946078
## 4  3.4762362046 3.3953894797 2.1920505958 5.259308221
## 5  3.5261234128 3.0974873398 1.9981517381 4.801651264
## 6  0.3262390983 0.2803432861 0.1804718832 0.435482562
## 7  0.1852434442 0.1666622009 0.1075130094 0.258352820
## 8  0.0216669255 0.0169538315 0.0109307182 0.026295839
## 9  0.0830091437 0.0830291524 0.0535240459 0.128798936
## 10 0.0223987080 0.0205731240 0.0132652628 0.031906901
## 11 1.6521276627 1.7639550696 1.1376408948 2.735078795
## 12 0.1442518552 0.1670106125 0.1058462905 0.263519341
## 13 0.2406248602 0.1143966646 0.0737683286 0.177401293
## 14 0.0006792875 0.0006838866 0.0004351864 0.001074714
## 15 0.0119822476 0.0087910663 0.0056674629 0.013636234
## 16 0.0188229067 0.0235506674 0.0151905809 0.036511700
## 17 0.0411828621 0.0446750049 0.0288564700 0.069164941
## 18 2.5975374104 3.4340299269 2.2185576482 5.315418127
## 19 1.2529566293 1.2655450628 0.8142715686 1.966916650
## 20 0.0719550470 0.0676367397 0.0434179544 0.105364903


Finally, let’s graph what we found.

par(mar=c(5.1,5.6,1.1,1.1))
options(scipen=1)
plot(RESULTS[,1],RESULTS[,2],bty="n",pch=21,bg="grey",las=1,
cex.axis=0.8,xlab=expression(paste("generating ",sigma^2)),
ylab="",cex=1.5,
xlim=range(RESULTS[,3:4]),ylim=range(RESULTS[,3:4]),
log="xy")
mtext(expression(paste("estimated ",sigma^2)),2,line=4)
for(i in 1:nrow(RESULTS)) segments(x0=RESULTS[i,1],y0=RESULTS[i,3],
y1=RESULTS[i,4])
lines(range(gen_sigsq),range(gen_sigsq),lwd=2,
col=make.transparent("black",0.25))
grid()
legend("topleft",c("95% CI","1:1 line"),lwd=c(1,2),
col=c("black",make.transparent("black",0.25)),cex=0.8)


That’s all for now, folks!