A colleague contacted me today about “alternative implementations” of geiger's fitContinuous function that he could use to check parameter estimation, and perhaps to speed up his calculations, as fitContinuous can be somewhat slow (although it is evidently quite robust now) for some models. Well, I don't know about speeding things up - but it is pretty straightforward to write a wrapper function that combines the single-rate model of phytools' brownie.lite with geiger's rescale.phylo to fit many of the models of fitContinuous: we just rescale the tree under the model of interest, fit plain Brownian motion to the data on the rescaled tree, and optimize over the transformation parameter. The following is a simple demo:
## load packages
library(phytools)
library(geiger)
## helper function to get the root node number
getRoot<-function(tree) Ntip(tree)+1
## here is our fitContinuous lite function
fitCont<-function(tree,x,model="BM",interval=NULL){
    if(model!="BM"){
        ## model likelihood: rescale the tree under the model with parameter par,
        ## paint it as a single regime, & fit single-rate BM with brownie.lite
        lk<-function(par,tree,x,model) brownie.lite(paintSubTree(rescale(tree,model,par),getRoot(tree),"1"),x)$logL1
        ## optimize the transformation parameter over the supplied interval
        oFit<-optimize(lk,interval,tree=tree,x=x,model=model,maximum=TRUE)
        ## refit at the optimum to extract sig2 & the root state
        cFit<-brownie.lite(paintSubTree(rescale(tree,model,oFit$maximum),getRoot(tree),"1"),x)
        obj<-list(par=oFit$maximum,model=model,sig2=cFit$sig2.single,a=cFit$a.single,logLik=oFit$objective)
    } else {
        ## plain BM: no tree transformation needed
        fit<-brownie.lite(paintSubTree(tree,getRoot(tree),"1"),x)
        obj<-list(model=model,sig2=fit$sig2.single,a=fit$a.single,logLik=fit$logL1)
    }
    obj
}
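Note that, unlike fitContinuous, this wrapper doesn't report AIC. If we wanted that for model comparison, a small helper (hypothetical, not part of the wrapper above) could compute it from the returned log-likelihood, counting two parameters (sig2 & the root state) for BM plus the tree-transformation parameter for the other models:
## hypothetical helper (not part of the wrapper above): AIC for a fitCont result
aicFitCont<-function(obj){
    k<-if(obj$model=="BM") 2 else 3   ## sig2 & root state (+ transformation par)
    -2*obj$logLik+2*k
}
## e.g., aicFitCont(fitCont(tree,x)) should be comparable to the AIC
## reported by fitContinuous(tree,x)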
OK, now let's try it for simulated data:
## simulate tree
tree<-pbtree(n=100,scale=1)
## simulate data under model="BM"
x<-fastBM(tree)
fitContinuous(tree,x)
## GEIGER-fitted comparative model of continuous data
## fitted 'BM' model parameters:
## sigsq = 0.940924
## z0 = -0.186971
##
## model summary:
## log-likelihood = -58.714396
## AIC = 121.428793
## AICc = 121.552504
## free parameters = 2
##
## Convergence diagnostics:
## optimization iterations = 100
## failed iterations = 0
## frequency of best fit = 1.00
##
## object summary:
## 'lik' -- likelihood function
## 'bnd' -- bounds for likelihood search
## 'res' -- optimization iteration summary
## 'opt' -- maximum likelihood parameter estimates
fitCont(tree,x)
## $model
## [1] "BM"
##
## $sig2
## [1] 0.9409
##
## $a
## [1] -0.187
##
## $logLik
## [1] -58.71
## simulate data under model="lambda"
x<-fastBM(rescale(tree,model="lambda",0.5))
fitContinuous(tree,x,model="lambda")
## GEIGER-fitted comparative model of continuous data
## fitted 'lambda' model parameters:
## lambda = 0.729756
## sigsq = 1.217453
## z0 = 0.254493
##
## model summary:
## log-likelihood = -122.552971
## AIC = 251.105941
## AICc = 251.355941
## free parameters = 3
##
## Convergence diagnostics:
## optimization iterations = 100
## failed iterations = 0
## frequency of best fit = 0.32
##
## object summary:
## 'lik' -- likelihood function
## 'bnd' -- bounds for likelihood search
## 'res' -- optimization iteration summary
## 'opt' -- maximum likelihood parameter estimates
fitCont(tree,x,model="lambda",interval=c(0,1))
## $par
## [1] 0.7298
##
## $model
## [1] "lambda"
##
## $sig2
## [1] 1.217
##
## $a
## [1] 0.2545
##
## $logLik
## [1] -122.6
## simulate data under model="OU"
x<-fastBM(rescale(tree,model="OU",0.4))
fitContinuous(tree,x,model="OU")
## GEIGER-fitted comparative model of continuous data
## fitted 'OU' model parameters:
## alpha = 0.536268
## sigsq = 1.171551
## z0 = 0.216378
##
## model summary:
## log-likelihood = -61.980067
## AIC = 129.960133
## AICc = 130.210133
## free parameters = 3
##
## Convergence diagnostics:
## optimization iterations = 100
## failed iterations = 0
## frequency of best fit = 0.60
##
## object summary:
## 'lik' -- likelihood function
## 'bnd' -- bounds for likelihood search
## 'res' -- optimization iteration summary
## 'opt' -- maximum likelihood parameter estimates
fitCont(tree,x,model="OU",interval=c(0,1))
## $par
## [1] 0.5363
##
## $model
## [1] "OU"
##
## $sig2
## [1] 1.172
##
## $a
## [1] 0.2164
##
## $logLik
## [1] -61.98
Running system.time with any of these examples (except for model="BM") does not show any speed improvement from this ostensibly lighter implementation - but that is probably because brownie.lite carries a lot of internal machinery for fitting a multi-rate model, which is never actually used here. Improving on this would thus be pretty simple (but I will not do so here, nor now).
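For what it's worth, a minimal sketch of that kind of shortcut - purely illustrative, assuming only ape's vcv and base R, and hypothetically substitutable for brownie.lite inside fitCont - would be to compute the single-rate BM likelihood directly from the among-species covariance matrix of the (rescaled) tree:
## illustrative sketch only: single-rate BM likelihood computed directly from
## the tree's among-species covariance matrix (hypothetical stand-in for
## brownie.lite inside fitCont)
bmLik<-function(tree,x){
    C<-vcv(tree)        ## expected covariances among tips under BM on this tree
    x<-x[rownames(C)]   ## match data order to the matrix
    n<-length(x)
    one<-rep(1,n)
    Cinv<-solve(C)
    a<-as.numeric((one%*%Cinv%*%x)/(one%*%Cinv%*%one))   ## ML root state
    sig2<-as.numeric(t(x-a)%*%Cinv%*%(x-a))/n            ## ML rate
    logL<--0.5*(n*log(2*pi)+n*log(sig2)+as.numeric(determinant(C)$modulus)+n)
    list(sig2=sig2,a=a,logLik=logL)
}
Dropping something like this into the lk function in place of brownie.lite should, in principle, avoid the multi-rate overhead.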
That's it.