The REML version works by using a likelihood function for the phylogenetically independent contrasts rather than for the original data. We optimize the multi-rate model, in this case, by optimization the (restricted) likelihood of different rescaling of the different branch lengths of the tree. The problem with maximizing the likelihood for the original data is that it involves constructing and inverting matrices of dimension

*n*×

*n*for

*n*species. By maximizing a likelihood function for the contrasts, we can avoid this very computationally intensive operation. In addition, REML (unlike ML) produces unbiased parameter estimates. ML estimates are biased downward by a factor of

*n*/(

*n*-1).

To see the improvement in computation time that is provided by REML in this case, download the source file (here), and try out the following exercise. (Note that on a tree with 1000 terminal species, as in this example, be forewarned that brownie.lite() will take several minutes to run.)

> require(geiger); require(phytools)

> set.seed(10) # for repeatability

> tree<-drop.tip(birthdeath.tree(b=1,d=0,taxa.stop=1001),"1001")

> tree<-rescaleTree(tree,1)

> tree<-sim.history(tree,matrix(c(-1,1,1,-1),2,2))

> x<-sim.rates(tree,c(1,10))

names absent from sig2: assuming same order as $mapped.edge

> system.time(result1<-brownie.lite(tree,x))

user system elapsed

353.53 3.39 357.70

> source("brownieREML.R")

> system.time(result2<-brownieREML(tree,x))

user system elapsed

0.28 0.00 0.28

> result1

$sig2.single

[1] 6.477488

...

$logL1

[1] -1348.224

...

$sig2.multiple

1 2

1.010363 9.819315

...

$logL.multiple

[1] -1160.006

...

> result2

$sig2.single

[1] 6.483972

$logL1

[1] -1347.577

$sig2.multiple

1 2

1.012534 9.824353

$logL2

[1] -1159.737

$convergence

[1] TRUE

I anticipate that brownieREML() will be added to future versions of the "phytools" library.

hey Liam, I have a couple of questions regarding this REML version when it comes to comparing models. First, at least in mixed-effects models there is a known limitation of using REML to compare the fit of models with different fixed effects, because the ReLL depends on the parametrization. Would this be an issue here, or do the parameters calculated in the single and multirate models not influence the ReLL calculations? Second, would this affect the way one should calculate AIC/AICc values? Again, thanks for putting this together and for the help!

ReplyDeleteHi Rafael.

ReplyDeleteSo far as I understand it, the restricted log-likelihoods are comparable in this case because the same nuisance parameter (the mean) is being left out of each model. Certainly, the likelihoods from ML (brownie.lite) and REML (brownieREML) should not be compared. I believe that by the same logic information criteria computed in the standard way should be comparable between the two different models.

Any other comments/corrections welcome.

- Liam

Thanks for putting this together, Liam. This will be very useful especially for large trees.

ReplyDeleteI believe there should in principle be a way to convert a REML likelihood into a true likelihood--as the data is the same, it should differ by a constant or something. I am not sure how this can be done but it at least seems possilbe.

cheers

matt