Wednesday, August 20, 2014

Comparing models fit using GLS with different (nested) parameterizations of the error structure

Today on R-sig-phylo a reader asked:

I have a question conserning the pgls regression in package caper. The function allows to estimate or fix three branch length transformations. I wanna figure out which transformation gives me the best model fit by comparing for example a model with lambda estimated (lambda=“ML”) to a model where I fix lambda at 0.5 (lambda=0.5). However, if I use anova(model1, model2) to compare the two models I get the error message that the two models are run with different branch length transformations, which makes sense. But is there any other possbility to compare models with different branch length transformations? How do I know which model is better?

This is what I responded:

We can compare two fitted models, one in which lambda is estimated and another in which lambda is fixed (at 0, 1, or some other arbitrary value) using a likelihood ratio test. This is what that would look like:

[First, let's simulate some data:

library(phytools)
tree<-pbtree(n=50)
x<-fastBM(tree)
y<-0.75*x+fastBM(phytools:::lambdaTree(tree,0.5),sig2=0.2)
DATA<-data.frame(x=x,y=y)

Done.]

## fit models
## DATA is data frame containing x & y
## tree is object of class "phylo"
library(nlme)
fitBM<-gls(y~x,data=DATA,correlation=corBrownian(1,tree),
        method="ML")
fitBM
## Generalized least squares fit by maximum likelihood
##   Model: y ~ x 
##   Data: DATA 
##   Log-likelihood: -76.21
## 
## Coefficients:
## (Intercept)           x 
##      0.1209      0.7842 
## 
## Correlation Structure: corBrownian
##  Formula: ~1 
##  Parameter estimate(s):
## numeric(0)
## Degrees of freedom: 50 total; 48 residual
## Residual standard error: 2.159
fitLambda<-gls(y~x,data=DATA,correlation=corPagel(1,tree,fixed=FALSE),
    method="ML")
fitLambda
## Generalized least squares fit by maximum likelihood
##   Model: y ~ x 
##   Data: DATA 
##   Log-likelihood: -52.62
## 
## Coefficients:
## (Intercept)           x 
##     0.06996     0.80184 
## 
## Correlation Structure: corPagel
##  Formula: ~1 
##  Parameter estimate(s):
## lambda 
## 0.3873 
## Degrees of freedom: 50 total; 48 residual
## Residual standard error: 0.7515
## function for likelihood ratio test
lrtest<-function(model1,model2){
        lik1<-logLik(model1)
        lik2<-logLik(model2)
        LR<--2*(lik1-lik2)
        degf<-attr(lik2,"df")-attr(lik1,"df")
        P<-pchisq(LR,df=degf,lower.tail=FALSE)
        cat(paste("Likelihood ratio = ",
                signif(LR,5),"(df=",degf,") P = ",
                signif(P,4),"\n",sep=""))
        invisible(list(likelihood.ratio=LR,p=P))
}

## run likelihood-ratio test
lrtest(fitBM,fitLambda)
## Likelihood ratio = 47.176(df=1) P = 6.489e-12

I also noted that for small trees we may want to substitute a distribution obtained via simulation for the parametric χ2 distribtution; however I neglected to comment that only nested models can be compared in this way.

That's it.

No comments:

Post a Comment

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.