Saturday, April 28, 2012

Using nlme::gls for phylogenetic regression with non-contemporaneous tips

I was just reminded tonight of the trick for performing phylogenetic regression using gls in the nlme package when the tips of the tree are non-contemporaneous. Note that there are lots of good reasons why if the tips of your tree are not contemporaneous (i.e., your tree is not ultrametric) they should be rendered so before analysis. However, there are a few different reasons (for instance, extinct species or an a priori hypothesis about how the rate of evolution of the residual error might vary among the branches of the tree) why one might wish to analyze a non-ultrametric tree.

In the function gls the correlational structure of the residual error is captured by the argument correlation. If the residual variance for different y|x differ, this must be represented in the argument weights. For a phylogenetic regression using a non-ultrametric tree this looks as follows:

> # set seed (for repeatability)
> set.seed(1)
> # simulate a tree
> tree<-rtree(n=100) # this tree is not ultrametric
> # simulate data
> x<-fastBM(tree)
> y<-0.75*x+fastBM(tree,0.2)
> # compute weights
> w<-diag(vcv.phylo(tree))
> # fit model
> fit<-gls(y~x,data=data.frame(x,y), correlation=corBrownian(1,tree),weights=varFixed(~w))
> fit
Generalized least squares fit by REML
(Intercept)           x 
 -1.3025017   0.7275838 

For comparison, let's compute the contrasts regression in the typical way. This should have the same slope as our GLS regression.

> pic.x<-pic(x,tree)
> pic.y<-pic(y,tree)
> fit<-lm(pic.y~pic.x-1) # -1 is for no intercept
> fit
lm(formula = pic.y ~ pic.x - 1)

Note that I also described this on the R-SIG-phylo email list last year (here).

1 comment:

  1. Liam, thanks for posting about this. I had two follow-up questions if you have a minute.

    First, is there a goodness of fit measure for the GLS regression? I would like to compare the results of the GLS model above to a linear model that does not account for phylogeny. I can do this in part via model comparison (e.g. via delta-AIC) but would also like to compare the amount of variation in the response variable explained by the predictor(s) under these different models. Suggestions?

    Also, any other options for this portion of the code: "correlation=corBrownian(1,tree)"? E.g., when would you use corBrownian vs corPagel?