A colleague asked the following:

*"Is there a way to do a multiple stepwise regression on continuous data that incorporates phylogenetic information and selects the best model based on an AICc score?"*

Although I myself had not done this before, I figured that stepwise regression with objects of class "gls" would likely be something that someone, somewhere, in the wide world of R would have thought of before - and this is indeed correct.

Here's a complete demo with simulation code.

First, simulate data. In this case, *x*_{1} and *x*_{3} have an effect on *y*, so they ought to be found in the final model - but *x*_{2} and *x*_{4} intentionally do not.

require(phytools)

require(nlme)

require(MASS)

## simulate data (using phytools)

tree<-pbtree(n=26,tip.label=LETTERS)

x1<-fastBM(tree)

x2<-fastBM(tree)

x3<-fastBM(tree)

x4<-fastBM(tree)

y<-0.75*x1-0.5*x3+fastBM(tree,sig2=0.2)

X<-data.frame(y,x1,x2,x3,x4)

Now let's fit our model & run the stepwise regression method using AIC:

obj<-gls(y~x1+x2+x3+x4,data=X,correlation=

corBrownian(1,tree),method="ML")

## run stepwise AIC (using MASS & nlme)

fit<-stepAIC(obj,direction="both",trace=0)

trace just controls the amount of information about the backward & forward stepwise procedure that is being run. If we want more, we should increase it.

Finally, here's a worked simulation with results:

> tree<-pbtree(n=26,tip.label=LETTERS)

> x1<-fastBM(tree)

> x2<-fastBM(tree)

> x3<-fastBM(tree)

> x4<-fastBM(tree)

> y<-0.75*x1-0.5*x3+fastBM(tree,sig2=0.2)

> X<-data.frame(y,x1,x2,x3,x4)

> ## build model (using nlme)

> obj<-gls(y~x1+x2+x3+x4,data=X,correlation=

corBrownian(1,tree),method="ML")

> ## run stepwise AIC (using MASS & nlme)

> fit<-stepAIC(obj,direction="both",trace=0)

> fit

Generalized least squares fit by maximum likelihood

Model: y ~ x1 + x3

Data: X

Log-likelihood: -10.43514

Coefficients:

(Intercept) x1 x3

-0.4952197 0.6882657 -0.5108970

Correlation Structure: corBrownian

Formula: ~1

Parameter estimate(s):

numeric(0)

Degrees of freedom: 26 total; 23 residual

Residual standard error: 0.594681

This is just about what we'd expect. Cool.

## No comments:

## Post a Comment