A *phytools* user recently asked the following interesting question:

*“I contact you for a statistical question regarding GLS and PGLS, which I am not super familiar
with. I have read both that GLS do and do not require the distribution of residuals to be normal,
and I cannot make a decision on this. A reviewer even ask us to test for normality of the variables,
not even the residuals. What is your opinion on this?”*

As I have both posted on this blog and published about in the past:

*“…we do not expect normality of the dependent (or independent, for continuous x) variables in
phylogenetic data. Instead, what we do expect is multivariate normality of the residual error in y
given X (or, equivalently, normality of y given X, controlling for the tree).”*

If we want to *test* for normality, we should test for the *multivariate normality* of the residual
errors of our fitted model. One way to do that is to back-transform our data using a decomposition
(like a Cholesky factorization) of the
phylogenetic covariance matrix (e.g., following Butler et al
2000), and then test for normality of these
*transformed* residuals. This is functionality (& philosophically) equivalent to a simpler
normalization of uncorrelated random data.

Here's a quick example, using simulation. In this case, I'll use an independent variable, *X*, that
is a factor and then do a phylogenetic generalized ANOVA, but the same analysis would apply with a
GLS regression. For my normality test, I will use the function `lillie.test`

of the
*nortest* R package on *CRAN*.

```
## load packages
library(phytools)
library(nlme)
library(nortest)
## simulate tree
tree<-NULL
while(is.null(tree))
tree<-pbtree(b=1,d=0.8,t=20,
extant.only=TRUE)
```

```
## Warning:
## no extant tips, tree returned as NULL
```

```
## simulate phylogenetic residual error
e<-fastBM(tree,sig2=2)
## simulate factor
X<-setNames(as.factor(sample(1:3,Ntip(tree),
replace=TRUE)),tree$tip.label)
## arbitrary beta
beta<-c(5,-10,30)
## simulate y
y<-beta[X]+e
## put data in data frame
Data<-data.frame(x=X,y=y)
## create correlation structure
spp<-rownames(Data)
bm<-corBrownian(1,tree,form=~spp)
## fit generalized anova using nlme::gls
fit.anova<-gls(y~x,data=Data,correlation=bm)
fit.anova
```

```
## Generalized least squares fit by REML
## Model: y ~ x
## Data: Data
## Log-restricted-likelihood: -661.9959
##
## Coefficients:
## (Intercept) x2 x3
## 7.775999 -14.964163 24.984617
##
## Correlation Structure: corBrownian
## Formula: ~spp
## Parameter estimate(s):
## numeric(0)
## Degrees of freedom: 355 total; 352 residual
## Residual standard error: 5.418697
```

Now that I've fit my model, I'm going to *visualize* and then *test for normality* of three different
vectors. First, the original data for my dependent variable, *y*; second, my *untransformed* residuals
from the fitted model; last, my transformed residuals in which I use the Cholesky factorization of the
inverse of my phylogenetic covariance matrix, `vcv(tree)`

.

```
## original response variable, y
d1<-density(y,bw=1)
plot(d1,bty="n",type="n",main="density of y",
font.main=3)
polygon(d1,col=make.transparent("blue",0.25),
border="blue")
```

```
## normality test (always fails)
lillie.test(y)
```

```
##
## Lilliefors (Kolmogorov-Smirnov) normality test
##
## data: y
## D = 0.099288, p-value = 5.66e-09
```

```
## untransformed residuals
d2<-density(residuals(fit.anova),bw=1)
plot(d2,bty="n",type="n",main="density of untransformed residuals",
font.main=3)
polygon(d2,col=make.transparent("blue",0.25),
border="blue")
```

```
## normality test (often fails)
lillie.test(residuals(fit.anova))
```

```
##
## Lilliefors (Kolmogorov-Smirnov) normality test
##
## data: residuals(fit.anova)
## D = 0.040746, p-value = 0.162
```

```
## transformed residuals
transformed<-chol(solve(vcv(tree)))%*%residuals(fit.anova)
d3<-density(transformed,bw=1)
plot(d3,bty="n",type="n",main="density of transformed residuals",
font.main=3)
polygon(d3,col=make.transparent("blue",0.25),
border="blue")
```

```
## normality test (usually passes)
lillie.test(transformed)
```

```
##
## Lilliefors (Kolmogorov-Smirnov) normality test
##
## data: transformed
## D = 0.041221, p-value = 0.1506
```

OK, now let's try a “real” example, in this case from my book with Luke Harmon using data from Kirk & Kay (2004).

```
## load data
primate.data<-read.csv("http://www.phytools.org/Rbook/3/primateEyes.csv",
row.names=1,stringsAsFactors=TRUE)
primate.tree<-read.tree("http://www.phytools.org/Rbook/3/primateEyes.phy")
## create correlation structure
spp<-rownames(primate.data)
corBM<-corBrownian(phy=primate.tree,form=~spp)
## fit model
primate.ancova<-gls(log(Orbit_area)~log(Skull_length)+
Activity_pattern,data=primate.data,
correlation=corBM)
summary(primate.ancova)
```

```
## Generalized least squares fit by REML
## Model: log(Orbit_area) ~ log(Skull_length) + Activity_pattern
## Data: primate.data
## AIC BIC logLik
## -86.45385 -74.18212 48.22693
##
## Correlation Structure: corBrownian
## Formula: ~spp
## Parameter estimate(s):
## numeric(0)
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) -0.7310348 0.3761724 -1.943350 0.0552
## log(Skull_length) 1.4717247 0.0758534 19.402228 0.0000
## Activity_patternDiurnal -0.1060187 0.0777268 -1.363991 0.1761
## Activity_patternNocturnal 0.2935941 0.1213931 2.418540 0.0177
##
## Correlation:
## (Intr) lg(S_) Actv_D
## log(Skull_length) -0.914
## Activity_patternDiurnal -0.277 0.100
## Activity_patternNocturnal -0.518 0.329 0.602
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -1.89188666 -0.77574126 -0.41922692 0.07417665 2.26411750
##
## Residual standard error: 0.2933082
## Degrees of freedom: 90 total; 86 residual
```

```
## test for normality of transformed residuals
transformed<-chol(solve(vcv(primate.tree)))%*%residuals(primate.ancova)
d3<-density(transformed,bw=0.05)
plot(d3,bty="n",type="n",main="density of transformed residuals (primate ANCOVA)",
font.main=3)
polygon(d3,col=make.transparent("blue",0.25),
border="blue")
```

```
lillie.test(transformed)
```

```
##
## Lilliefors (Kolmogorov-Smirnov) normality test
##
## data: transformed
## D = 0.094818, p-value = 0.04442
```

So, in this case we see that our transformed residuals *fail* normality, although the failure is not too
severe so (just as with OLS) we should not be overly concerned, IMO.

That's it!

## No comments:

## Post a Comment

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.