Earlier today
I posted on taking sampling error in *y* into account during phylogenetic
regression or ANOVA.

In the following, I'm going to do a small investigation into type I error &
power of this method. I will compare type I error & power to phylogenetic
regression *ignoring* error in *y*.

First, type I error using a simple bivariate regression.

```
library(phytools)
```

```
## Loading required package: ape
```

```
## Loading required package: maps
```

```
library(nlme)
packageVersion("phytools")
```

```
## [1] '0.5.80'
```

```
set.seed(1)
## simulate trees
trees<-pbtree(n=100,nsim=200)
## simulate true values of x & y
x<-lapply(trees,fastBM)
y<-lapply(trees,fastBM)
## simulate sampling variances for y
v<-lapply(trees, function(t) setNames(rexp(n=Ntip(t)),t$tip.label))
ye<-mapply(function(y,v)
setNames(sampleFrom(xbar=y,xvar=v,n=rep(1,length(y))),
names(y)),y=y,v=v,SIMPLIFY=FALSE)
## fit PGLS ignoring sampling error in y
fitGLS<-function(t,x,y) gls(y~x,data.frame(x,y),
correlation=corBrownian(1,t),method="ML")
fits.gls<-mapply(fitGLS,trees,x,ye,SIMPLIFY=FALSE)
## now taking it into account
fitSEy<-function(t,x,y,v) pgls.SEy(y~x,data.frame(x,y),
tree=t,se=sqrt(v),method="ML")
fits.SEy<-mapply(fitSEy,trees,x,ye,v=v,SIMPLIFY=FALSE)
P.gls<-sapply(fits.gls,function(x) anova(x)$"p-value"[2])
mean(P.gls<=0.05) ## type I error method 1
```

```
## [1] 0.04
```

```
P.sey<-sapply(fits.SEy,function(x) anova(x)$"p-value"[2])
mean(P.sey<=0.05) ## type I error method 2
```

```
## [1] 0.05
```

So we should see straight away that there is no effect on type I error of ignoring
sampling error in the estimation of species values for *y*. How about an effect
on our estimated parameter, β_{1}?

```
beta.gls<-sapply(fits.gls,function(x) coefficients(x)[2])
beta.sey<-sapply(fits.SEy,function(x) coefficients(x)[2])
d.gls<-density(beta.gls,bw=0.1)
d.sey<-density(beta.sey,bw=0.1)
plot(d.gls$x,d.gls$y,type="l",xlim=range(c(d.gls$x,d.sey$x)),
ylim=range(c(d.gls$y,d.sey$y)),ylab="density",
xlab=expression(beta[1]),lwd=2)
lines(d.sey$x,d.sey$y,lwd=2,lty="dotted")
text(d.gls$x[which(d.gls$y==max(d.gls$y))],
max(d.gls$y),label="GLS (no error)",pos=4)
text(d.sey$x[which(d.sey$y==max(d.sey$y))],
max(d.sey$y),label="GLS (with error)",pos=4)
abline(v=0,lty="dashed",col="grey")
```

What's interesting here is that the type I error of both methods is basically identical;
however the variance of the estimator that takes into count sampling error in *y*
is *much* lower than the other.

Now, let's go a bit a farther & investigate the performance of each method for varying
values of β_{1}, our regression coefficient. We can do this as follows:

```
b1<-c(0,0.2,0.5,1,1.5,2)
P.gls<-P.sey<-beta.gls<-beta.sey<-matrix(NA,length(trees),length(b1),
dimnames=list(NULL,b1))
for(i in 1:length(b1)){
x<-lapply(trees,fastBM)
y<-mapply(function(t,x,b) b*x+fastBM(t),t=trees,x=x,
MoreArgs=list(b=b1[i]),SIMPLIFY=FALSE)
v<-lapply(trees, function(t) setNames(rexp(n=Ntip(t)),t$tip.label))
ye<-mapply(function(y,v)
setNames(sampleFrom(xbar=y,xvar=v,n=rep(1,length(y))),
names(y)),y=y,v=v,SIMPLIFY=FALSE)
fitGLS<-function(t,x,y) gls(y~x,data.frame(x,y),
correlation=corBrownian(1,t),method="ML")
fits.gls<-mapply(fitGLS,trees,x,ye,SIMPLIFY=FALSE)
fitSEy<-function(t,x,y,v) pgls.SEy(y~x,data.frame(x,y),
tree=t,se=sqrt(v),method="ML")
fits.SEy<-mapply(fitSEy,trees,x,ye,v=v,SIMPLIFY=FALSE)
P.gls[,i]<-sapply(fits.gls,function(x) anova(x)$"p-value"[2])
P.sey[,i]<-sapply(fits.SEy,function(x) anova(x)$"p-value"[2])
beta.gls[,i]<-sapply(fits.gls,function(x) coefficients(x)[2])
beta.sey[,i]<-sapply(fits.SEy,function(x) coefficients(x)[2])
}
```

First, we can look the mean parameter estimates of each method:

```
par(mfrow=c(1,2))
boxplot(beta.gls,ylim=c(-0.5,3),main="GLS (no error)")
abline(h=b1,col="grey",lty="dashed")
boxplot(beta.sey,ylim=c(-0.5,3),main="GLS (with error)")
abline(h=b1,col="grey",lty="dashed")
```

We can easily see that (although both methods are unbiased), variability among simulations
is substantially higher when error in *y* is not taken into consideration.

We would expect this to affect our power to reject the null hypothesis of β_{1} = 1,
and it does:

```
plot(b1,colMeans(P.gls<=0.05),type="b",lwd=2,xlim=c(0,2),
ylim=c(0,1),xlab=expression(beta[1]),ylab="type I error / power")
lines(b1,colMeans(P.sey<=0.05),type="b",lwd=2,lty="dotted")
text(b1[3],colMeans(P.gls<=0.05)[3],label="GLS (no error)",
pos=4)
text(b1[3],colMeans(P.sey<=0.05)[3],label="GLS (with error)",
pos=2)
abline(h=0.05,lty="dashed",col="grey")
text(x=2,y=0.05,"0.05",pos=3)
```

Neat.

## No comments:

## Post a Comment

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.