A number of years ago I implemented the bivariate regression model of
Ives et al. (2007).
This model fits a phylogenetic generalized least squares (PGLS) regression
while taking into account sampling error in both *x* & *y*.

The method was kind of difficult for me to implement at the time.
Consequently, when some users reported peculiar results from the function
(called `pgls.Ives`), I was worried and added a warning message
which is automatically printed with every function call (see below).

I'm still not 100% confident in my implementation - so the warning remains.
Nonetheless, I was playing with the function today to add some S3 methods &
null hypothesis testing, and although the interface is kind of crude, I have
to say it seems to work *great*! (Of course, any report of issues is
still welcome.)

In the following, I'll demo my updates to the function: I first
simulate data with sampling error in *x* & *y* under the
alternative hypothesis, and then test the null hypothesis *H*_{0}:
β_{1} = 0 using a likelihood-ratio (LR) test.

First let's generate some data in which *x* and *y* are correlated,
but in which we have simulated sampling error in both dimensions:

```
library(phytools)
library(MASS)
## simulate a pure-birth tree & true values of x and y
tree<-pbtree(n=200,scale=1)
x<-fastBM(tree)
y<-5+3.5*x+fastBM(tree,0.2)
## sampling variances of x & y, and their covariances, by species
Vx<-rexp(length(tree$tip.label))/2
Vy<-rexp(length(tree$tip.label))/2
Cxy<-sqrt(Vx*Vy)*runif(length(tree$tip.label),min=-1,max=1)
names(Vx)<-names(Vy)<-names(Cxy)<-names(x)
## add (possibly correlated) sampling error to x & y
xe<-ye<-vector()
for(i in 1:length(tree$tip.label)){
    temp<-mvrnorm(mu=c(x[i],y[i]),
        Sigma=matrix(c(Vx[i],Cxy[i],Cxy[i],Vy[i]),2,2))
    xe[i]<-temp[1]
    ye[i]<-temp[2]
}
names(xe)<-names(x)
names(ye)<-names(y)
```

Not only does this model permit sampling error in *x* & *y*, it
*also* allows this sampling error to be correlated, which is nice.
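As a quick sanity check on this construction, note that the implied within-species error correlation is `Cxy/sqrt(Vx*Vy)`, which by design must fall on [-1, 1]. Here's a standalone sketch of that check using base R only (with hypothetical variances in place of the ones simulated above, so it doesn't require phytools):

```
## standalone check of the error-correlation construction (base R only)
set.seed(99)
n<-200
Vx<-rexp(n)/2
Vy<-rexp(n)/2
## covariances built exactly as in the simulation above
Cxy<-sqrt(Vx*Vy)*runif(n,min=-1,max=1)
## implied within-species error correlations
r<-Cxy/sqrt(Vx*Vy)
range(r)
```

Since `runif(n,min=-1,max=1)` *is* the correlation here, the range will always sit inside [-1, 1], so every per-species 2×2 covariance matrix passed to `mvrnorm` is valid.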

Let's fit our models:

```
fit0<-pgls.Ives(tree,xe,ye,Vx,Vy,Cxy,fixed.b1=0)
```

```
##
## ---------------------------------------------------------------
## | **Warning: |
## | User reports suggest that this method may frequently |
## | fail to find the ML solution. Please use with caution. |
## ---------------------------------------------------------------
```

```
fit0
```

```
##
## Result PGLS with sampling error in x & y
## (based on Ives et al. 2007):
##
## Response: y
## Intercept beta[1]
## y 3.082196 0
##
## Summary of ML estimated parameters:
## sigma^2 a log(L)
## y 13.379053 3.082196 -657.494854
## x 1.402936 -0.764788
## ---------
##
## R thinks it has converged.
```

```
fitb1<-pgls.Ives(tree,xe,ye,Vx,Vy,Cxy)
```

```
##
## ---------------------------------------------------------------
## | **Warning: |
## | User reports suggest that this method may frequently |
## | fail to find the ML solution. Please use with caution. |
## ---------------------------------------------------------------
```

```
fitb1
```

```
##
## Result PGLS with sampling error in x & y
## (based on Ives et al. 2007):
##
## Response: y
## Intercept beta[1]
## y 5.694631 3.4317
##
## Summary of ML estimated parameters:
## sigma^2 a log(L)
## y 0.383931 3.079038 -587.442054
## x 1.180562 -0.762186
## ---------
##
## R thinks it has converged.
```

Note that the parameter estimates for this model are very close to the generating values of β = [5, 3.5], much closer than if we use PGLS ignoring sampling error:

```
library(nlme)
gls(ye~xe,data=data.frame(xe=xe,ye=ye),
correlation=corBrownian(1,tree),method="ML")
```

```
## Generalized least squares fit by maximum likelihood
## Model: ye ~ xe
## Data: data.frame(xe = xe, ye = ye)
## Log-likelihood: -406.6206
##
## Coefficients:
## (Intercept) xe
## 3.5362945 0.6539054
##
## Correlation Structure: corBrownian
## Formula: ~1
## Parameter estimate(s):
## numeric(0)
## Degrees of freedom: 200 total; 198 residual
## Residual standard error: 4.579274
```

We can compare these two models - the first being the one in which we forced β_{1}
to be equal to 0 - using an LR test as follows:

```
library(lmtest)
lrtest(fit0,fitb1)
```

```
## Likelihood ratio test
##
## Model 1: fit0
## Model 2: fitb1
## #Df LogLik Df Chisq Pr(>Chisq)
## 1 4 -657.49
## 2 5 -587.44 1 140.11 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
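For the record, `lrtest` is just computing twice the difference in log-likelihoods and comparing it to a χ² distribution with degrees of freedom equal to the difference in parameter count between the two fits (here, one). We can reproduce the statistic by hand from the log-likelihoods printed above, using base R only:

```
## likelihood-ratio statistic by hand, from the log-likelihoods above
logL0<- -657.494854  ## fit0: beta[1] fixed at 0
logL1<- -587.442054  ## fitb1: beta[1] estimated
LR<-2*(logL1-logL0)
LR  ## 140.1056, matching lrtest's 140.11
## P-value from the chi-squared distribution with df=1
pchisq(LR,df=1,lower.tail=FALSE)
```

The resulting P-value is far below 2.2e-16, consistent with the `lrtest` table.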

With this method, if we fix `Vx`, `Vy`, and `Cxy` to vectors of 0s, we
will (i.e., we *should*) obtain the same result as with standard PGLS,
which is also good to know.

```
Vx[]<-Vy[]<-Cxy[]<-0
pgls.Ives(tree,xe,ye,Vx,Vy,Cxy)
```

```
##
## ---------------------------------------------------------------
## | **Warning: |
## | User reports suggest that this method may frequently |
## | fail to find the ML solution. Please use with caution. |
## ---------------------------------------------------------------
```

```
##
## Result PGLS with sampling error in x & y
## (based on Ives et al. 2007):
##
## Response: y
## Intercept beta[1]
## y 3.512046 0.6537344
##
## Summary of ML estimated parameters:
## sigma^2 a log(L)
## y 20.99469 3.009787 -772.538868
## x 13.95150 -0.768292
## ---------
##
## R thinks it has converged.
```

Finally, let's generate some data under the null hypothesis of
β_{1} = 0.0:

```
## simulate y under the null (x is unchanged from above)
y<-5+fastBM(tree,0.2)
## new sampling variances & covariances, by species
Vx<-rexp(length(tree$tip.label))/2
Vy<-rexp(length(tree$tip.label))/2
Cxy<-sqrt(Vx*Vy)*runif(length(tree$tip.label),min=-1,max=1)
names(Vx)<-names(Vy)<-names(Cxy)<-names(x)
## add (possibly correlated) sampling error to x & y
xe<-ye<-vector()
for(i in 1:length(tree$tip.label)){
    temp<-mvrnorm(mu=c(x[i],y[i]),
        Sigma=matrix(c(Vx[i],Cxy[i],Cxy[i],Vy[i]),2,2))
    xe[i]<-temp[1]
    ye[i]<-temp[2]
}
names(xe)<-names(x)
names(ye)<-names(y)
```

Fit our models & compare them using an LR test again:

```
fit0<-pgls.Ives(tree,xe,ye,Vx,Vy,Cxy,fixed.b1=0)
```

```
##
## ---------------------------------------------------------------
## | **Warning: |
## | User reports suggest that this method may frequently |
## | fail to find the ML solution. Please use with caution. |
## ---------------------------------------------------------------
```

```
fit0
```

```
##
## Result PGLS with sampling error in x & y
## (based on Ives et al. 2007):
##
## Response: y
## Intercept beta[1]
## y 4.905824 0
##
## Summary of ML estimated parameters:
## sigma^2 a log(L)
## y 0.912601 4.905824 -477.861006
## x 1.261848 -0.730253
## ---------
##
## R thinks it has converged.
```

```
fitb1<-pgls.Ives(tree,xe,ye,Vx,Vy,Cxy)
```

```
##
## ---------------------------------------------------------------
## | **Warning: |
## | User reports suggest that this method may frequently |
## | fail to find the ML solution. Please use with caution. |
## ---------------------------------------------------------------
```

```
fitb1
```

```
##
## Result PGLS with sampling error in x & y
## (based on Ives et al. 2007):
##
## Response: y
## Intercept beta[1]
## y 4.88616 -0.02791613
##
## Summary of ML estimated parameters:
## sigma^2 a log(L)
## y 0.912113 4.906315 -477.831083
## x 1.264841 -0.721985
## ---------
##
## R thinks it has converged.
```

```
lrtest(fit0,fitb1)
```

```
## Likelihood ratio test
##
## Model 1: fit0
## Model 2: fitb1
## #Df LogLik Df Chisq Pr(>Chisq)
## 1 4 -477.86
## 2 5 -477.83 1 0.0598 0.8067
```

If we ignore within-species sampling error, by contrast, we get a highly significant result!

```
fit.gls<-gls(ye~xe,data=data.frame(xe=xe,ye=ye),
correlation=corBrownian(1,tree),method="ML")
fit.gls
```

```
## Generalized least squares fit by maximum likelihood
## Model: ye ~ xe
## Data: data.frame(xe = xe, ye = ye)
## Log-likelihood: -297.0269
##
## Coefficients:
## (Intercept) xe
## 4.7459216 -0.1897573
##
## Correlation Structure: corBrownian
## Formula: ~1
## Parameter estimate(s):
## numeric(0)
## Degrees of freedom: 200 total; 198 residual
## Residual standard error: 2.647384
```

```
summary(fit.gls)
```

```
## Generalized least squares fit by maximum likelihood
## Model: ye ~ xe
## Data: data.frame(xe = xe, ye = ye)
## AIC BIC logLik
## 600.0538 609.9488 -297.0269
##
## Correlation Structure: corBrownian
## Formula: ~1
## Parameter estimate(s):
## numeric(0)
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) 4.745922 1.1218709 4.230363 0
## xe -0.189757 0.0347337 -5.463211 0
##
## Correlation:
## (Intr)
## xe 0.023
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -1.0782995 -0.3139647 -0.1042838 0.1272677 0.7869354
##
## Residual standard error: 2.647384
## Degrees of freedom: 200 total; 198 residual
```

Wow.

Hello, Dr. Revell!

I've used the `pgls.Ives` function with my data (and let the variation be estimated from the data itself). I performed the LR test as described in this post so that I could obtain a p-value for the correlation. Also, the output object provides me with estimates for both the intercept and the slope of the regression curve. But I have a doubt about how to obtain the r-squared value for each regression using this function. How could I estimate it?

Thanks!