A number of years ago I implemented the bivariate regression model of Ives et al. (2007). This model fits a phylogenetic generalized least squares regression while taking into account sampling error in both x & y.
The method was kind of difficult for me to implement at the time.
Consequently, when some users reported peculiar results from the function (called pgls.Ives), I was worried and added a warning message that is automatically printed with every function call (see below).
I'm still not 100% confident in my implementation, so the warning remains. Nonetheless, I was playing with the function today to add some S3 methods & null hypothesis testing, and although the interface is kind of crude, I have to say, it seems to work great! (Of course, any report of issues is still welcome.)
In the following I will demo my updates to the function: first I simulate data with sampling error in x & y under the alternative hypothesis, and then I test the null hypothesis H0: β1 = 0 using a likelihood-ratio (LR) test.
First let's generate some data in which x and y are correlated, but in which we have simulated sampling error in both dimensions:
library(phytools)
library(MASS)
## simulate a pure-birth tree & true values of x and y, with y = 5 + 3.5*x
## plus Brownian residual error
tree<-pbtree(n=200,scale=1)
x<-fastBM(tree)
y<-5+3.5*x+fastBM(tree,0.2)
## simulate per-species sampling variances & covariances
Vx<-rexp(length(tree$tip))/2
Vy<-rexp(length(tree$tip))/2
Cxy<-sqrt(Vx*Vy)*runif(length(tree$tip),min=-1,max=1)
names(Vx)<-names(Vy)<-names(Cxy)<-names(x)
## add (possibly correlated) sampling error to each species' x & y
xe<-ye<-vector()
for(i in 1:length(tree$tip)){
    temp<-mvrnorm(mu=c(x[i],y[i]),
        Sigma=matrix(c(Vx[i],Cxy[i],Cxy[i],
            Vy[i]),2,2))
    xe[i]<-temp[1]
    ye[i]<-temp[2]
}
names(xe)<-names(x)
names(ye)<-names(y)
Not only does this model permit sampling error in x & y, it also allows that sampling error to be correlated between the two variables, which is nice.
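For example, we can double-check that the per-species sampling covariances we just drew imply valid correlations between the errors in x & y. This is just a quick sanity check computed from the simulated objects above, not something the model requires:
## implied per-species correlation of the sampling errors in x & y:
## by construction, these should all fall between -1 and 1
r.xy<-Cxy/sqrt(Vx*Vy)
range(r.xy)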
Let's fit our models:
fit0<-pgls.Ives(tree,xe,ye,Vx,Vy,Cxy,fixed.b1=0)
##
## ---------------------------------------------------------------
## | **Warning: |
## | User reports suggest that this method may frequently |
## | fail to find the ML solution. Please use with caution. |
## ---------------------------------------------------------------
fit0
##
## Result PGLS with sampling error in x & y
## (based on Ives et al. 2007):
##
## Response: y
## Intercept beta[1]
## y 3.082196 0
##
## Summary of ML estimated parameters:
## sigma^2 a log(L)
## y 13.379053 3.082196 -657.494854
## x 1.402936 -0.764788
## ---------
##
## R thinks it has converged.
fitb1<-pgls.Ives(tree,xe,ye,Vx,Vy,Cxy)
##
## ---------------------------------------------------------------
## | **Warning: |
## | User reports suggest that this method may frequently |
## | fail to find the ML solution. Please use with caution. |
## ---------------------------------------------------------------
fitb1
##
## Result PGLS with sampling error in x & y
## (based on Ives et al. 2007):
##
## Response: y
## Intercept beta[1]
## y 5.694631 3.4317
##
## Summary of ML estimated parameters:
## sigma^2 a log(L)
## y 0.383931 3.079038 -587.442054
## x 1.180562 -0.762186
## ---------
##
## R thinks it has converged.
Note that the parameter estimates for this model are very close to the generating values of β = [5, 3.5], much closer than if we use PGLS ignoring sampling error:
library(nlme)
gls(ye~xe,data=data.frame(xe=xe,ye=ye),
    correlation=corBrownian(1,tree),method="ML")
## Generalized least squares fit by maximum likelihood
## Model: ye ~ xe
## Data: data.frame(xe = xe, ye = ye)
## Log-likelihood: -406.6206
##
## Coefficients:
## (Intercept) xe
## 3.5362945 0.6539054
##
## Correlation Structure: corBrownian
## Formula: ~1
## Parameter estimate(s):
## numeric(0)
## Degrees of freedom: 200 total; 198 residual
## Residual standard error: 4.579274
We can compare these two models, one in which we forced β1 to be equal to 0.0, using an LR test as follows:
library(lmtest)
lrtest(fit0,fitb1)
## Likelihood ratio test
##
## Model 1: fit0
## Model 2: fitb1
## #Df LogLik Df Chisq Pr(>Chisq)
## 1 4 -657.49
## 2 5 -587.44 1 140.11 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
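For the curious, the same calculation can be done by hand: the LR statistic is twice the difference in log-likelihoods of the two models, compared to a χ² distribution with 1 degree of freedom. Here is a minimal sketch, assuming (as the lrtest call above already requires) that a logLik method exists for objects returned by pgls.Ives:
## likelihood-ratio statistic & p-value computed by hand
LR<-as.numeric(2*(logLik(fitb1)-logLik(fit0)))
pchisq(LR,df=1,lower.tail=FALSE)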
With this method, if we fix Vx, Vy, and Cxy to vectors of 0s we will (i.e., we should) obtain the same result as with standard PGLS, which is also good to know.
Vx[]<-Vy[]<-Cxy[]<-0
pgls.Ives(tree,xe,ye,Vx,Vy,Cxy)
##
## ---------------------------------------------------------------
## | **Warning: |
## | User reports suggest that this method may frequently |
## | fail to find the ML solution. Please use with caution. |
## ---------------------------------------------------------------
##
## Result PGLS with sampling error in x & y
## (based on Ives et al. 2007):
##
## Response: y
## Intercept beta[1]
## y 3.512046 0.6537344
##
## Summary of ML estimated parameters:
## sigma^2 a log(L)
## y 20.99469 3.009787 -772.538868
## x 13.95150 -0.768292
## ---------
##
## R thinks it has converged.
Finally, let's generate some data under the null hypothesis of β1 = 0.0:
## simulate new values of y that are unrelated to x (x is re-used from above)
y<-5+fastBM(tree,0.2)
## new per-species sampling variances & covariances
Vx<-rexp(length(tree$tip))/2
Vy<-rexp(length(tree$tip))/2
Cxy<-sqrt(Vx*Vy)*runif(length(tree$tip),min=-1,max=1)
names(Vx)<-names(Vy)<-names(Cxy)<-names(x)
## add correlated sampling error, exactly as before
xe<-ye<-vector()
for(i in 1:length(tree$tip)){
    temp<-mvrnorm(mu=c(x[i],y[i]),
        Sigma=matrix(c(Vx[i],Cxy[i],Cxy[i],
            Vy[i]),2,2))
    xe[i]<-temp[1]
    ye[i]<-temp[2]
}
names(xe)<-names(x)
names(ye)<-names(y)
Now let's fit our models & compare them using an LR test again:
fit0<-pgls.Ives(tree,xe,ye,Vx,Vy,Cxy,fixed.b1=0)
##
## ---------------------------------------------------------------
## | **Warning: |
## | User reports suggest that this method may frequently |
## | fail to find the ML solution. Please use with caution. |
## ---------------------------------------------------------------
fit0
##
## Result PGLS with sampling error in x & y
## (based on Ives et al. 2007):
##
## Response: y
## Intercept beta[1]
## y 4.905824 0
##
## Summary of ML estimated parameters:
## sigma^2 a log(L)
## y 0.912601 4.905824 -477.861006
## x 1.261848 -0.730253
## ---------
##
## R thinks it has converged.
fitb1<-pgls.Ives(tree,xe,ye,Vx,Vy,Cxy)
##
## ---------------------------------------------------------------
## | **Warning: |
## | User reports suggest that this method may frequently |
## | fail to find the ML solution. Please use with caution. |
## ---------------------------------------------------------------
fitb1
##
## Result PGLS with sampling error in x & y
## (based on Ives et al. 2007):
##
## Response: y
## Intercept beta[1]
## y 4.88616 -0.02791613
##
## Summary of ML estimated parameters:
## sigma^2 a log(L)
## y 0.912113 4.906315 -477.831083
## x 1.264841 -0.721985
## ---------
##
## R thinks it has converged.
lrtest(fit0,fitb1)
## Likelihood ratio test
##
## Model 1: fit0
## Model 2: fitb1
## #Df LogLik Df Chisq Pr(>Chisq)
## 1 4 -477.86
## 2 5 -477.83 1 0.0598 0.8067
If we ignore within-species sampling error, by contrast, we get a highly significant result!
fit.gls<-gls(ye~xe,data=data.frame(xe=xe,ye=ye),
    correlation=corBrownian(1,tree),method="ML")
fit.gls
## Generalized least squares fit by maximum likelihood
## Model: ye ~ xe
## Data: data.frame(xe = xe, ye = ye)
## Log-likelihood: -297.0269
##
## Coefficients:
## (Intercept) xe
## 4.7459216 -0.1897573
##
## Correlation Structure: corBrownian
## Formula: ~1
## Parameter estimate(s):
## numeric(0)
## Degrees of freedom: 200 total; 198 residual
## Residual standard error: 2.647384
summary(fit.gls)
## Generalized least squares fit by maximum likelihood
## Model: ye ~ xe
## Data: data.frame(xe = xe, ye = ye)
## AIC BIC logLik
## 600.0538 609.9488 -297.0269
##
## Correlation Structure: corBrownian
## Formula: ~1
## Parameter estimate(s):
## numeric(0)
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) 4.745922 1.1218709 4.230363 0
## xe -0.189757 0.0347337 -5.463211 0
##
## Correlation:
## (Intr)
## xe 0.023
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -1.0782995 -0.3139647 -0.1042838 0.1272677 0.7869354
##
## Residual standard error: 2.647384
## Degrees of freedom: 200 total; 198 residual
Wow.
Hello, Dr. Revell!
I've used the pgls.Ives function with my data (and let the variation be estimated from the data itself). I performed the LR test as described in this post so that I could obtain a p-value for the correlation. The output object also provides me with estimates for both the intercept and the slope of the regression line. But I have a question about how to obtain the r-squared value for each regression using this function. How could I estimate it?
Thanks!