A number of years ago I implemented the bivariate regression model of Ives et al. (2007). This model fits a phylogenetic generalized least squares (PGLS) regression while taking into account sampling error in both x & y.
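Roughly speaking (this is my own shorthand for the model structure, not the exact notation of Ives et al. 2007), the observed species means are the true species means plus within-species sampling error with known variances & covariance, while the true (unobserved) means are related by an ordinary phylogenetic regression:

% sketch of the model structure (my notation, paraphrasing Ives et al. 2007)
\hat{x}_i = x_i + \delta_i, \qquad \hat{y}_i = y_i + \eta_i
% with \mathrm{Var}(\delta_i)=V_{x,i}, \ \mathrm{Var}(\eta_i)=V_{y,i}, \ \mathrm{Cov}(\delta_i,\eta_i)=C_{xy,i}
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i
% where \varepsilon_i (and x) evolve by Brownian motion on the phylogeny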
The method was kind of difficult for me to implement at the time.
Consequently, when some users reported peculiar results from the function
(called pgls.Ives), I was worried and added a warning message
which is automatically printed with every function call (see below).
I'm still not 100% confident in my implementation, so the warning remains. Nonetheless, I was playing with the function today to add some S3 methods & null-hypothesis testing, and although the interface is still kind of crude, I have to say that it seems to work great! (Of course, any report of issues is still welcome.)
In the following I will demo my updates to the function: first I simulate data with sampling error in x & y under the alternative hypothesis, and then I test the null hypothesis H0: β1 = 0 using a likelihood-ratio (LR) test.
First let's generate some data in which x and y are correlated, but in which we have simulated sampling error in both dimensions:
library(phytools)
library(MASS)
## simulate a 200-taxon pure-birth tree rescaled to total depth 1.0
tree<-pbtree(n=200,scale=1)
## true species means: x evolves by Brownian motion, and y = 5 + 3.5*x plus
## independent Brownian residual error
x<-fastBM(tree)
y<-5+3.5*x+fastBM(tree,0.2)
## within-species sampling variances for x & y, plus a sampling covariance
Vx<-rexp(length(tree$tip))/2
Vy<-rexp(length(tree$tip))/2
Cxy<-sqrt(Vx*Vy)*runif(length(tree$tip),min=-1,max=1)
names(Vx)<-names(Vy)<-names(Cxy)<-names(x)
## add (possibly correlated) sampling error to each pair of species means
xe<-ye<-vector()
for(i in 1:length(tree$tip)){
  temp<-mvrnorm(mu=c(x[i],y[i]),
    Sigma=matrix(c(Vx[i],Cxy[i],Cxy[i],
    Vy[i]),2,2))
  xe[i]<-temp[1]
  ye[i]<-temp[2]
}
names(xe)<-names(x)
names(ye)<-names(y)
Not only does this model permit sampling error in x & y, it also allows that sampling error to be correlated within species, which is nice.
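Just to visualize what these error-prone data look like, here is a quick base-R sketch (not part of the model fitting) that plots the observations with ±1 standard error in each dimension; note that simple error bars like these don't show the within-species covariance Cxy:
plot(xe,ye,pch=19,cex=0.6,xlab="observed x",ylab="observed y")
## horizontal & vertical bars of +/- one standard error per species
arrows(xe-sqrt(Vx),ye,xe+sqrt(Vx),ye,length=0.02,angle=90,code=3,col="grey")
arrows(xe,ye-sqrt(Vy),xe,ye+sqrt(Vy),length=0.02,angle=90,code=3,col="grey")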
Let's fit our models:
fit0<-pgls.Ives(tree,xe,ye,Vx,Vy,Cxy,fixed.b1=0)
##
## ---------------------------------------------------------------
## | **Warning: |
## | User reports suggest that this method may frequently |
## | fail to find the ML solution. Please use with caution. |
## ---------------------------------------------------------------
fit0
##
## Result PGLS with sampling error in x & y
## (based on Ives et al. 2007):
##
## Response: y
## Intercept beta[1]
## y 3.082196 0
##
## Summary of ML estimated parameters:
## sigma^2 a log(L)
## y 13.379053 3.082196 -657.494854
## x 1.402936 -0.764788
## ---------
##
## R thinks it has converged.
fitb1<-pgls.Ives(tree,xe,ye,Vx,Vy,Cxy)
##
## ---------------------------------------------------------------
## | **Warning: |
## | User reports suggest that this method may frequently |
## | fail to find the ML solution. Please use with caution. |
## ---------------------------------------------------------------
fitb1
##
## Result PGLS with sampling error in x & y
## (based on Ives et al. 2007):
##
## Response: y
## Intercept beta[1]
## y 5.694631 3.4317
##
## Summary of ML estimated parameters:
## sigma^2 a log(L)
## y 0.383931 3.079038 -587.442054
## x 1.180562 -0.762186
## ---------
##
## R thinks it has converged.
Note that the parameter estimates for this model are very close to the generating values of β = [5, 3.5], and much closer than if we use PGLS ignoring sampling error, in which case the error in x attenuates the estimated slope toward zero:
library(nlme)
gls(ye~xe,data=data.frame(xe=xe,ye=ye),
correlation=corBrownian(1,tree),method="ML")
## Generalized least squares fit by maximum likelihood
## Model: ye ~ xe
## Data: data.frame(xe = xe, ye = ye)
## Log-likelihood: -406.6206
##
## Coefficients:
## (Intercept) xe
## 3.5362945 0.6539054
##
## Correlation Structure: corBrownian
## Formula: ~1
## Parameter estimate(s):
## numeric(0)
## Degrees of freedom: 200 total; 198 residual
## Residual standard error: 4.579274
We can compare these two models (the first being the one in which we fixed β1 to be equal to 0) using a LR-test as follows:
library(lmtest)
lrtest(fit0,fitb1)
## Likelihood ratio test
##
## Model 1: fit0
## Model 2: fitb1
## #Df LogLik Df Chisq Pr(>Chisq)
## 1 4 -657.49
## 2 5 -587.44 1 140.11 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
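Since lrtest relies on a logLik method for the fitted objects, we could also compute the same test by hand: the LR statistic is twice the difference in log-likelihoods, compared against a chi-squared distribution with 1 degree of freedom (one parameter, β1, is fixed in the constrained model). A minimal sketch:
LR<-2*(as.numeric(logLik(fitb1))-as.numeric(logLik(fit0)))
## p-value from the chi-squared distribution with 1 df
pchisq(LR,df=1,lower.tail=FALSE)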
With this method, if we fix Vx, Vy, and Cxy to vectors of 0s we should obtain the same result as with standard PGLS, which is also good to know (I do a quick comparison against gls after the output below).
Vx[]<-Vy[]<-Cxy[]<-0
pgls.Ives(tree,xe,ye,Vx,Vy,Cxy)
##
## ---------------------------------------------------------------
## | **Warning: |
## | User reports suggest that this method may frequently |
## | fail to find the ML solution. Please use with caution. |
## ---------------------------------------------------------------
##
## Result PGLS with sampling error in x & y
## (based on Ives et al. 2007):
##
## Response: y
## Intercept beta[1]
## y 3.512046 0.6537344
##
## Summary of ML estimated parameters:
## sigma^2 a log(L)
## y 20.99469 3.009787 -772.538868
## x 13.95150 -0.768292
## ---------
##
## R thinks it has converged.
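As a quick sanity check (just a sketch; the estimates should agree closely, up to the tolerance of the optimizer), we can compare the intercept and beta[1] printed above against the coefficients of a standard PGLS fit to the same error-prone data:
fit.pgls<-gls(ye~xe,data=data.frame(xe=xe,ye=ye),
correlation=corBrownian(1,tree),method="ML")
coef(fit.pgls) ## compare with the pgls.Ives intercept & beta[1] above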
Finally, let's generate some data under the null hypothesis of β1 = 0.0:
## now y evolves independently of x (that is, beta[1] = 0)
y<-5+fastBM(tree,0.2)
## new sampling variances & covariances, then add sampling error as before
Vx<-rexp(length(tree$tip))/2
Vy<-rexp(length(tree$tip))/2
Cxy<-sqrt(Vx*Vy)*runif(length(tree$tip),min=-1,max=1)
names(Vx)<-names(Vy)<-names(Cxy)<-names(x)
xe<-ye<-vector()
for(i in 1:length(tree$tip)){
temp<-mvrnorm(mu=c(x[i],y[i]),
Sigma=matrix(c(Vx[i],Cxy[i],Cxy[i],
Vy[i]),2,2))
xe[i]<-temp[1]
ye[i]<-temp[2]
}
names(xe)<-names(x)
names(ye)<-names(y)
Fit our models & compare them using a LR-test again:
fit0<-pgls.Ives(tree,xe,ye,Vx,Vy,Cxy,fixed.b1=0)
##
## ---------------------------------------------------------------
## | **Warning: |
## | User reports suggest that this method may frequently |
## | fail to find the ML solution. Please use with caution. |
## ---------------------------------------------------------------
fit0
##
## Result PGLS with sampling error in x & y
## (based on Ives et al. 2007):
##
## Response: y
## Intercept beta[1]
## y 4.905824 0
##
## Summary of ML estimated parameters:
## sigma^2 a log(L)
## y 0.912601 4.905824 -477.861006
## x 1.261848 -0.730253
## ---------
##
## R thinks it has converged.
fitb1<-pgls.Ives(tree,xe,ye,Vx,Vy,Cxy)
##
## ---------------------------------------------------------------
## | **Warning: |
## | User reports suggest that this method may frequently |
## | fail to find the ML solution. Please use with caution. |
## ---------------------------------------------------------------
fitb1
##
## Result PGLS with sampling error in x & y
## (based on Ives et al. 2007):
##
## Response: y
## Intercept beta[1]
## y 4.88616 -0.02791613
##
## Summary of ML estimated parameters:
## sigma^2 a log(L)
## y 0.912113 4.906315 -477.831083
## x 1.264841 -0.721985
## ---------
##
## R thinks it has converged.
lrtest(fit0,fitb1)
## Likelihood ratio test
##
## Model 1: fit0
## Model 2: fitb1
## #Df LogLik Df Chisq Pr(>Chisq)
## 1 4 -477.86
## 2 5 -477.83 1 0.0598 0.8067
Ignoring within-species sampling error, by contrast, we get a highly significant result!
fit.gls<-gls(ye~xe,data=data.frame(xe=xe,ye=ye),
correlation=corBrownian(1,tree),method="ML")
fit.gls
## Generalized least squares fit by maximum likelihood
## Model: ye ~ xe
## Data: data.frame(xe = xe, ye = ye)
## Log-likelihood: -297.0269
##
## Coefficients:
## (Intercept) xe
## 4.7459216 -0.1897573
##
## Correlation Structure: corBrownian
## Formula: ~1
## Parameter estimate(s):
## numeric(0)
## Degrees of freedom: 200 total; 198 residual
## Residual standard error: 2.647384
summary(fit.gls)
## Generalized least squares fit by maximum likelihood
## Model: ye ~ xe
## Data: data.frame(xe = xe, ye = ye)
## AIC BIC logLik
## 600.0538 609.9488 -297.0269
##
## Correlation Structure: corBrownian
## Formula: ~1
## Parameter estimate(s):
## numeric(0)
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) 4.745922 1.1218709 4.230363 0
## xe -0.189757 0.0347337 -5.463211 0
##
## Correlation:
## (Intr)
## xe 0.023
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -1.0782995 -0.3139647 -0.1042838 0.1272677 0.7869354
##
## Residual standard error: 2.647384
## Degrees of freedom: 200 total; 198 residual
Wow.
Hello, Dr. Revell!
I've used the pgls.Ives function with my data (and let the variation be estimated from the data itself). I performed the LR-test as described in this post so that I could obtain a p-value for the correlation. The output object also provides me with estimates for both the intercept and the slope of the regression line. But I'm not sure how to obtain the r-squared value for each regression using this function. How could I estimate it?
Thanks!