Saturday, November 11, 2017

Bivariate phylogenetic regression with known (correlated) sampling error in x & y: pgls.Ives

A number of years ago I implemented the bivariate regression model of Ives et al. (2007). This model fits a phylogenetic generalized least squares (PGLS) regression while taking into account sampling error in both x & y.
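Roughly speaking (in my notation, not exactly that of Ives et al.), the model structure is as follows:

x ~ Brownian motion on the tree
y = b0 + b1*x + e, in which e ~ Brownian motion on the tree
xe[i] = x[i] + dx[i]; ye[i] = y[i] + dy[i]
[dx[i],dy[i]] ~ N(0, matrix(c(Vx[i],Cxy[i],Cxy[i],Vy[i]),2,2))

in which Vx, Vy, & Cxy are the (assumed known) species-specific sampling variances & covariances of x & y.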

The method was kind of difficult for me to implement at the time. Consequently, when some users reported peculiar results from the function (called pgls.Ives), I got worried & added a warning message that is automatically printed with every function call (see below).

I'm still not 100% confident in my implementation, so the warning remains. Nonetheless, I was playing with the function today to add some S3 methods & null-hypothesis testing, and, although the interface is kind of crude, I have to say it seems to work great! (Of course, any report of issues is still welcome.)

In the following, I will demo my updates to the function: first, I simulate data with sampling error in x & y under the alternative hypothesis; then I test the null hypothesis H0: β1 = 0 using a likelihood-ratio (LR) test.

First let's generate some data in which x and y are correlated, but in which we have simulated sampling error in both dimensions:

library(phytools)
library(MASS)
## simulate a pure-birth tree & Brownian motion values for x,
## then y = 5 + 3.5*x plus Brownian residual error:
tree<-pbtree(n=200,scale=1)
x<-fastBM(tree)
y<-5+3.5*x+fastBM(tree,0.2)
## draw species-specific sampling variances & a valid
## sampling covariance for each species:
Vx<-rexp(length(tree$tip.label))/2
Vy<-rexp(length(tree$tip.label))/2
Cxy<-sqrt(Vx*Vy)*runif(length(tree$tip.label),min=-1,max=1)
names(Vx)<-names(Vy)<-names(Cxy)<-names(x)
## add correlated sampling error to each species' values:
xe<-ye<-vector()
for(i in 1:length(tree$tip.label)){
    temp<-mvrnorm(mu=c(x[i],y[i]),
        Sigma=matrix(c(Vx[i],Cxy[i],Cxy[i],
        Vy[i]),2,2))
    xe[i]<-temp[1]
    ye[i]<-temp[2]
}
names(xe)<-names(x)
names(ye)<-names(y)
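
If we want to eyeball the simulated data before fitting anything, a plain scatterplot of the error-prone values will do:

plot(xe,ye,xlab="observed x (with sampling error)",
    ylab="observed y (with sampling error)")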

Not only does this model permit sampling error in x & y, it also allows for this sampling error to be correlated, which is nice.
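
One thing worth keeping in mind is that each species' 2×2 sampling covariance matrix has to be positive semi-definite, i.e., Vx*Vy - Cxy^2 >= 0. That is guaranteed here because Cxy was built as r*sqrt(Vx*Vy) with |r| <= 1, but it is easy to check:

all(Vx*Vy-Cxy^2>=0)
## should be TRUE by construction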

Let's fit our models:

fit0<-pgls.Ives(tree,xe,ye,Vx,Vy,Cxy,fixed.b1=0)
## 
## ---------------------------------------------------------------
## | **Warning:                                                  |
## |   User reports suggest that this method may frequently      |
## |   fail to find the ML solution. Please use with caution.    |
## ---------------------------------------------------------------
fit0
## 
## Result PGLS with sampling error in x & y
##    (based on Ives et al. 2007):
## 
## Response: y
##   Intercept beta[1]
## y  3.082196       0
## 
## Summary of ML estimated parameters:
##     sigma^2         a      log(L)
## y 13.379053  3.082196 -657.494854
## x  1.402936 -0.764788            
## ---------
## 
## R thinks it has converged.
fitb1<-pgls.Ives(tree,xe,ye,Vx,Vy,Cxy)
## 
## ---------------------------------------------------------------
## | **Warning:                                                  |
## |   User reports suggest that this method may frequently      |
## |   fail to find the ML solution. Please use with caution.    |
## ---------------------------------------------------------------
fitb1
## 
## Result PGLS with sampling error in x & y
##    (based on Ives et al. 2007):
## 
## Response: y
##   Intercept beta[1]
## y  5.694631  3.4317
## 
## Summary of ML estimated parameters:
##    sigma^2         a      log(L)
## y 0.383931  3.079038 -587.442054
## x 1.180562 -0.762186            
## ---------
## 
## R thinks it has converged.

Note that the parameter estimates for this model are very close to the generating values of β = [5, 3.5], much closer than if we use PGLS ignoring sampling error:

library(nlme)
gls(ye~xe,data=data.frame(xe=xe,ye=ye),
    correlation=corBrownian(1,tree),method="ML")
## Generalized least squares fit by maximum likelihood
##   Model: ye ~ xe 
##   Data: data.frame(xe = xe, ye = ye) 
##   Log-likelihood: -406.6206
## 
## Coefficients:
## (Intercept)          xe 
##   3.5362945   0.6539054 
## 
## Correlation Structure: corBrownian
##  Formula: ~1 
##  Parameter estimate(s):
## numeric(0)
## Degrees of freedom: 200 total; 198 residual
## Residual standard error: 4.579274

We can compare these two models - in one of which we forced β1 to be equal to 0 - using an LR-test as follows:

library(lmtest)
lrtest(fit0,fitb1)
## Likelihood ratio test
## 
## Model 1: fit0
## Model 2: fitb1
##   #Df  LogLik Df  Chisq Pr(>Chisq)    
## 1   4 -657.49                         
## 2   5 -587.44  1 140.11  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
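
Since lrtest needs (& evidently finds) a logLik method for the fitted objects, we can also compute the same test by hand: it is just twice the difference in log-likelihoods, compared to a χ2 distribution with one degree of freedom.

LR<-as.numeric(2*(logLik(fitb1)-logLik(fit0)))
pchisq(LR,df=1,lower.tail=FALSE)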

With this method, if we fix Vx, Vy, & Cxy to vectors of zeros we should obtain the same result as with standard PGLS, which is also good to know. (Indeed, the coefficient estimates below closely match the gls fit above.)

Vx[]<-Vy[]<-Cxy[]<-0
pgls.Ives(tree,xe,ye,Vx,Vy,Cxy)
## 
## ---------------------------------------------------------------
## | **Warning:                                                  |
## |   User reports suggest that this method may frequently      |
## |   fail to find the ML solution. Please use with caution.    |
## ---------------------------------------------------------------
## 
## Result PGLS with sampling error in x & y
##    (based on Ives et al. 2007):
## 
## Response: y
##   Intercept   beta[1]
## y  3.512046 0.6537344
## 
## Summary of ML estimated parameters:
##    sigma^2         a      log(L)
## y 20.99469  3.009787 -772.538868
## x 13.95150 -0.768292            
## ---------
## 
## R thinks it has converged.

Finally, let's generate some data under the null hypothesis of β1 = 0.0:

## re-simulate y with no dependence on x (b1 = 0); x is re-used
## from the simulation above:
y<-5+fastBM(tree,0.2)
Vx<-rexp(length(tree$tip.label))/2
Vy<-rexp(length(tree$tip.label))/2
Cxy<-sqrt(Vx*Vy)*runif(length(tree$tip.label),min=-1,max=1)
names(Vx)<-names(Vy)<-names(Cxy)<-names(x)
xe<-ye<-vector()
for(i in 1:length(tree$tip.label)){
    temp<-mvrnorm(mu=c(x[i],y[i]),
        Sigma=matrix(c(Vx[i],Cxy[i],Cxy[i],
        Vy[i]),2,2))
    xe[i]<-temp[1]
    ye[i]<-temp[2]
}
names(xe)<-names(x)
names(ye)<-names(y)

Now let's fit our models & compare them using an LR-test again:

fit0<-pgls.Ives(tree,xe,ye,Vx,Vy,Cxy,fixed.b1=0)
## 
## ---------------------------------------------------------------
## | **Warning:                                                  |
## |   User reports suggest that this method may frequently      |
## |   fail to find the ML solution. Please use with caution.    |
## ---------------------------------------------------------------
fit0
## 
## Result PGLS with sampling error in x & y
##    (based on Ives et al. 2007):
## 
## Response: y
##   Intercept beta[1]
## y  4.905824       0
## 
## Summary of ML estimated parameters:
##    sigma^2         a      log(L)
## y 0.912601  4.905824 -477.861006
## x 1.261848 -0.730253            
## ---------
## 
## R thinks it has converged.
fitb1<-pgls.Ives(tree,xe,ye,Vx,Vy,Cxy)
## 
## ---------------------------------------------------------------
## | **Warning:                                                  |
## |   User reports suggest that this method may frequently      |
## |   fail to find the ML solution. Please use with caution.    |
## ---------------------------------------------------------------
fitb1
## 
## Result PGLS with sampling error in x & y
##    (based on Ives et al. 2007):
## 
## Response: y
##   Intercept     beta[1]
## y   4.88616 -0.02791613
## 
## Summary of ML estimated parameters:
##    sigma^2         a      log(L)
## y 0.912113  4.906315 -477.831083
## x 1.264841 -0.721985            
## ---------
## 
## R thinks it has converged.
lrtest(fit0,fitb1)
## Likelihood ratio test
## 
## Model 1: fit0
## Model 2: fitb1
##   #Df  LogLik Df  Chisq Pr(>Chisq)
## 1   4 -477.86                     
## 2   5 -477.83  1 0.0598     0.8067

If we instead ignore within-species sampling error, we get a highly significant result!

fit.gls<-gls(ye~xe,data=data.frame(xe=xe,ye=ye),
    correlation=corBrownian(1,tree),method="ML")
fit.gls
## Generalized least squares fit by maximum likelihood
##   Model: ye ~ xe 
##   Data: data.frame(xe = xe, ye = ye) 
##   Log-likelihood: -297.0269
## 
## Coefficients:
## (Intercept)          xe 
##   4.7459216  -0.1897573 
## 
## Correlation Structure: corBrownian
##  Formula: ~1 
##  Parameter estimate(s):
## numeric(0)
## Degrees of freedom: 200 total; 198 residual
## Residual standard error: 2.647384
summary(fit.gls)
## Generalized least squares fit by maximum likelihood
##   Model: ye ~ xe 
##   Data: data.frame(xe = xe, ye = ye) 
##        AIC      BIC    logLik
##   600.0538 609.9488 -297.0269
## 
## Correlation Structure: corBrownian
##  Formula: ~1 
##  Parameter estimate(s):
## numeric(0)
## 
## Coefficients:
##                 Value Std.Error   t-value p-value
## (Intercept)  4.745922 1.1218709  4.230363       0
## xe          -0.189757 0.0347337 -5.463211       0
## 
##  Correlation: 
##    (Intr)
## xe 0.023 
## 
## Standardized residuals:
##        Min         Q1        Med         Q3        Max 
## -1.0782995 -0.3139647 -0.1042838  0.1272677  0.7869354 
## 
## Residual standard error: 2.647384 
## Degrees of freedom: 200 total; 198 residual

Wow.

1 comment:

  1. Hello, Dr. Revell!

    I've used the pgls.Ives function with my data (and let the variation be estimated from the data itself). I performed the LR-test as described in this post so that I could obtain a p-value for the correlation. The output object also provides me with estimates for both the intercept and the slope of the regression line. But I have a question about how to obtain the r-squared value for each regression using this function. How could I estimate it?

    Thanks!

