In the following demo I show one way to do phylogenetic generalized least squares (PGLS) assuming a Brownian motion model for the variance-covariance structure of the residual error, for conditions in which there is measurement error in the estimation of y, the dependent variable in our model, but not our xs. This was motivated by a question from a UMass-Boston graduate student based on the phylogenetic (generalized) ANOVA - so this is a case in which there might be easily quantifiable error in y but not in our factor or factors.
The way I have done this is not by writing a new corStruct
-
although one would perhaps ideally like to this.
First, this is the likelihood functin that we are going to optimize. It also returns
the fitted model & likelihood when opt=FALSE
.
lk<-function(sig2,y,X,C,v=NULL,opt=TRUE){
n<-nrow(C)
if(is.null(v)) v<-rep(0,n)
V<-sig2*C+diag(v)
beta<-solve(t(X)%*%solve(V)%*%X)%*%(t(X)%*%solve(V)%*%y)
logL<--(1/2)*t(y-X%*%beta)%*%solve(V)%*%(y-X%*%beta)-
(1/2)*determinant(V,logarithm=TRUE)$modulus[1]-
(n/2)*log(2*pi)
if(opt) -logL[1,1] else list(beta=beta[,1],sig2e=sig2,logL=logL[1,1])
}
Now, next, I start with an illustrative demo using multivariable regressin to show
that this function will return the same fitted model (when no sampling error within
species is assumed) as gls(...,correlation=corBrownian(...))
. Here I
go:
## load libraries
library(phytools)
library(nlme)
## simulate tree & data
tree<-pbtree(n=100,scale=1)
X<-fastBM(tree,nsim=2)
colnames(X)<-c("x1","x2")
y<-cbind(rep(1,Ntip(tree)),X)%*%c(1,2,3)+fastBM(tree)
## first, fit the model using gls
fit.gls<-gls(y~x1+x2,data=data.frame(y,X),correlation=corBrownian(1,tree),
method="ML")
fit.gls
## Generalized least squares fit by maximum likelihood
## Model: y ~ x1 + x2
## Data: data.frame(y, X)
## Log-likelihood: -47.8675
##
## Coefficients:
## (Intercept) x1 x2
## 0.9532797 1.9886343 3.0334548
##
## Correlation Structure: corBrownian
## Formula: ~1
## Parameter estimate(s):
## numeric(0)
## Degrees of freedom: 100 total; 97 residual
## Residual standard error: 0.9198236
## now fit it using our custom function
## the interval for optimize is specified arbitrarily
fit.lk<-optimize(lk,c(0,1000),y=y,X=cbind(rep(1,Ntip(tree)),X),C=vcv(tree))
fitted<-lk(fit.lk$minimum,y=y,X=cbind(rep(1,Ntip(tree)),X),C=vcv(tree),
opt=FALSE)
fitted
## $beta
## x1 x2
## 0.9532797 1.9886343 3.0334548
##
## $sig2e
## [1] 0.8460792
##
## $logL
## [1] -47.8675
So, here we can see that it works when we do not have sampling error in y. Next, let's simulate known sampling error and try again:
## these will be our within-species sampling variances
v<-setNames(rexp(n=Ntip(tree)),tree$tip.label)
ye<-setNames(sampleFrom(xbar=y,xvar=v,n=rep(1,length(y))),rownames(y))
fit.lk<-optimize(lk,c(0,1000),y=ye,X=cbind(rep(1,Ntip(tree)),X),
C=vcv(tree),v=v)
fitted<-lk(fit.lk$minimum,y=ye,X=cbind(rep(1,Ntip(tree)),X),C=vcv(tree),
v=v,opt=FALSE)
fitted
## $beta
## x1 x2
## 0.863873 2.039439 2.969356
##
## $sig2e
## [1] 0.6355355
##
## $logL
## [1] -138.4668
## compare to:
gls(ye~x1+x2,data=data.frame(ye,X),correlation=corBrownian(1,tree),method="ML")
## Generalized least squares fit by maximum likelihood
## Model: ye ~ x1 + x2
## Data: data.frame(ye, X)
## Log-likelihood: -231.8525
##
## Coefficients:
## (Intercept) x1 x2
## 1.508517 1.947806 4.554398
##
## Correlation Structure: corBrownian
## Formula: ~1
## Parameter estimate(s):
## numeric(0)
## Degrees of freedom: 100 total; 97 residual
## Residual standard error: 5.790837
Now, it also turns out that after we have optimized sig2e
we can
actually coerce gls
into giving us the correct fitted model &
likelihood. Here, I do this by distorting the edge lengths of our tree to
take into account the fitted sig2e
and within-species errors in
y.
tt<-tree
tt$edge.length<-tt$edge.length*fitted$sig2e
for(i in 1:length(v)){
tip<-which(tt$tip.label==names(v)[i])
ii<-which(tt$edge[,2]==tip)
tt$edge.length[ii]<-tt$edge.length[ii]+v[i]
}
vv<-diag(vcv(tt))
w<-varFixed(~vv)
fit.gls<-gls(ye~x1+x2,data=data.frame(ye,X),correlation=corBrownian(1,tt),method="ML",weights=w)
fit.gls
## Generalized least squares fit by maximum likelihood
## Model: ye ~ x1 + x2
## Data: data.frame(ye, X)
## Log-likelihood: -138.4384
##
## Coefficients:
## (Intercept) x1 x2
## 0.863873 2.039439 2.969356
##
## Correlation Structure: corBrownian
## Formula: ~1
## Parameter estimate(s):
## numeric(0)
## Variance function:
## Structure: fixed weights
## Formula: ~vv
## Degrees of freedom: 100 total; 97 residual
## Residual standard error: 1.0169
## compare to:
fitted
## $beta
## x1 x2
## 0.863873 2.039439 2.969356
##
## $sig2e
## [1] 0.6355355
##
## $logL
## [1] -138.4668
Now, here's a similar demo using the phylogenetic (generalized) ANOVA:
## evolve a factor on the tree
Q<-matrix(c(-2,1,1,1,-2,1,1,1,-2),3,3)
colnames(Q)<-rownames(Q)<-letters[1:3]
x<-as.factor(sim.history(tree,Q)$states)
## Done simulation(s).
y<-fastBM(tree)
y[x=="a"]<-y[x=="a"]+1
y[x=="b"]<-y[x=="b"]+2
y[x=="c"]<-y[x=="c"]+3
fit.gls<-gls(y~x,data=data.frame(y,x),correlation=corBrownian(1,tree),method="ML")
fit.gls
## Generalized least squares fit by maximum likelihood
## Model: y ~ x
## Data: data.frame(y, x)
## Log-likelihood: -58.60169
##
## Coefficients:
## (Intercept) xb xc
## 0.8929341 0.8188519 1.8667981
##
## Correlation Structure: corBrownian
## Formula: ~1
## Parameter estimate(s):
## numeric(0)
## Degrees of freedom: 100 total; 97 residual
## Residual standard error: 1.024053
summary(fit.gls)
## Generalized least squares fit by maximum likelihood
## Model: y ~ x
## Data: data.frame(y, x)
## AIC BIC logLik
## 125.2034 135.6241 -58.60169
##
## Correlation Structure: corBrownian
## Formula: ~1
## Parameter estimate(s):
## numeric(0)
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) 0.8929341 0.3801839 2.348690 0.0209
## xb 0.8188519 0.1411306 5.802084 0.0000
## xc 1.8667981 0.1658942 11.252944 0.0000
##
## Correlation:
## (Intr) xb
## xb -0.181
## xc -0.169 0.431
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -3.51162932 -1.05473759 -0.05870776 0.70967087 2.99993479
##
## Residual standard error: 1.024053
## Degrees of freedom: 100 total; 97 residual
X<-model.matrix(~x)
fit.lk<-optimize(lk,c(0,1000),y=y,X=X,C=vcv(tree))
fitted<-lk(fit.lk$minimum,y=y,X=X,C=vcv(tree),opt=FALSE)
fitted
## $beta
## (Intercept) xb xc
## 0.8929341 0.8188519 1.8667981
##
## $sig2e
## [1] 1.048674
##
## $logL
## [1] -58.60169
## now with sampling error in y
ye<-setNames(sampleFrom(xbar=y,xvar=v,n=rep(1,length(y))),rownames(y))
## fit model
fit.lk<-optimize(lk,c(0,1000),y=ye,X=X,C=vcv(tree),v=v)
fitted<-lk(fit.lk$minimum,y=ye,X=X,C=vcv(tree),v=v,opt=FALSE)
fitted
## $beta
## (Intercept) xb xc
## 0.7462831 1.2109657 1.9742749
##
## $sig2e
## [1] 1.338498
##
## $logL
## [1] -136.757
## coerce tree again:
tt<-tree
tt$edge.length<-tt$edge.length*fitted$sig2e
for(i in 1:length(v)){
tip<-which(tt$tip.label==names(v)[i])
ii<-which(tt$edge[,2]==tip)
tt$edge.length[ii]<-tt$edge.length[ii]+v[i]
}
vv<-diag(vcv(tt))
w<-varFixed(~vv)
fit.gls<-gls(ye~x,data=data.frame(ye,x),correlation=corBrownian(1,tt),method="ML",weights=w)
fit.gls
## Generalized least squares fit by maximum likelihood
## Model: ye ~ x
## Data: data.frame(ye, x)
## Log-likelihood: -134.6691
##
## Coefficients:
## (Intercept) xb xc
## 0.7462831 1.2109657 1.9742749
##
## Correlation Structure: corBrownian
## Formula: ~1
## Parameter estimate(s):
## numeric(0)
## Variance function:
## Structure: fixed weights
## Formula: ~vv
## Degrees of freedom: 100 total; 97 residual
## Residual standard error: 0.8591587
summary(fit.gls)
## Generalized least squares fit by maximum likelihood
## Model: ye ~ x
## Data: data.frame(ye, x)
## AIC BIC logLik
## 277.3383 287.759 -134.6691
##
## Correlation Structure: corBrownian
## Formula: ~1
## Parameter estimate(s):
## numeric(0)
## Variance function:
## Structure: fixed weights
## Formula: ~vv
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) 0.7462831 0.3900422 1.913339 0.0587
## xb 1.2109657 0.2408905 5.027037 0.0000
## xc 1.9742749 0.3019824 6.537715 0.0000
##
## Correlation:
## (Intr) xb
## xb -0.295
## xc -0.260 0.348
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -3.2076462 -0.8804235 -0.1460402 0.8224632 3.4883858
##
## Residual standard error: 0.8591587
## Degrees of freedom: 100 total; 97 residual
OK, that's it.
No comments:
Post a Comment
Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.