In the end, I did this the old-fashioned way, and it seems to work (at least, it converges on standard PGLS when sampling error is zero; and it recovers estimates close to the generating parameter values when sampling error is included in simulation and accounted for during estimation). Unfortunately, I have so far only programmed bivariate regression. It is theoretically straightforward to extend to multivariable regression, with the caveat that optimizing the likelihood function requires inversion of a covariance matrix whose dimensions scale with *N* × (*p* + 1) for *p* independent variables. The optimization is also multidimensional because we simultaneously need to maximize the likelihood for the rates of evolution in *x* and *y* (σ_x², σ_y²), the regression slope (*b*₁), and the states at the root for *x* and *y* (**a**). We don't need to separately estimate *b*₀ - we can just get it from the root values and the slope.
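To make that last point concrete, here is a minimal sketch (not part of `pgls.Ives` itself) of how the intercept falls out: since the fitted regression line passes through the root states, *b*₀ = a_y − *b*₁ · a_x. The values below are copied from the first fit printed further down (`$a` and the second element of `$beta`):

```r
## root states for x and y (elements of $a) and the fitted slope,
## copied from the no-error fit shown below
a.x <- -0.2179666
a.y <- 4.1354924
b1 <- 3.392148

## the regression line passes through (a.x, a.y),
## so the intercept is recovered directly:
b0 <- a.y - b1 * a.x
b0 # matches the first element of $beta, 4.874868
```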

Code for the function is here. Let's try it:

> # load dependencies and source

> require(phytools)

> require(nlme) # for gls()

> require(MASS) # for mvrnorm()

> source("pgls.Ives.R")

> # simulate a phylogeny

> tree<-pbtree(n=200,scale=1)

> # simulate under some arbitrary regression model

> x<-fastBM(tree)

> y<-5+3.5*x+fastBM(tree,0.2)

> # create vectors of zeroes for the "no error" case

> Vx0<-Vy0<-Cxy0<-rep(0,length(tree$tip))

> names(Vx0)<-names(Vy0)<-names(Cxy0)<-tree$tip.label

> # fit the no error model with simulated data

> res1<-pgls.Ives(tree,x,y,Vx0,Vy0,Cxy0)

> res1

$beta

[1] 4.874868 3.392148

$sig2x

[1] 0.9823115

$sig2y

[1] 1.008072

$a

[1] -0.2179666 4.1354924

$logL

[1] -224.2079

> # for reference, we can also fit this with gls()

> res2<-gls(y~x,data.frame(x,y),correlation=corBrownian(1,tree))

> res2

Generalized least squares fit by REML

Model: y ~ x

Data: data.frame(x, y)

Log-restricted-likelihood: -115.1862

Coefficients:

(Intercept) x

4.874825 3.392136

Correlation Structure: corBrownian

Formula: ~1

Parameter estimate(s):

numeric(0)

Degrees of freedom: 200 total; 198 residual

Residual standard error: 1.00909
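As a quick sanity check, the two sets of coefficient estimates printed above agree to about four decimal places; copying them by hand:

```r
## coefficient estimates transcribed from the output above
beta.ives <- c(4.874868, 3.392148) # pgls.Ives, no-error model
beta.gls <- c(4.874825, 3.392136) # gls() with corBrownian

## the largest discrepancy is on the order of 1e-5
max(abs(beta.ives - beta.gls))
```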

> # simulate sampling variances and covariances

> Vx<-rexp(length(tree$tip))/2

> Vy<-rexp(length(tree$tip))/2

> Cxy<-sqrt(Vx*Vy)*runif(length(tree$tip),min=-1,max=1)

> names(Vx)<-names(Vy)<-names(Cxy)<-names(x)

> # simulate data with sampling error

> xe<-ye<-vector()

> for(i in 1:length(tree$tip)){

temp<-mvrnorm(mu=c(x[i],y[i]),

Sigma=matrix(c(Vx[i],Cxy[i],Cxy[i],Vy[i]),2,2))

xe[i]<-temp[1]; ye[i]<-temp[2]

}

> names(xe)<-names(x)

> names(ye)<-names(y)

> # fit no error model to data with error

> res3<-pgls.Ives(tree,xe,ye,Vx0,Vy0,Cxy0)

> res3

$beta

[1] 4.1621231 0.2000734

$sig2x

[1] 27.1866

$sig2y

[1] 24.35597

$a

[1] -0.2924528 4.1036111

$logL

[1] -874.7336

> # fit error model

> res4<-pgls.Ives(tree,xe,ye,Vx,Vy,Cxy)

> res4

$beta

[1] 5.133185 3.485432

$sig2x

[1] 0.9533415

$sig2y

[1] 0.830365

$a

[1] -0.2760699 4.1709624

$logL

[1] -568.1378

Obviously, in this case our estimate - particularly of the regression slope, which is strongly attenuated toward zero when sampling error is ignored - is quite bad; but we do much better with the method of Ives et al. Cool!
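To put numbers on that attenuation, here is a small sketch comparing the two slope estimates above against the generating value of 3.5 (values copied from the printed output):

```r
b1.true <- 3.5 # generating slope used in the simulation
b1.noerr <- 0.2000734 # slope when sampling error is ignored
b1.ives <- 3.485432 # slope under the Ives et al. error model

## absolute errors of the two estimates
abs(b1.noerr - b1.true) # badly attenuated toward zero
abs(b1.ives - b1.true) # close to the generating value
```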
