Wednesday, October 16, 2013

Optimizing pgls.Ives with Pagel's λ

Today I got the following interesting user request:

"I have been playing around with the function pgls.Ives. Is there anyway to accomodate the corPagel correlation structures into this?"

This is not possible, actually. corStruct objects (such as an object of class "corPagel") do not permit (so far as I know) error in both x & y which is a component of the pgls.Ives model. However we can, with a little pain, optimize Pagel's λ model using pgls.Ives - just in a slightly roundabout way. We do this by building a simple wrapper around pgls.Ives that spits out the log-likelihood conditioned on a specific input λ, then we optimize for λ. Here's a demo:

## load packages & simulate data
require(phytools)
require(geiger)

# simulate tree & data under the regression model
tree<-pbtree(n=50)
xhat<-fastBM(tree)
yhat<-0.75*xhat+0.2*fastBM(transform(tree,lambda=0.5))

# simulate individual data for x & y
x<-sampleFrom(xhat,randn=c(1,10))
n<-summary(as.factor(names(x)))
y<-sampleFrom(yhat,n=n[names(yhat)])
Now let's fit the model using λ:
## brute force grid search optimization
lambda<-seq(0.05,1,0.05)
logL<-sapply(lambda,function(l,tree,x,y)
  pgls.Ives(transform(tree,lambda=l),x,y)$logL,tree=tree,
  x=x,y=y)
ii<-which(logL==max(logL))
plot(lambda,logL,type="b")
lines(c(lambda[ii],lambda[ii]),c(min(logL),max(logL)),
  lty="dashed")
ml.lambda<-lambda[ii]
ml.lambda

## optimize using base function optimize
foo<-function(l,tree,x,y)
  pgls.Ives(transform(tree,lambda=l),x,y)$logL
optimize(foo,c(0,1),tree=tree,x=x,y=y,maximum=TRUE)

Let's try it:

> # simulate tree & data under the regression model
> tree<-pbtree(n=50)
> xhat<-fastBM(tree)
> yhat<-0.75*xhat+0.2*fastBM(transform(tree,lambda=0.5))
>
> # simulate individual data for x & y
> x<-sampleFrom(xhat,randn=c(1,10))
> n<-summary(as.factor(names(x)))
> y<-sampleFrom(yhat,n=n[names(yhat)])
>
> ## brute force grid search optimization
> lambda<-seq(0.05,1,0.05)
> logL<-sapply(lambda,function(l,tree,x,y)
pgls.Ives(transform(tree,lambda=l),x,y)$logL,tree=tree,
x=x,y=y)
There were 50 or more warnings (use warnings() to see the first 50)
> ii<-which(logL==max(logL))
> ml.lambda<-lambda[ii]
> ml.lambda
[1] 0.7
> plot(lambda,logL,type="b")
> lines(c(lambda[ii],lambda[ii]),c(min(logL),max(logL)),
lty="dashed")
> ## optimize using base function optimize
> foo<-function(l,tree,x,y) pgls.Ives(transform(tree,lambda=l),x,y)$logL
> optimize(foo,c(0,1),tree=tree,x=x,y=y,maximum=TRUE)
$maximum
[1] 0.6893112

$objective
[1] -134.8

There were 42 warnings (use warnings() to see them)
The warnings, in this case, are just that the sample sizes for some species are one - so the function assumes that that species has a within species variance equal to the mean variance across all species.

Note that although this works fine in principle, I have received some reports of the basic function failing to optimize for larger trees - and I believe these reports, so please use pgls.Ives with considerable caution until I work out these issues!

No comments:

Post a Comment

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.