Thursday, September 13, 2012

Estimating ancestral states under a single-optimum OU model

In a recent R-SIG-phylo thread, a R phylogenetics user inquired about how to obtain ancestral state estimates for internal nodes based on an Ornstein-Uhlenbeck model for trait evolution on the tree. They pointed out that, in principle, it should be possible to do this using ace(...,method="GLS",corStruct=corMartins(...)); unfortunately this was not producing sensible ancestral character estimates (and I wasn't able to get it to work at all, for some unknown reason).

Without judging the wisdom of this general endeavor (which I'll discuss a little at the end of this post), I provided a relatively straightforward way to do this using ace(...,method="ML") combined with geiger::fitContinuous and geiger::ouTree. Basically, we first fit our OU model using fitContinuous(...,model="OU"), then we transform the branch lengths of our tree using ouTree(...), finally we estimate ancestral states using ace in the 'ape' package or phytools::anc.ML.

Just to demo this, I'll simulate and estimate below:

> # load the required libraries
> library(phytools) # also loads ape
> library(geiger)
> # set the conditions for simulation
> N<-50; alpha<-1
> # simulate our tree
> tree<-pbtree(n=N,scale=10)
> # simulate data, including internal nodes
> # under a single-optimum OU model
> x<-fastBM(ouTree(tree,alpha),internal=TRUE)
> # fit using geiger::fitContinuous
> fit<-fitContinuous(tree,x[1:N],model="OU")$Trait1
Fitting  OU model:
> fit
$lnl
[1] -46.27052
$beta
[1] 0.7832121
$alpha
[1] 0.9540822
$convergence
...

> # estimate ancestral states
> ou<-ace(x[1:N],ouTree(tree,fit$alpha))
...
> ou$ace
         51           52           53           54
0.012733299  0.012733173  0.012733128          ...
> # or ou<-anc.ML(ouTree(tree,fit$alpha),x[1:N])
> # check how we did
> cor(x[names(ou$ace)],ou$ace)
[1] 0.8046332
> plot(x[names(ou$ace)],ou$ace,xlab="true states",ylab="estimated states")

Because these are simulated data, we can also check to see how well we did - and it turns out that we did a little bit better than we might have if we'd assumed BM as the generating model:

> bm<-ace(x[1:length(tree$tip)],tree)
> cor(x[names(bm$ace)],bm$ace)
[1] 0.6990525
> plot(x[names(bm$ace)],bm$ace,xlab="true states",ylab="estimated states")

Of course, the whole exercise might be a somewhat questionable endeavor. This is because in a single optimum OU model we can't separately estimate the ancestral state at the root node and the position of the phenotypic optimum. That means that we have to be pretty comfortable with assuming that the ancestor of our group was at the trait optimum, which may or may not have been the case depending on our data and tree.

That's it.

1 comment:

  1. Hi Liam,
    I'd like to estimate my ancestral states, which evolve under OU model.
    But before using the tool you presented in this post, I need to estimate the "alpha" parameter.
    But fitContinuous (fit<-fitContinuous(tr,dat,model="OU")) and anc.ML (fit<-anc.ML(tr,dat,model="OU")) give widely different values.

    fitContinuous :
    fitted ‘OU’ model parameters:
    alpha = 0.467411
    sigsq = 974.957102
    z0 = 196.516937
    log-likelihood = -366.988318
    anc.ML
    Fitted model parameters & likelihood:
    sigma^2 alpha logLik
    76.2096 0.022422 -642.7123

    Which one should I use ? Why such a difference ?

    Tks for this blog,
    Pierre

    ReplyDelete