This will also be part of the focus of my upcoming symposium talk at Evolution this summer, but it is becoming my impression that there is a distinct lack of appreciation for how error in the estimation of species means can affect model selection in comparative biology. Error in the estimation of species means is undoubtedly pervasive. For example, authors commonly measure 1-10 specimens per species, meaning that (if intraspecific variability is modest or high), sampling error can be substantial.

Just to give two quick examples, let's consider the consequences of (ignored) error in the estimation of species means for a single-optimum OU model (Ornstein-Uhlenbeck), and an EB (Early-burst) model.

First, let's simulate tree & data under Brownian evolution:

> require(phytools); require(geiger)

> set.seed(10) # for repeatability, but try multiple seeds

> tree<-pbtree(n=200,scale=1)

> x<-fastBM(tree)

Now, let's add some sampling error using the function

sampleFrom in phytools:

> xe<-sampleFrom(x,0.2,rep(1,200))
We can easily see that

x and

xe are quite closely correlated - about as correlated as we might expect our true & estimated species means to be in any empirical study:

> plot(x,xe)
Next, let's try to fit BM & OU models to our data:

> fitBM<-fitContinuous(tree,xe,model="BM")

Fitting BM model:

> fitBM

$Trait1

$Trait1$lnl

[1] -348.0691

$Trait1$beta

[1] 9.366567

$Trait1$aic

[1] 700.1382

...

> fitOU<-fitContinuous(tree,xe,model="OU")

Fitting OU model:

> fitOU

$Trait1

$Trait1$lnl

[1] -279.5255

$Trait1$beta

[1] 126.3536

$Trait1$alpha

[1] 63.67358

...

$Trait1$aic

[1] 565.051

...

(It can easily be verified that this is not true of the original data vector

x, for which BM is the better model:

> fitContinuous(tree,x,model="BM")$Trait1$aic < fitContinuous(tree,x,model="OU")$Trait1$aic

Fitting BM model:

Fitting OU model:

[1] TRUE ).

We can see already that for a good amount of sampling error in the estimation of species means, our fitted model flips - from BM (the generating model) to OU. We can also test this for EB. Here, I will adjust the bounds on

alpha, to allow either a decrease

*or* an increase in the evolutionary rate through time.

> fitEB<-fitContinuous(tree,xe,model="EB", bounds=list(alpha=c(-100,100)))

Fitting EB model:

> fitEB

$Trait1

$Trait1$lnl

[1] -282.1674

$Trait1$beta

[1] 1e-08

$Trait1$a

[1] 21.63922

$Trait1$aic

[1] 570.3348

...
Here again we see that EB fits much better than the generating model, BM. Of course, it is possible to address this bias by following the method of Ives et al. (

2007), and this is even implemented in

fitContinuous. Let's try again for the EB example:

> fitBMe<-fitContinuous(tree,xe,model="BM", meserr=rep(sqrt(0.2),200))

Fitting BM model:

> fitBMe

$Trait1

$Trait1$lnl

[1] -218.4676

$Trait1$beta

[1] 0.908003

$Trait1$aic

[1] 440.9353

$Trait1$aicc

[1] 440.9965

$Trait1$k

[1] 2

> fitEBe<-fitContinuous(tree,xe,model="EB", bounds=list(alpha=c(-100,100)),meserr=rep(0.2,200))

Fitting EB model:

> fitEBe

$Trait1

$Trait1$lnl

[1] -218.4567

$Trait1$beta

[1] 0.8149614

$Trait1$a

[1] 0.1468192

$Trait1$aic

[1] 442.9134

$Trait1$aicc

[1] 443.0365

$Trait1$k

[1] 3

Cool. This is what we might expect. Finally, it is interesting to consider what might happen if phenotypic data

*actually* arose under EB, but then sampling error in the estimation of species means is (again) ignored.

> x<-fastBM(linearchangeTree(tree,slope=-0.5))

> xe<-sampleFrom(x,0.2,rep(1,200))

> fitContinuous(tree,x,model="EB")

Fitting EB model:

$Trait1

$Trait1$lnl

[1] -69.14438

$Trait1$beta

[1] 0.9732424

$Trait1$a

[1] -0.6107059

$Trait1$aic

[1] 144.2888

$Trait1$aicc

[1] 144.4118

$Trait1$k

[1] 3

> fitContinuous(tree,xe,model="EB")

Fitting EB model:

$Trait1

$Trait1$lnl

[1] -270.8313

$Trait1$beta

[1] 4.326543
$Trait1$a

[1] 0

$Trait1$aic

[1] 547.6626

$Trait1$aicc

[1] 547.7857

$Trait1$k

[1] 3

In this example, a very strong EB pattern vanishes when there is sampling error.

Of course, the reader should keep in mind that all of the above results depend critically on the size of sampling error relative to the variance among species. If this is very small, it can be much more safely ignored. For example:

> xe<-sampleFrom(x,0.001,rep(1,200))

> fitContinuous(tree,xe,model="EB")

Fitting EB model:

$Trait1

$Trait1$lnl

[1] -75.14334

$Trait1$beta

[1] 0.8179839

$Trait1$a

[1] -0.3384102

$Trait1$aic

[1] 156.2867

$Trait1$aicc

[1] 156.4098

$Trait1$k

[1] 3

As I mentioned above, a more detailed treatment of this and other issues is underway for my

SSB symposium talk at

Evolution 2012 in Ottawa.