## Wednesday, November 16, 2011

### Including "measurement error" in the estimation of λ

I just posted a new version of the function phylosig() which now allows the user to incorporate measurement error in the estimation of Pagel's λ. Pagel's λ is usually estimated using maximum likelihood, so it was straightforward to condition the estimation of λ & σ2 (the rate of evolution) on an observed set of standard errors. The code for this new version of the function is here. Unfortunately, as I have added capabilities to this function, I have done so in a haphazard fashion, which has resulted in code for this function that has become a little bit more expansive than necessary (compare, for instance, to the code for my new REML version of brownie.lite, here). At some point I will try and clean this up.

To add measurement error to the estimation of λ is not too difficult. Basically, under the λ model we expect that the multivariate distribution of phenotypic values among species will be given by σ2Cλ; where Cλ is an n×n matrix (for n species) containing, in the diagonal, the height of each species above the root; and, in each off-diagonal element Cλ(i,j), the height above the root node of the MRCA of species i and species j multiplied by the coefficient λ. In other words: x ~ MVN(σ2Cλ). Given a set of values for x and a tree, we can estimate λ using likelihood.

To incorporate measurement error (that is, estimation error for the species means for the trait of interest), we just have x ~ MVN(σ2Cλ + E), where E is a matrix containing the error variance and covariances among species. If we assume that estimation error is uncorrelated among species, then E is just a diagonal matrix containing the square of the estimation error for each species. (The assumption that measurement error is independent among species is reasonably safe - but it might be false if, for instance, samples were measured by different investigators or in different years; e.g., see Ives et al. 2007).

The way to run phylosig() with measurement error is simply by supplying the additional vector, se, containing the standard errors for each species. names(se) should contain the species names (corresponding to tree\$tip.label, although they need not be in the same order).

To see how the function works, let's try the following example:

> require(phytools); require(geiger)
> set.seed(2)
> gen.lambda<-0.7
> tree<-drop.tip(birthdeath.tree(b=1,d=0,taxa.stop=501),"501")
> se<-sqrt(rchisq(n=500,df=2)); names(se)<-tree\$tip.label
> x<-fastBM(lambdaTree(tree,gen.lambda),sig2=1)
> e<-rnorm(n=500,sd=se)
> xe<-x+e
> phylosig(tree,x,method="lambda")
\$lambda
[1] 0.7141959
\$logL
[1] -1018.6
> phylosig(tree,xe,method="lambda")
\$lambda
[1] 0.5431885
\$logL
[1] -1153.855
> phylosig(tree,xe,method="lambda",se=se)
\$lambda
[1] 0.7274975
\$sig2
[1] 1.161162
\$logL
[1] -1135.698