Phylogenetic Tools for Comparative Biology: Phylogenetic size correction using GLS

Friday, April 1, 2011

Phylogenetic size correction using GLS

I just posted a new function to my R-phylogenetics page called phyl.resid() (direct link to code here). This function takes as input a vector or matrix containing one or more morphological traits (Y), a vector or one-column matrix containing size (x), and a tree, and performs phylogenetic size correction following Revell (2009).

The function can fit the model using two different error structures. It can use the error structure implied by simple Brownian motion (BM), or it can use a "lambda" error structure based on Pagel (1999).
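To make the two error structures concrete, here is a minimal sketch in base R. It assumes a made-up four-taxon tree whose BM covariance matrix C is written out by hand (diagonals are root-to-tip heights, off-diagonals are shared branch lengths). The helper lambda.transform is a hypothetical name used for illustration only; it is not part of phyl.resid(). The idea is that Pagel's lambda multiplies the off-diagonal elements of C and leaves the diagonal alone.

```r
# made-up 4-taxon ultrametric tree ((A,B),(C,D)) with total height 1;
# C[i,j] is the branch length shared by tips i & j under BM
taxa<-c("A","B","C","D")
C<-matrix(c(1.0,0.6,0.0,0.0,
            0.6,1.0,0.0,0.0,
            0.0,0.0,1.0,0.3,
            0.0,0.0,0.3,1.0),4,4,dimnames=list(taxa,taxa))
# illustrative helper (not from any package):
# multiply the off-diagonals by lambda, keep the diagonal
lambda.transform<-function(C,lambda){
  V<-lambda*C
  diag(V)<-diag(C)
  V
}
V.bm<-lambda.transform(C,1)     # lambda = 1 gives back the BM structure
V.half<-lambda.transform(C,0.5) # covariances among species are halved
V.star<-lambda.transform(C,0)   # lambda = 0: no covariance among species
```

With lambda = 1 the matrix is unchanged (simple BM), and with lambda = 0 the species are treated as independent, so intermediate values scale the phylogenetic covariance between these two extremes.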

Here, I show how to generate simulated data, and then fit the model and calculate residuals using my function. First, using standard BM:

> # first load the function from source
> source("phyl.resid.R")
> # load {geiger} for simulation
> require(geiger)
> # simulate a stochastic b-d tree
> tree<-drop.tip(birthdeath.tree(b=1,d=0,taxa.stop=101),"101")
> # generate values for size (x) on the tree
> size<-sim.char(tree,as.matrix(1))[,,1]
> # generate residual error on the tree
> e<-sim.char(tree,as.matrix(diag(0.1,10)))[,,1]
> # generate data for morphology (Y)
> morph<-0.75*size+e+1.0 # arbitrary slope & intercept
> # now compute phylogenetic regressions & residuals
> res.bm<-phyl.resid(tree,x=size,Y=morph)
> res.bm
$beta
       [,1]      [,2]      [,3]      [,4]      [,5]      [,6]
  1.0505289 0.7406708 0.9833486 0.6723376 1.1780240 1.1162929
x 0.7686907 0.6957405 0.7890109 0.7772312 0.7821898 0.7752434
       [,7]      [,8]      [,9]     [,10]
  0.5867387 1.1465067 1.2188155 0.9111480
x 0.7466242 0.8108946 0.6988025 0.6798143
$resid
          [,1]          [,2]         [,3]         [,4]
35 -0.57578507 -0.2825177189 -0.955908341 -0.353829729
53  0.13403231 -0.4167233466 -1.244141868 -0.360423877
54 -0.28546926 -0.2337143482 -1.060995881 -0.106119966
45  0.20879004 -0.1023593678 -1.016254850 -0.044159533
...
> # now compute the correlations between e & res.bm$resid
> diag(cor(e,res.bm$resid))
[1] 0.9975066 0.9842922 0.9955430 0.9969516 0.9968132 0.9963083
[7] 0.9999522 0.9778480 0.9911174 0.9643098

So, in this case we see that our estimated residuals (res.bm$resid) are highly correlated with our generating residuals (e).
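For readers who want to see what the regression step amounts to, the sketch below computes GLS coefficients and residuals in base R from a covariance matrix C, using the standard GLS estimator beta = (X'C^-1 X)^-1 X'C^-1 Y and residuals Y - X beta. This is not the code of phyl.resid(); the function name gls.resid, the toy matrix C, and the data are all made up for illustration.

```r
# illustrative GLS fit: beta = (X'C^-1 X)^-1 X'C^-1 Y, resid = Y - X beta
gls.resid<-function(C,x,Y){
  X<-cbind(1,x)      # design matrix: intercept & size
  Y<-as.matrix(Y)    # one column per trait
  invC<-solve(C)
  beta<-solve(t(X)%*%invC%*%X)%*%(t(X)%*%invC%*%Y)
  list(beta=beta,resid=Y-X%*%beta)
}
# toy BM covariance matrix for a made-up 4-taxon tree ((A,B),(C,D))
taxa<-c("A","B","C","D")
C<-matrix(c(1.0,0.6,0.0,0.0,
            0.6,1.0,0.0,0.0,
            0.0,0.0,1.0,0.3,
            0.0,0.0,0.3,1.0),4,4,dimnames=list(taxa,taxa))
# made-up size & morphology with slope 0.75 & intercept 1, no error added
size<-c(A=0.2,B=0.5,C=-0.4,D=1.1)
morph<-0.75*size+1.0
fit<-gls.resid(C,size,morph)
fit$beta   # intercept & slope
fit$resid  # size-corrected residuals (all zero here, up to rounding)
```

Because no residual error was added to the toy data, the fit recovers the generating intercept and slope and the residuals vanish; with real data the residuals are the size-corrected trait values.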

Let's try this again, but with a different randomly generated value of lambda for each column of e. We can use the same tree & x as before:

> # generate residual error on the tree using lambdaTree
> e<-matrix(NA,100,10,dimnames=list(names(size)))
> # generate random lambda
> lambda<-runif(10)
> # simulate residual error
> for(i in 1:10)
+ e[,i]<-sim.char(lambdaTree(tree,lambda=lambda[i]), as.matrix(0.1))[,,1]
> # generate data for morphology (Y)
> morph<-0.75*size+e+1.0 # arbitrary slope & intercept
> # now compute phylogenetic regressions & residuals
> res.lambda<-phyl.resid(tree,x=size,Y=morph,method="lambda")
> res.lambda
$beta
       [,1]      [,2]      [,3]      [,4]      [,5]      [,6]
  1.1946700 1.1669820 0.9899934 0.6516128 0.8610812 1.1297510
x 0.7843517 0.8268043 0.6847976 0.7268403 0.7880103 0.7544444
       [,7]      [,8]      [,9]     [,10]
  0.9381019 1.1502370 0.7875310 0.9959001
x 0.7389826 0.7931595 0.6583763 0.7477538

$lambda
[1] 8.435166e-01 6.610696e-05 7.463530e-01 9.672567e-01
[5] 7.654335e-01 8.839716e-01 2.376671e-01 6.870145e-01
[9] 1.653223e-01 6.610696e-05

$logL
[1] -61.32796 -88.37910 -72.52449 -40.10453 -58.06409 -56.09014
[7] -80.70959 -84.00074 -92.85250 -94.28537

$resid
          [,1]         [,2]        [,3]        [,4]
35  0.10484806 -0.302784537 -1.07117426 -0.21346466
53 -0.56102358 -0.647005948 -0.51969545 -0.01657716
54 -0.02236203 -0.294930056 -0.50011842 -0.13512110
...
> # compute the correlation between lambda & res.lambda$lambda
> cor(lambda,res.lambda$lambda)
[1] 0.9417568
> # now compute the correlations between e & res.lambda$resid
> diag(cor(e,res.lambda$resid))
[1] 0.9944585 0.9705136 0.9816635 0.9967442 0.9923672 0.9998890
[7] 0.9993074 0.9947153 0.9643409 0.9999766

From this example we can see that our generating and estimated lambda values are highly correlated (remember, there are only 10 of these for the 10 replicate simulations); and our generating and estimated residuals are also very highly correlated as before.
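As a rough illustration of how lambda can be estimated jointly with the regression, the sketch below maximizes a profile log-likelihood over lambda in base R. It is a sketch under stated assumptions, not the internals of phyl.resid(): the function name lambda.logL, the use of optimize() on the interval [0,1], the toy covariance matrix, and the data are all made up for this example.

```r
# illustrative profile log-likelihood for lambda in a GLS regression
lambda.logL<-function(lambda,C,x,y){
  V<-lambda*C; diag(V)<-diag(C)        # lambda error structure
  X<-cbind(1,x); n<-length(y)
  invV<-solve(V)
  beta<-solve(t(X)%*%invV%*%X)%*%(t(X)%*%invV%*%y)
  r<-y-X%*%beta                        # GLS residuals
  sig2<-as.numeric(t(r)%*%invV%*%r)/n  # ML estimate of the rate
  logdetV<-as.numeric(determinant(V,logarithm=TRUE)$modulus)
  -0.5*(n*log(2*pi*sig2)+logdetV+n)
}
# toy covariance matrix for a made-up 6-taxon tree ((A,B),(C,D),(E,F))
taxa<-c("A","B","C","D","E","F")
C<-diag(6); dimnames(C)<-list(taxa,taxa)
C["A","B"]<-C["B","A"]<-0.6
C["C","D"]<-C["D","C"]<-0.3
C["E","F"]<-C["F","E"]<-0.8
# made-up size, error & morphology
size<-setNames(c(0.2,0.5,-0.4,1.1,-0.9,0.3),taxa)
e<-c(0.10,0.15,-0.20,0.05,-0.10,-0.05)
morph<-0.75*size+1.0+e
# maximize the log-likelihood over lambda in [0,1]
fit<-optimize(lambda.logL,interval=c(0,1),C=C,x=size,y=morph,maximum=TRUE)
fit$maximum    # estimated lambda
fit$objective  # log-likelihood at the estimate
```

One useful sanity check on a function like this is that at lambda = 0 it should return the same log-likelihood as an ordinary least-squares fit, e.g. logLik(lm(morph~size)).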

In theory, you should be able to do all of this using gls() in {nlme}; however, for some reason I have not always been able to get this to return the correct result (as assessed by comparison to PIC regression) or sensible estimates of lambda. Maybe some of my readers have had similar experiences?

11 comments:

dwbapst, April 3, 2011 at 12:57 PM
Liam,
Does your function account for the different variances produced by differences in tip height in non-ultrametric paleo-trees? (As we've discussed on the list.)

Tangentially, since your comment in Evolution on this topic came out, I've been wondering about the application of phylogenetic size-correction and PCA. How problematic is the issue for paleontologists, who generally lack reliable phylogenies but commonly apply PCA and size-correction to their data?
-Dave

Liam Revell, April 3, 2011 at 1:24 PM
Hi Dave.

Yes, no problem to have a non-ultrametric tree. One should be able to do this using gls(...,weights) as well.

Regarding your second item: this would seem to be fundamentally an empirical question. However, I think it will depend on the purpose of the PCA. For instance, if you ignore phylogeny then the PCs will not be evolutionarily orthogonal; however, if the purpose is merely data reduction, this might not be of great consequence.

Thanks for the great comment.

- Liam

dwbapst, April 3, 2011 at 1:46 PM
Liam-
Hmm. The most common uses for PCA are for measuring disparity or visualizing morphospace occupation. Might be a good modeling project for a junior student to do: see what biases are introduced into disparity metrics by using traditional PCA.
-Dave

Liam Revell, April 12, 2011 at 2:37 PM
Dave,
I have also posted a new function for phylogenetic PCA. I describe it here and then (with the inevitable bug fixes) here.

Anonymous, February 13, 2013 at 6:38 PM
Hello Liam,

Thanks for the great package (phytools) & functions! I had a couple of questions that I hope are not too naive.

1. I have been using different methods to generate residuals from a PGLS (as you describe in your post: http://blog.phytools.org/2012/11/fitting-model-in-phylogenetic.html). I have run your code as described below:

METHOD A
# using phytools
library(phytools)
# assuming your data are in named vectors x & y
fit.A<-phyl.resid(tree,x,y,method="lambda")
fit.A

METHOD B
# using caper
library(caper)
# assuming your data are in named vectors x & y
y<-y[names(x)]
X<-data.frame(x,y,Species=names(x))
fit.B<-pgls(y~x,comparative.data(tree,X,"Species"), lambda="ML")
residuals(fit.B)

and have found that "residuals(fit.A)" are the same as the values in "fit.B". However,

residuals(fit.B, phylo=T)

provides a different answer and is explained as the "phylogenetic residuals from the PGLS model." I had assumed (probably due to a lack of knowledge on the subject) that the results of phyl.resid would be the "phylogenetic residuals" (i.e. the residuals from a phylogenetically informed GLS model).

I would like to use the resulting residuals of the PGLS/phyl.resid in further phylogenetically informed analyses as the size corrected value for my trait of interest (keeping in mind the advice you give in your 2009 Evolution paper). It strikes me though that the "phylogenetic residuals" are the more appropriate metric to be using if one is after a phylogenetically size corrected trait. Might you have any insight into the differences between the result of "phyl.resid" residuals and the "phylogenetic residuals" from a PGLS model?

2. Could you say a few words on the methods you use to ensure your analyses fit the assumptions of phyl.resid (keeping in mind the cautions you give in your 2010 "Phylogenetic signal and linear regression on species data" paper and ensuring that phylogenetic analysis is appropriate in the first place)?

For example, Charlie Nunn and Natalie Cooper suggest removing species that have a studentized residual greater than 3 or less than -3 (http://nunn.rc.fas.harvard.edu/groups/pica/wiki/d8009/751_Running_PGLS_in_R_using_caper.html). Do you think this is equivalently appropriate for the studentized residuals of the phyl.resid residuals to ensure that outliers are removed? Other thoughts on techniques that must be used to assure the data meet the appropriate assumptions?

Many thanks for any advice you might be able to provide and apologies for anything that seems naive.

Unknown, September 10, 2014 at 8:37 AM
First, thanks a lot for this new function and all the explanations you give on this blog.

I'm a student in evolutionary biology, and I'm studying the evolution of acoustic communication in a cricket genus.
I'm working on acoustic and morphological data, and I would like to know whether there are correlations between those characters.
So I had to partial out the effect of body size on these characters before conducting the analysis. I obtained size-corrected residuals for all my characters.
Now I would like to estimate the correlations between those variables, but I don't know whether I should run "traditional regressions" or pGLS regressions on the residuals.
I would say that I have to run pGLS regressions on the residuals in order to take the phylogenetic effect into account, but I'm not sure and I would like to have your point of view.

Many thanks for any advice you might be able to provide and apologies for anything that seems naive.

Augustin

Regina, May 12, 2015 at 5:01 PM
Dear Liam

I am new to phylogenetic comparative methods, and I apologize if this question has been asked previously. I am trying to do a phylogenetic size correction (with phyl.resid), then use the resulting residuals to perform a phylogenetic PCA (with phyl.pca), and finally run a MANOVA on the resulting scores (with aov.phylo). I did this with species averages of the morphological traits, but I am interested in taking intraspecific variation into account.
As a test, I used individual measurements and a tree in which members of the same species form a polytomy (e.g., "((bufonius1,bufonius2,bufonius3,bufonius4,bufonius5,bufonius6,bufonius7,bufonius8,bufonius9,bufonius10,(mystacinus1,mystacinus2,mystacinus3,mystacinus4,mystacinus5,mystacinus6,mystacinus7,mystacinus8,mystacinus9,mystacinus10,troglodytes1,troglodytes2,troglodytes3,troglodytes4,troglodytes5,troglodytes6,troglodytes7,troglodytes8,troglodytes9,troglodytes10))"), with all branch lengths set to 1 and with Grafen's branch lengths (I don't have branch lengths for my tree). Apparently it works, but I don't know whether this is a correct way to obtain the residuals, scores, etc.
Thanks a lot in advance.
Best regards,
Regina

Unknown, July 15, 2016 at 12:10 PM
Dear Liam,

I'm wondering whether it is acceptable to log-transform your variables before using this correction?

Thanks

Unknown, January 2, 2022 at 8:45 AM
pgls {caper} returns an object of class pgls containing phyres, the phylogenetic residuals. I was wondering whether this is identical to what is returned by your phyl.resid function?