Phylogenetic Tools for Comparative Biology: Combining results from a phylogenetic regression on multiple trees in a posterior sample

Tuesday, April 2, 2013

Combining results from a phylogenetic regression on multiple trees in a posterior sample

A recent query on the R-sig-phylo email list asked:

"Imagine that I run a PIC analysis on two traits using 1000 post-burn-in trees. What would be the best way to summarize these results? Average p-values across all analyses? Perhaps a specific method to combine the resulting probabilities e.g. Fisher's test?"

A little bit of discussion ensued, but my suggestion read as follows:

...we should probably (begin by) combin(ing) the variance among estimates obtained from different trees in the posterior sample with the sampling variance of any single estimate... (to get the total variance of our parameter estimates for hypothesis testing)

One fairly sensible way to do this is to compute the variance due to phylogenetic uncertainty as the variance among estimates obtained from the trees of the posterior sample; and then to compute the sampling variance as the mean variance of the estimator from each tree; and then add the two variances them. The standard error of our estimate (computed as the mean across trees) is the square-root of this variance. To conduct a hypothesis test on the regression coefficient, then, you would compute the mean across trees and then the ratio of the parameter and its standard error should have a t-distribution with n-2 degrees of freedom for n taxa (not contrasts).

To do this from a practical perspective from trees in 'multiPhylo' object (here "trees") and data in x & y for a simple bivariate regression, we do the following:

# first define the following custom function
ff<-function(tree,x,y){
   pic.x<-pic(x,tree)
   pic.y<-pic(y,tree)
   fit<-lm(pic.y~pic.x-1)
   setNames(c(coef(fit),vcov(fit)),c("beta","var(beta)"))
}
# now apply to all trees in your sample
BB<-t(sapply(trees,ff,x,y))
# total variance in beta estimated by
varBeta<-var(BB[,"beta"])+mean(BB[,"var(beta)"])
t.beta<-mean(BB[,"beta"])/sqrt(varBeta)
P.beta<-2*pt(abs(t.beta),df=length(trees[[1]]$tip)-2, lower.tail=FALSE)

I think that's right.

Now, someone wisely pointed out that this approach is somewhat inelegant in combining Bayesian, maximum likelihood, & frequentist hypothesis testing. I can't argue with this and would agree that a better & more elegant solution would be to simultaneously sample trees with branch lengths & the parameters of our evolutionary model for trait evolution from their joint posterior probability distribution. Presently, this is impractical.

4 comments:

Rafael MaiaMay 31, 2013 at 1:52 PM
hi Liam,

I was wondering if something similar could be done for brownie.lite/OUwie types of models - not for p-values, but to combine estimation & phylogenetic errors and get meaningful confidence intervals for the parameters in the model (say if you have more than 2 regimes and you want to get an idea of if only 1 is different than 2 & 3 and so on).

Thanks!
ReplyDelete
Replies

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.

Pages

Tuesday, April 2, 2013

Combining results from a phylogenetic regression on multiple trees in a posterior sample

4 comments: