Monday, October 29, 2012

Multiple matrix regression with Mantel permutations

An R user asks:
"whether there is in the world a script (function) in R for multi_mantel: multiple matrix regression and hypothesis testing"

The program for multiple matrix regression that the user is referring to is my old C program, "multi_mantel", that I have not used in quite some time (and try my hardest not to support).

When I programmed this years ago (i.e., circa 2006) it was a considerable undertaking. In R, it should be a much easier task. This is because we can take advantage of existing 'base' and 'stats' package functions such as lm and summary.lm (not to mention all the other many conveniences that are afforded by the R environment).

To prove this (on a rainy, windy day here in Boston) I just did so. A link to the function code is here.

The way this works is really straightforward. The function first unfolds our input matrices (which can also be objects of class "dist", and then performs multivariable regression using lm. (Note that if we have multiple independent matrices, we should give them in a list.) Next, it randomizes rows & columns together in the dependent matrix, following Mantel (1967), each time recomputing the test statistics (in this case, F and the t-statistics for each coefficient in the model.) Finally, it compares the observed F & t-values to the null distributions obtained by permutation.

The function internally calls two other custom functions (unfoldLower & foldtoLower) that convert a symmetric distance matrix to a vector; or, conversely, that converts a vector to a symmetric distance matrix.

Here's an example use. Note that I don't endorse the Mantel test for estimating phylogenetic signal. According to Harmon & Glor (2010) this is generally a bad idea.

> source("multi.mantel.R")
> tree<-pbtree(n=100)
> y<-fastBM(tree,sig2=2)
> Y<-dist(y)
> X<-cophenetic(tree)
> fit<-multi.mantel(Y^2,X,nperm=1000)
> fit[1:6]
$r.squared
[1] 0.0341906
$coefficients
(intercept)          X1
  1.788184    2.207660
$tstatistic
(intercept)          X1
  1.600743   13.234956
$fstatistic
[1] 175.1641
$probt
(intercept)          X1
     1.000       0.001
$probF
[1] 0.001

Note that the regression slope (fit$coefficients["X1"] is also an estimate of the evolutionary rate in this case - although it is not a very good one. Compare the fit & P-value here to the case in which we have data with no phylogenetic signal:

> require(geiger)
> y<-fastBM(lambdaTree(tree,0))
> Y<-dist(y)
> fit<-multi.mantel(Y^2,X,nperm=1000)
> fit[1:6]
$r.squared
[1] 0.0002468181
$coefficients
(intercept)          X1
6.50077758  0.09146137
$tstatistic
(intercept)          X1
 11.730152    1.105241
$fstatistic
[1] 1.221558
$probt
(intercept)          X1
     0.666       0.680
$probF
[1] 0.68

Again, I wouldn't necessarily advocate measuring phylogenetic signal this way (if at all), but this example still serves to demonstrate the function nicely.

No comments:

Post a Comment