Tuesday, November 28, 2017

Significant update to phyl.pairedttest for phylogenetic paired t-test in R

A good long time ago now some co-authors & I (Lindenfors et al. 2010) published a 'phylogenetic paired t-test.' This is a really simple method in which we ask if a set of paired traits (say, male and female RBC counts; or summer & winter body mass) differ significantly one from the other across species, while taking into account non-independence due to the phylogeny. The method can also account for (known) uncertainty in either or both trait vectors, and can relax the assumption of pure Brownian motion for the correlation structure of the errors (using Pagel's λ model).

The method has been implemented in the function phyl.pairedttest; however a phytools user recently pointed out some issues with optimizing the model (which uses likelihood). This turned out to be due to trait scale. For instance, for (numerically) large mean differences between the two trait vectors, the corresponding element of the Hessian matrix from the likelihood surface was sometimes numerically indistinguishable from zero.

I fixed this by using an internal rescaling, which also then required that I back-transform the estimated parameter by (some function of) the scaling factor. I was also able to make other improvements and I added an S3 print method for the object class. All of my updates can be seen here.

The following is a simple demo using simulated data:

library(phytools)
packageVersion("phytools")
## [1] '0.6.46'
phylo.heatmap(tree,cbind(x1,x2,x3),fsize=c(0.5,0.9,0.9),
    standardize=TRUE)

plot of chunk unnamed-chunk-1

phyl.pairedttest(tree,x1,x2)
## 
## Phylogenetic paired t-test:
## 
##    t = 0.52022, df = 57, p-value = 0.604925
## 
## alternative hypothesis:
##    true difference in means is not equal to 0
## 
## 95 percent confidence interval on the phylogenetic
## difference in mean:
##    [-0.999187, 1.72124]
## 
## estimates:
##    phylogenetic mean difference = 0.361026 
##    sig^2 of BM model = 0.879318 
##    lambda (fixed or estimated) = 1 
## 
## log-likelihood:
##     -78.3204
phyl.pairedttest(tree,x1,x3)
## 
## Phylogenetic paired t-test:
## 
##    t = -2.44073, df = 57, p-value = 0.0177879
## 
## alternative hypothesis:
##    true difference in means is not equal to 0
## 
## 95 percent confidence interval on the phylogenetic
## difference in mean:
##    [-2.95937, -0.323278]
## 
## estimates:
##    phylogenetic mean difference = -1.64132 
##    sig^2 of BM model = 0.826258 
##    lambda (fixed or estimated) = 0.998969 
## 
## log-likelihood:
##     -77.2733

Note that the method should yield (nearly approximately) the same values as a standard paired t-test, if we fix λ to zero. The only difference is that we compute the standard error of the mean difference from the Hessian matrix instead of analytically:

phyl.pairedttest(tree,x1,x2,lambda=0,fixed=TRUE)
## 
## Phylogenetic paired t-test:
## 
##    t = 1.33211, df = 58, p-value = 0.188035
## 
## alternative hypothesis:
##    true difference in means is not equal to 0
## 
## 95 percent confidence interval on the phylogenetic
## difference in mean:
##    [-0.129139, 0.677096]
## 
## estimates:
##    phylogenetic mean difference = 0.273978 
##    sig^2 of BM model = 0.720126 
##    lambda (fixed or estimated) = 0 
## 
## log-likelihood:
##     -113.078
t.test(x1,x2,paired=TRUE,var.equal=TRUE)
## 
##  Paired t-test
## 
## data:  x1 and x2
## t = 1.321, df = 59, p-value = 0.1916
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.1410439  0.6890009
## sample estimates:
## mean of the differences 
##               0.2739785

In this example, the data were simulated with a genuine difference between x1 and x3, but not between x1 and x2 as follows:

tree<-pbtree(n=60)
x1<-fastBM(tree)
x2<-x1+fastBM(tree)
x3<-x1+fastBM(tree,a=2)

That's it.

No comments:

Post a Comment