Tuesday, November 28, 2017

Significant update to phyl.pairedttest for phylogenetic paired t-test in R

A good long time ago now some co-authors & I (Lindenfors et al. 2010) published a 'phylogenetic paired t-test.' This is a really simple method in which we ask if a set of paired traits (say, male and female RBC counts; or summer & winter body mass) differ significantly one from the other across species, while taking into account non-independence due to the phylogeny. The method can also account for (known) uncertainty in either or both trait vectors, and can relax the assumption of pure Brownian motion for the correlation structure of the errors (using Pagel's λ model).

The method has been implemented in the function phyl.pairedttest; however a phytools user recently pointed out some issues with optimizing the model (which uses likelihood). This turned out to be due to trait scale. For instance, for (numerically) large mean differences between the two trait vectors, the corresponding element of the Hessian matrix from the likelihood surface was sometimes numerically indistinguishable from zero.

I fixed this by using an internal rescaling, which also then required that I back-transform the estimated parameter by (some function of) the scaling factor. I was also able to make other improvements and I added an S3 print method for the object class. All of my updates can be seen here.

The following is a simple demo using simulated data:

library(phytools)
packageVersion("phytools")
## [1] '0.6.46'
phylo.heatmap(tree,cbind(x1,x2,x3),fsize=c(0.5,0.9,0.9),
    standardize=TRUE)

plot of chunk unnamed-chunk-1

phyl.pairedttest(tree,x1,x2)
## 
## Phylogenetic paired t-test:
## 
##    t = 0.52022, df = 57, p-value = 0.604925
## 
## alternative hypothesis:
##    true difference in means is not equal to 0
## 
## 95 percent confidence interval on the phylogenetic
## difference in mean:
##    [-0.999187, 1.72124]
## 
## estimates:
##    phylogenetic mean difference = 0.361026 
##    sig^2 of BM model = 0.879318 
##    lambda (fixed or estimated) = 1 
## 
## log-likelihood:
##     -78.3204
phyl.pairedttest(tree,x1,x3)
## 
## Phylogenetic paired t-test:
## 
##    t = -2.44073, df = 57, p-value = 0.0177879
## 
## alternative hypothesis:
##    true difference in means is not equal to 0
## 
## 95 percent confidence interval on the phylogenetic
## difference in mean:
##    [-2.95937, -0.323278]
## 
## estimates:
##    phylogenetic mean difference = -1.64132 
##    sig^2 of BM model = 0.826258 
##    lambda (fixed or estimated) = 0.998969 
## 
## log-likelihood:
##     -77.2733

Note that the method should yield (nearly approximately) the same values as a standard paired t-test, if we fix λ to zero. The only difference is that we compute the standard error of the mean difference from the Hessian matrix instead of analytically:

phyl.pairedttest(tree,x1,x2,lambda=0,fixed=TRUE)
## 
## Phylogenetic paired t-test:
## 
##    t = 1.33211, df = 58, p-value = 0.188035
## 
## alternative hypothesis:
##    true difference in means is not equal to 0
## 
## 95 percent confidence interval on the phylogenetic
## difference in mean:
##    [-0.129139, 0.677096]
## 
## estimates:
##    phylogenetic mean difference = 0.273978 
##    sig^2 of BM model = 0.720126 
##    lambda (fixed or estimated) = 0 
## 
## log-likelihood:
##     -113.078
t.test(x1,x2,paired=TRUE,var.equal=TRUE)
## 
##  Paired t-test
## 
## data:  x1 and x2
## t = 1.321, df = 59, p-value = 0.1916
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.1410439  0.6890009
## sample estimates:
## mean of the differences 
##               0.2739785

In this example, the data were simulated with a genuine difference between x1 and x3, but not between x1 and x2 as follows:

tree<-pbtree(n=60)
x1<-fastBM(tree)
x2<-x1+fastBM(tree)
x3<-x1+fastBM(tree,a=2)

That's it.

1 comment:

  1. Dear Liam,
    I had posted earlier about a question about the phylogenetic paired t-test, and with your help I got it sorted (I think, in the end it was also partly a problem with incompatibility of different versions of R and the package, which I could not resolve working on an office computer earlier). Now everything is running fine, but I am a bit surprised by the outcome, fearing that I make some mistake in my settings. I am looking at body masses of male primates (27 species) from captive or wild populations. When I calculate the lambda, I get a very large lambda, which is highly significantly different from 0.
    > (signal_C<-phylosig(baum, captive, method="lambda", test=TRUE))

    Phylogenetic signal lambda : 0.999934
    logL(lambda) : -14.3279
    LR(lambda=0) : 40.3972
    P-value (based on LR test) : 2.07242e-10

    When I then run the paired test using the "Fixed=FALSE" option, I still get a test-outcome that seems to suggest that lambda is taken as zero (yielding overall significant differences between captive and wild body masses, which in itself is not surprising).
    > lambda<-signal_C$lambda
    >
    > phyl.pairedttest(tree=baum,wild,captive, lambda=lambda, fixed=FALSE)

    Phylogenetic paired t-test:

    t = -4.10251, df = 24, p-value = 0.000406537

    alternative hypothesis:
    true difference in means is not equal to 0

    95 percent confidence interval on the phylogenetic
    difference in mean:
    [-0.086297, -0.0304976]

    estimates:
    phylogenetic mean difference = -0.0583973
    sig^2 of BM model = 6.27146e-05
    lambda (fixed or estimated) = 0

    log-likelihood:
    32.0024

    In contrast, if I use the Fixed=TRUE option, and lambda is forced to be 0.99, then the difference between the means is not deemed to be statistically different.
    > phyl.pairedttest(tree=baum,wild,captive, lambda=lambda, fixed=TRUE)

    Phylogenetic paired t-test:

    t = -0.395143, df = 25, p-value = 0.696089

    alternative hypothesis:
    true difference in means is not equal to 0

    95 percent confidence interval on the phylogenetic
    difference in mean:
    [-0.302461, 0.200968]

    estimates:
    phylogenetic mean difference = -0.0507465
    sig^2 of BM model = 0.00073346
    lambda (fixed or estimated) = 0.999934

    log-likelihood:
    18.1789

    So now I am concerned about how lambda is estimated when Fixed=FALSE (which seems intuitively the more appropriate option) and that it does not what I think it does, i.e. to use the data to estimate whether phylogeny is important to take into account.
    Sorry for the length of this - but could you tell me whether I could be reasonably confident about my fixed=FALSE approach, when the original lambda estimate was significiantly different from 0?

    ReplyDelete

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.