Thursday, September 8, 2011

Lambda optimization in phyl.cca()

I just added joint λ optimization to the phylogenetic canonical correlation analysis function phyl.cca(). Direct link to code is here. This nearly matches up its functionality to my "pcca" program (here & publication here). The only thing still missing is the calculation of standardized canonical coefficients and canonical coefficients. The former are analogous to standardized regression coefficients; whereas the latter are correlations (in this case, phylogenetic correlations) between the canonical variables and the original measured traits. I will hopefully add these to a future version of the function.

Just to give a quick sense of λ to the uninitiated, λ is a scaling parameter for the off-diagonals of the "coancestry matrix" (here defined as N × N matrix containing the times above the root for each pair of N species) and the among species phenotypic correlation matrix. λ and the joint estimation of λ for multiple traits is best described in a paper by Rob Freckleton & colleages (2002). It provides a very simple way to relax the strict Brownian motion assumption of most phylogenetic analyses of this nature.

To illustrate use of this function, I will give the following example of simulating with λ = 0.75 and no correlations between the Xs & the Ys. To do this, you need to install & load "phytools," "geiger," & (naturally) their dependencies.

First, let's simulate a tree & data:

> require(phytools) # load phytools
> require(geiger) # load geiger
> source("") # load source
> tree<-drop.tip(birthdeath.tree(b=1,d=0,taxa.stop=101),"101")
> X<-fastBM(lambdaTree(tree,lambda=0.75),nsim=4)
> Y<-fastBM(lambdaTree(tree,lambda=0.75),nsim=3)

Here, we've simulated a random Yule tree with 100 taxa along with two sets of uncorrelated random variables.

Now, let's run phylogenetic CCA with joint estimation of λ:

> result<-phyl.cca(tree,X,Y,fixed=F)

Here's our estimate of λ:

> result$lambda
[1] 0.7524329

Here's our canonical correlations & corresponding p-values:

> result$cor
[1] 0.37263858 0.17331011 0.04891746
> result$p
[1] 0.1377030 0.7930295 0.8924366

Now, let's try the analysis with fixed lambda of λ=1.0 (the default):

> result<-phyl.cca(tree,X,Y)
> result$cor
[1] 0.5571312 0.2352854 0.1519402
> result$p
[1] 2.316245e-05 2.665493e-01 3.297459e-01

And, now without the phylogeny at all (i.e., with a star tree):

> result<-phyl.cca(starTree(tree$tip.label,rep(1,100)),X,Y)
> result$cor
[1] 0.5505426 0.2995164 0.1657136
> result$p
[1] 7.276668e-06 7.216184e-02 2.664303e-01

Although this is just a single example, it gives us a little cause for consternation because both the fixed λ and non-phylogenetic analyses show significant canonical correlations, even though we didn't simulate any!

Please let me know if you give the function a try.


  1. Dear Liam,
    I am trying your phyl.cca function but I keep getting the following error:
    Error in ccs > 0 : invalid comparison with complex values
    is this a format problem with the data matrices or something else is going on?
    thanks for your reply and for developing this awesome 'phytools'

  2. Hello Liam, just tried it with real data and got:

    nonfixed lambda:
    > pCCA$cor
    [1] 0.8592734 0.7229225 0.4918054
    > pCCA$p
    [1] 0.009319754 0.277941825 0.732731644

    fixed lambda (default)
    > pCCA$cor
    [1] 0.8487658 0.7821672 0.4815591
    > pCCA$p
    [1] 0.004401122 0.112495274 0.762967808
    star phylogeny:
    [1] 0.8794160 0.7020003 0.4563339

    > pCCA$p
    [1] 0.007407126 0.412586504 0.828730628

    Does it give you any light?
    My challenge now is to plot these results in a meaningful way.


    1. Hi Agus.

      Looks like you have close agreement between phylogenetic & non-phylogenetic analyses which is that you have one significant canonical correlation. I don't have any specific advice regarding plotting, but often people interpret CCA results using things such as the canonical loadings.

      Good luck. Liam