Monday, September 17, 2012

Update to phyl.pca so that "mode" (correlation vs. covariance) can be specified flexibly

A user recently complained the following about phytools function phyl.pca for phylogenetic principal components:

One little suggestion for protecting users from themselves might to throw up an error if the user misspells the mode parameters [Ed. note: which specifies whether to use the covariance or correlation matrix during analysis]. Currently, if you type:
   mode="cor"
It will work, and just run on the default covariance matrix. But "cor" is a misspelling of "corr", which will yield the desired analysis. It might be a common error, since cov and cor are both three letters. I just did this, but noticed the output was identical.


Good suggestion. I have now added a few lines of code to phyl.pca to fix this problem. Basically, the function now accepts any unambiguous shorting of covariance and correlation instead of just the three and four letter strings "cov" and "corr" respectively. In addition, if the match is imperfect, the function will return a message.

The updated function code is here; it is also in the latest non-CRAN version of phytools (v0.1-94, here) and will be in the next CRAN phytools release.

The way I achieved this is really simple. Basically, I used the base function strsplit to split the input mode into characters; then I matched the specified mode characters to strsplit("correlation",split="") or strsplit("covariance",split=""). Finally, if the match was imperfect (for instance, mode="corelation"), the function defaults to using the covariance matrix, but also spits an error.

Here's a demo:

> # first install new phytools
> install.packages("phytools_0.1-94.tar.gz",type="source", repos=NULL)
Installing package(s) into ‘C:/Users/liam.revell/Documents/R/win-library/2.14’ (as ‘lib’ is unspecified)
* installing *source* package 'phytools' ...
> # load phytools
> library(phytools)
> # for demo, simulate tree & data
> tree<-pbtree(n=40,scale=100)
> L<-matrix(rnorm(n=16),4,4); L[upper.tri(L,diag=F)]<-0
> V<-L%*%t(L) # covariance matrix for simulation
> X<-sim.corrs(tree,V)
> X
          [,1]        [,2]         [,3]       [,4]
t1   10.0145660  25.5508968  27.14419359 -16.670596
t2   -0.9504677  -1.1381218  23.31553654 -16.947086
t3         ....
> pca<-phyl.pca(tree,X,mode="covar")
> pca<-phyl.pca(tree,X,mode="co")
mode = 'co' not a valid option; setting mode = 'cov'
> pcaCorr<-phyl.pca(tree,X,mode="corelation")
mode = 'corelation' not a valid option; setting mode = 'cov'
> pcaCorr<-phyl.pca(tree,X,mode="correlation")
> pca
$Eval
        PC1      PC2      PC3          PC4
PC1 5.612471 0.000000 0.000000 0.000000e+00
PC2     ....
$Evec
           PC1        PC2       PC3         PC4
[1,] -0.2026469  0.2849556 0.1557432  0.92383906
[2,] -0.5305595  0.6617452 0.3666632 -0.38230631
[3,] -0.4710700 -0.6810988 0.5603986  0.01227893
[4,]  0.6749325  0.1303764 0.7261237 -0.01457757
$S           PC1        PC2         PC3          PC4
t1  -24.731226  -0.357717  13.5570211  0.038474777
t2        ....

> pcaCorr
$Eval
        PC1      PC2       PC3          PC4
PC1 2.363718 0.000000 0.0000000 0.000000e+00
PC2     ....
$Evec
           PC1        PC2       PC3         PC4
[1,]  0.5945738  0.3526378 0.1810304  0.69954028
[2,]  0.6090435  0.3035288 0.1661745 -0.71366821
[3,]  0.2450079 -0.7643208 0.5960410  0.02280308
[4,] -0.4642405  0.4464644 0.7644274 -0.02830375
$S
          PC1         PC2         PC3          PC4
t1   16.705476  -3.6663012   7.3051355  0.040978169
t2        ....


That's it. Thanks to the phytools user for his comment. Keep them coming!

2 comments:

  1. Hi Liam,
    I try to run the phyl.pca function, but I keep on getting this error message:

    "Error in solve.default(kronRC, y - D %*% a) :
    system is computationally singular: reciprocal condition number = 3.02577e-028"

    Can't figure out what is wrong with the data. Do you have any idea? It does run the demo, so I assume my input files are somehow wrong. I have a tree in nexus format, and a trait file input in csv format (no missing data), with names linked to the traits (data <- traits[,2:7]
    rownames(data) <- traits$name).
    Hope you can help?

    Renske


    ReplyDelete
    Replies
    1. Hi Renske.

      Does your try have terminal branch lengths that are zero? That's one thing that will cause singularity of vcv.phylo(tree), which means the phylogenetic PCA cannot be computed. If not that, maybe send me data & tree and I would be happy to investigate further.

      - Liam

      Delete