A user recently complained the following about phytools function phyl.pca for phylogenetic principal components:
One little suggestion for protecting users from themselves might to throw up an error if the user misspells the mode parameters [Ed. note: which specifies whether to use the covariance or correlation matrix during analysis]. Currently, if you type:
mode="cor"
It will work, and just run on the default covariance matrix. But "cor" is a misspelling of "corr", which will yield the desired analysis. It might be a common error, since cov and cor are both three letters. I just did this, but noticed the output was identical.
Good suggestion. I have now added a few lines of code to phyl.pca to fix this problem. Basically, the function now accepts any unambiguous shorting of covariance and correlation instead of just the three and four letter strings "cov" and "corr" respectively. In addition, if the match is imperfect, the function will return a message.
The updated function code is here; it is also in the latest non-CRAN version of phytools (v0.1-94, here) and will be in the next CRAN phytools release.
The way I achieved this is really simple. Basically, I used the base function strsplit to split the input mode into characters; then I matched the specified mode characters to strsplit("correlation",split="") or strsplit("covariance",split=""). Finally, if the match was imperfect (for instance, mode="corelation"), the function defaults to using the covariance matrix, but also spits an error.
Here's a demo:
> # first install new phytools
> install.packages("phytools_0.1-94.tar.gz",type="source", repos=NULL)
Installing package(s) into ‘C:/Users/liam.revell/Documents/R/win-library/2.14’
(as ‘lib’ is unspecified)
* installing *source* package 'phytools' ...
> # load phytools
> library(phytools)
> # for demo, simulate tree & data
> tree<-pbtree(n=40,scale=100)
> L<-matrix(rnorm(n=16),4,4); L[upper.tri(L,diag=F)]<-0
> V<-L%*%t(L) # covariance matrix for simulation
> X<-sim.corrs(tree,V)
> X
[,1] [,2] [,3] [,4]
t1 10.0145660 25.5508968 27.14419359 -16.670596
t2 -0.9504677 -1.1381218 23.31553654 -16.947086
t3 ....
> pca<-phyl.pca(tree,X,mode="covar")
> pca<-phyl.pca(tree,X,mode="co")
mode = 'co' not a valid option; setting mode = 'cov'
> pcaCorr<-phyl.pca(tree,X,mode="corelation")
mode = 'corelation' not a valid option; setting mode = 'cov'
> pcaCorr<-phyl.pca(tree,X,mode="correlation")
> pca
$Eval
PC1 PC2 PC3 PC4
PC1 5.612471 0.000000 0.000000 0.000000e+00
PC2 ....
$Evec
PC1 PC2 PC3 PC4
[1,] -0.2026469 0.2849556 0.1557432 0.92383906
[2,] -0.5305595 0.6617452 0.3666632 -0.38230631
[3,] -0.4710700 -0.6810988 0.5603986 0.01227893
[4,] 0.6749325 0.1303764 0.7261237 -0.01457757
$S
PC1 PC2 PC3 PC4
t1 -24.731226 -0.357717 13.5570211 0.038474777
t2 ....
> pcaCorr
$Eval
PC1 PC2 PC3 PC4
PC1 2.363718 0.000000 0.0000000 0.000000e+00
PC2 ....
$Evec
PC1 PC2 PC3 PC4
[1,] 0.5945738 0.3526378 0.1810304 0.69954028
[2,] 0.6090435 0.3035288 0.1661745 -0.71366821
[3,] 0.2450079 -0.7643208 0.5960410 0.02280308
[4,] -0.4642405 0.4464644 0.7644274 -0.02830375
$S
PC1 PC2 PC3 PC4
t1 16.705476 -3.6663012 7.3051355 0.040978169
t2 ....
That's it. Thanks to the phytools user for his comment. Keep them coming!
Hi Liam,
ReplyDeleteI try to run the phyl.pca function, but I keep on getting this error message:
"Error in solve.default(kronRC, y - D %*% a) :
system is computationally singular: reciprocal condition number = 3.02577e-028"
Can't figure out what is wrong with the data. Do you have any idea? It does run the demo, so I assume my input files are somehow wrong. I have a tree in nexus format, and a trait file input in csv format (no missing data), with names linked to the traits (data <- traits[,2:7]
rownames(data) <- traits$name).
Hope you can help?
Renske
Hi Renske.
DeleteDoes your try have terminal branch lengths that are zero? That's one thing that will cause singularity of vcv.phylo(tree), which means the phylogenetic PCA cannot be computed. If not that, maybe send me data & tree and I would be happy to investigate further.
- Liam
Hi Liam,
ReplyDeleteI'm having the same issue with "Error in solve.default(kronRC, y - D %*% a) :
system is computationally singular: reciprocal condition number = 1.90653e-23"
I don't know what is going on. I have run phyl.pca before with no problems. I went to the data and there is no zeros or missing information. Any idea of other possibilities for this error?
Thanks
Jose
Are there zero-length terminal edges in the tree? This makes vcv(tree) singular & can cause this error. The other possibility is that the covariance matrix among traits is singular, which happens if there are more columns (variables) than rows (species) in your input data matrix; or if some columns are perfect linear combinations of other variables.
DeleteIn at least the first case, one workaround suggested to me Eric Stone at NCSU, is to use the pseudoinverse:
require(corpcor)
solve<-function(x) pseudoinverse(x)
pca<-phyl.pca(...)
For this to work, though, you need to load phyl.pca & all internally called functions from source. This can be done using:
source("http://www.phytools.org/utilities/v4.6/utilities.R")
source("http://www.phytools.org/phyl.pca/v0.5/phyl.pca.R")
Let me know if this works.
All the best, Liam
Thank you Liam, I'll try it and let you know.
DeleteCheers
Jose
Hi Liam,
DeleteUnfortunately I keep on getting the same type of error. I don't have any zero length edges in the tree or more columns than rows in the data matrix.
The data are partial warps from a geometric morphometric analysis. If I use that matrix and do a normal PCA I get identical results as from relative warps. I have no idea what may be the problem with the phyl.pca. Would you be willing to have a look at my data, please. Maybe I'm missing something.
Cheers
Jose
Hi Liam,
ReplyDeleteI have the same problem. I don't have zero length,but I have a lot more variables than species: dim(df)
6 163
Is there a workaround for this situation as well?
All the best and thank you for providing this great package!
Daniel
This comes over a year later, but i had similar issues with phyl.pca,and found that, like liam suggested, this issue can arise when some variables are perfect linear combinations of other variables - this may be true of some climatic variables.
ReplyDeleteTrawling through some other forums I found a solution that by adding a random small number (0.00001 - 0.00009) to your data will remove the perfect linear relationship between variables without changing the underlying data such that the results will be altered significantly.
The other option is removing these variables altogether if you know (or can find out) which ones they are.
Alex
Hi Liam,
ReplyDeleteI have the same problem like older comments:
> snockpcalambda<-phyl.pca(Tree_snocks, matrizlambda, method="lambda", mode="cor")
Error in solve.default(kronRC, y - D %*% a) :
system is computationally singular: reciprocal condition number = 1.90357e-24
However, it happens when I try to use the method lambda, when I use BM it runs with no problems.
Regards,
Nati