In response to a user request I have
updated
the phytools function phyl.pca
to permit more columns (variables)
than rows (species) in the input data matrix.
This is because though (for centred principal components analysis) it is one will always explain 100% of the variance with no more than N-1 principal components, it is possible to extract up to N-1 principal components for cases in which the input data matrix has more than N-1 columns.
The updates are not complicated - mostly I just compute min(n-1,m)
,
for n
species and m
variables and then tell R to
return only up to this quantity of principal components.
I also added an S3 plot
method for objects of class
"phyl.pca"
(to existing biplot
, print
,
and summary
methods), which just returns a screeplot.
Here is an example in which I have a 10 taxon tree but 20 variables:
library(phytools) ## update from GitHub
packageVersion("phytools")
## [1] '0.5.44'
tree
##
## Phylogenetic tree with 10 tips and 9 internal nodes.
##
## Tip labels:
## A, B, C, D, E, F, ...
##
## Rooted; includes branch lengths.
X
## [,1] [,2] [,3] [,4] [,5] [,6]
## A 2.871213715 4.6747464 -0.40290368 -0.08741276 4.704137 -4.2622250
## B 1.042267133 3.8249301 0.65568345 -0.02805061 2.370947 -5.8588521
## C 0.004174414 -0.2874374 0.54262328 3.19664103 9.271929 -0.3564986
## D 2.976758314 1.8380804 -1.68282413 0.93237549 2.420743 2.5573102
## E 5.634111352 -4.7423357 -0.31085428 -2.50014841 5.786365 -0.2223702
## F 3.768230129 -4.4532704 -0.04018276 -1.82465733 5.547573 -0.2856634
## G 7.784903402 -0.7683049 -1.17327169 -0.34961072 4.334423 1.1144112
## H 1.643271999 -0.2554700 -0.49057157 -2.25651725 8.497341 -0.2871782
## I 3.923290527 0.2753995 -0.32539131 0.34383657 8.186258 0.8573116
## J 1.454268527 -6.0386162 -1.59563258 -2.28504513 -7.585861 3.1770481
## [,7] [,8] [,9] [,10] [,11] [,12]
## A 2.2477129 -1.6305340 1.8336705 -3.740515 -0.4025394 1.6017655
## B 2.1139692 -2.0213911 1.5040234 -2.910061 -0.8443350 3.1554476
## C 0.4520337 -1.0324254 -1.4895363 2.943384 3.6746002 2.1617860
## D 3.6463932 -1.7557318 0.8743329 1.663234 -1.3072798 -1.6035540
## E 0.3164341 6.8518210 -4.5161806 2.880459 3.3953585 -1.3370474
## F -1.3811630 6.7095858 -3.5541550 1.987218 2.2563118 -1.1472472
## G -0.7910001 3.7591620 -2.5568415 2.714351 2.4158234 1.3566271
## H -0.6734680 -0.4502042 -1.4317151 4.383923 4.1363432 0.9287238
## I -1.7199976 2.4663262 -2.0276715 3.273081 5.2843480 1.9080612
## J -2.3350235 1.0385213 -0.2371657 -1.755852 0.9089970 -6.8150985
## [,13] [,14] [,15] [,16] [,17] [,18]
## A -5.701040 7.3491364 4.6841996 -1.3305419 3.0704336 -5.471046
## B -9.124058 6.5838899 5.2174013 -2.7701790 1.1816007 -7.078690
## C -1.641395 0.7479561 4.1746009 0.3383891 0.5286383 -1.133129
## D -5.595636 4.8799468 2.6389452 1.0069082 -1.5137721 2.278655
## E -1.711472 7.5584553 -1.4630973 2.1636300 2.1653965 2.779019
## F -1.692711 4.9012769 -0.5412248 1.4585657 2.0301627 3.025220
## G -5.444037 9.6659874 -1.1024915 3.2488660 3.2064625 3.391553
## H -6.614349 2.8992088 1.5448734 2.0531905 1.9588168 1.510531
## I -9.444524 5.4065346 1.6282638 1.8551760 1.2510615 3.383527
## J 1.752094 -1.6821038 -2.3981097 -0.1649102 -2.1881646 4.575223
## [,19] [,20]
## A -2.8584588 -2.834560
## B -2.8935271 -2.134582
## C -0.8122405 -2.809604
## D 0.8635701 2.014487
## E -1.3748665 -3.749462
## F -0.4462016 -4.051456
## G -2.7122182 -3.882143
## H 1.4088890 -2.451189
## I -0.0321507 -2.765148
## J 2.3180704 4.245889
pca<-phyl.pca(tree,X)
summary(pca)
## Importance of components:
## PC1 PC2 PC3 PC4 PC5
## Standard deviation 6.1460367 5.5834580 4.6570712 3.5560098 3.03714064
## Proportion of Variance 0.3075852 0.2538527 0.1766042 0.1029677 0.07511125
## Cumulative Proportion 0.3075852 0.5614379 0.7380421 0.8410098 0.91612103
## PC6 PC7 PC8 PC9
## Standard deviation 2.10969194 1.62588269 1.4186935 1.092693580
## Proportion of Variance 0.03624209 0.02152552 0.0163890 0.009722366
## Cumulative Proportion 0.95236312 0.97388864 0.9902776 1.000000000
plot(pca)
biplot(pca)
Cool. You get the idea.
FYI, the data for this demo were simulated as follows:
library(clusterGeneration)
## simulate tree
tree<-pbtree(n=10,tip.label=LETTERS[1:10])
## simulate random positive definite covariance matrix
V<-genPositiveDefMat(20,covMethod="unifcorrmat")$Sigma
## simulate trait data
X<-sim.corrs(tree,vcv=V)
No comments:
Post a Comment
Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.