In response to a user request I have
updated
the phytools function `phyl.pca`

to permit more columns (variables)
than rows (species) in the input data matrix.

This is because though (for centred principal components analysis) it is
one will always explain 100% of the variance with no more than *N*-1
principal components, it is possible to extract up to *N*-1 principal
components for cases in which the input data matrix has more than
*N*-1 columns.

The updates are not complicated - mostly I just compute `min(n-1,m)`

,
for `n`

species and `m`

variables and then tell R to
return only up to this quantity of principal components.

I also added an S3 `plot`

method for objects of class
`"phyl.pca"`

(to existing `biplot`

, `print`

,
and `summary`

methods), which just returns a screeplot.

Here is an example in which I have a 10 taxon tree but 20 variables:

```
library(phytools) ## update from GitHub
packageVersion("phytools")
```

```
## [1] '0.5.44'
```

```
tree
```

```
##
## Phylogenetic tree with 10 tips and 9 internal nodes.
##
## Tip labels:
## A, B, C, D, E, F, ...
##
## Rooted; includes branch lengths.
```

```
X
```

```
## [,1] [,2] [,3] [,4] [,5] [,6]
## A 2.871213715 4.6747464 -0.40290368 -0.08741276 4.704137 -4.2622250
## B 1.042267133 3.8249301 0.65568345 -0.02805061 2.370947 -5.8588521
## C 0.004174414 -0.2874374 0.54262328 3.19664103 9.271929 -0.3564986
## D 2.976758314 1.8380804 -1.68282413 0.93237549 2.420743 2.5573102
## E 5.634111352 -4.7423357 -0.31085428 -2.50014841 5.786365 -0.2223702
## F 3.768230129 -4.4532704 -0.04018276 -1.82465733 5.547573 -0.2856634
## G 7.784903402 -0.7683049 -1.17327169 -0.34961072 4.334423 1.1144112
## H 1.643271999 -0.2554700 -0.49057157 -2.25651725 8.497341 -0.2871782
## I 3.923290527 0.2753995 -0.32539131 0.34383657 8.186258 0.8573116
## J 1.454268527 -6.0386162 -1.59563258 -2.28504513 -7.585861 3.1770481
## [,7] [,8] [,9] [,10] [,11] [,12]
## A 2.2477129 -1.6305340 1.8336705 -3.740515 -0.4025394 1.6017655
## B 2.1139692 -2.0213911 1.5040234 -2.910061 -0.8443350 3.1554476
## C 0.4520337 -1.0324254 -1.4895363 2.943384 3.6746002 2.1617860
## D 3.6463932 -1.7557318 0.8743329 1.663234 -1.3072798 -1.6035540
## E 0.3164341 6.8518210 -4.5161806 2.880459 3.3953585 -1.3370474
## F -1.3811630 6.7095858 -3.5541550 1.987218 2.2563118 -1.1472472
## G -0.7910001 3.7591620 -2.5568415 2.714351 2.4158234 1.3566271
## H -0.6734680 -0.4502042 -1.4317151 4.383923 4.1363432 0.9287238
## I -1.7199976 2.4663262 -2.0276715 3.273081 5.2843480 1.9080612
## J -2.3350235 1.0385213 -0.2371657 -1.755852 0.9089970 -6.8150985
## [,13] [,14] [,15] [,16] [,17] [,18]
## A -5.701040 7.3491364 4.6841996 -1.3305419 3.0704336 -5.471046
## B -9.124058 6.5838899 5.2174013 -2.7701790 1.1816007 -7.078690
## C -1.641395 0.7479561 4.1746009 0.3383891 0.5286383 -1.133129
## D -5.595636 4.8799468 2.6389452 1.0069082 -1.5137721 2.278655
## E -1.711472 7.5584553 -1.4630973 2.1636300 2.1653965 2.779019
## F -1.692711 4.9012769 -0.5412248 1.4585657 2.0301627 3.025220
## G -5.444037 9.6659874 -1.1024915 3.2488660 3.2064625 3.391553
## H -6.614349 2.8992088 1.5448734 2.0531905 1.9588168 1.510531
## I -9.444524 5.4065346 1.6282638 1.8551760 1.2510615 3.383527
## J 1.752094 -1.6821038 -2.3981097 -0.1649102 -2.1881646 4.575223
## [,19] [,20]
## A -2.8584588 -2.834560
## B -2.8935271 -2.134582
## C -0.8122405 -2.809604
## D 0.8635701 2.014487
## E -1.3748665 -3.749462
## F -0.4462016 -4.051456
## G -2.7122182 -3.882143
## H 1.4088890 -2.451189
## I -0.0321507 -2.765148
## J 2.3180704 4.245889
```

```
pca<-phyl.pca(tree,X)
summary(pca)
```

```
## Importance of components:
## PC1 PC2 PC3 PC4 PC5
## Standard deviation 6.1460367 5.5834580 4.6570712 3.5560098 3.03714064
## Proportion of Variance 0.3075852 0.2538527 0.1766042 0.1029677 0.07511125
## Cumulative Proportion 0.3075852 0.5614379 0.7380421 0.8410098 0.91612103
## PC6 PC7 PC8 PC9
## Standard deviation 2.10969194 1.62588269 1.4186935 1.092693580
## Proportion of Variance 0.03624209 0.02152552 0.0163890 0.009722366
## Cumulative Proportion 0.95236312 0.97388864 0.9902776 1.000000000
```

```
plot(pca)
```

```
biplot(pca)
```

Cool. You get the idea.

FYI, the data for this demo were simulated as follows:

```
library(clusterGeneration)
## simulate tree
tree<-pbtree(n=10,tip.label=LETTERS[1:10])
## simulate random positive definite covariance matrix
V<-genPositiveDefMat(20,covMethod="unifcorrmat")$Sigma
## simulate trait data
X<-sim.corrs(tree,vcv=V)
```

## No comments:

## Post a Comment