A phytools user asked a question today about the function for phylogenetic
PCA, phyl.pca
, as follows:
“When running phyl.pca it estimates a very small value for λ
(7.023741e05) for my dataset, considering that I also run a PCA using the
princomp
function for the sake of comparison. The score values
under phyl.pca and princomp are basically identical, but when it comes to the
loading values there are significant differences.”
One's instinct might be to suspect that this may have something to do with
the phylogeny (even though with λ=0 the phylogeny has essentially
been ignored), or some problem with implementation in phyl.pca

and under normal circumstances these might be completely reasonable
suspicions.
However, in this case the reality is somewhat simpler. It turns out that
the values that are reported as 'loadings' by princomp
are not
loadings at all*  but eigenvectors! [*Of course, as this is essentially
an issue of semantics, one could argue that convention is irrelevant &
one should be able to call whatever one sees fit a loading  but that is
beyond the scope of this post.] In phyl.pca
, by contrast,
loadings are the (phylogenetic) correlations between the principal components
and the original variables.
Here, let me should you what I mean. For the following exercise I will use
an
updated
version of phyl.pca
, but the only modification is that this
one permits the user to supply a value of λ rather than estimating the
ML or REML value of λ via numerical optimazation.
library(phytools)
tree<pbtree(n=26,scale=1,tip.label=LETTERS)
tree
##
## Phylogenetic tree with 26 tips and 25 internal nodes.
##
## Tip labels:
## A, B, C, D, E, F, ...
##
## Rooted; includes branch lengths.
X<fastBM(tree,nsim=5)
X
## [,1] [,2] [,3] [,4] [,5]
## A 0.419211564 0.07072903 0.67729412 1.4275547 0.8956189
## B 0.423491819 0.07230281 0.04353057 1.8236536 1.5941026
## C 0.674572520 0.97438998 0.44582192 1.2064721 1.1968349
## D 0.144589136 0.78968124 0.23939251 0.1604712 1.2397129
## E 0.037492764 0.33373486 1.21897435 0.1363836 1.2665389
## F 0.049170578 0.24004146 1.25930976 0.4204259 1.4010466
## G 0.270753737 0.16233552 1.44227716 0.6901865 1.1583580
## H 0.307947940 0.17681354 0.80860301 0.4294457 0.3390226
## I 0.121981063 1.54597206 0.60546618 0.6466339 0.9344554
## J 0.420320783 0.21857516 0.14983841 0.3039046 1.1689236
## K 0.250202949 0.15091852 0.26843245 1.0483096 1.4964830
## L 0.470691820 1.64133733 0.01950930 0.9664008 1.6606181
## M 0.859279349 0.79613244 0.21614994 0.6497791 0.9240529
## N 0.531125300 0.83481129 0.84552758 0.8766522 1.6286559
## O 0.362032884 0.48093998 0.35350781 0.1044871 1.0196886
## P 0.271775777 0.11416429 0.04792096 0.5975930 1.5266678
## Q 0.241609452 0.15047775 0.10962659 0.4692869 1.4156464
## R 0.250219313 0.19964343 0.70567779 0.8044968 1.4153869
## S 0.652031092 0.57999496 0.41286288 0.2249271 1.5921390
## T 0.259519957 0.52699138 0.47710290 0.4022593 1.5436005
## U 1.355538164 0.97106032 0.90405187 0.9945186 2.4721401
## V 1.490160952 0.73566840 0.01132505 0.6147261 2.8499558
## W 0.778375058 0.59260693 0.13285445 0.7414536 3.3113367
## X 0.542971500 0.73316942 1.09947325 2.6681790 3.1622164
## Y 0.259842325 0.62505613 1.01765120 1.3903978 3.3034612
## Z 0.001442499 0.56993438 0.88314656 1.4145177 2.7669227
pPCA<phyl.pca(tree,X,method="lambda",opt="fixed",lambda=0)
PCA<princomp(X)
pPCA
## Phylogenetic pca
## Standard deviations:
## PC1 PC2 PC3 PC4 PC5
## 1.4478179 1.0234962 0.6017127 0.5160807 0.4481210
## Loads:
## PC1 PC2 PC3 PC4 PC5
## [1,] 0.06844119 0.01938318 0.607431464 0.78769136 0.07422559
## [2,] 0.69994804 0.30961766 0.032431852 0.08830584 0.63667881
## [3,] 0.23741151 0.62539675 0.645314608 0.34237575 0.13734114
## [4,] 0.26739801 0.91432777 0.243703565 0.15816580 0.08997325
## [5,] 0.97499062 0.20554298 0.007102939 0.02361245 0.08085409
## lambda:
## [1] 0
PCA
## Call:
## princomp(x = X)
##
## Standard deviations:
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5
## 1.4197022 1.0036205 0.5900279 0.5060587 0.4394188
##
## 5 variables and 26 observations.
Superficially, these 'loadings' seem to be quite difference, even though
by setting λ to zero (on an ultrametric phylogeny) we are
essentially telling phyl.pca
to do a regular, rather than
phylogenetic, principal components analysis. However, if we dig just a bit
deeper we'll see that the values reported as Loads
in the
print
method of princomp
are actually the
eigenvectors of our principal components orthogonalization.
pPCA$Evec
## PC1 PC2 PC3 PC4 PC5
## [1,] 0.02571749 0.0103030 0.54920315 0.83035424 0.09011214
## [2,] 0.31363671 0.1962522 0.03496694 0.11100610 0.92172233
## [3,] 0.11357645 0.4232236 0.74281860 0.45950037 0.21227840
## [4,] 0.17370824 0.8402178 0.38093324 0.28825123 0.18884013
## [5,] 0.92622700 0.2762151 0.01623603 0.06292952 0.24816334
unclass(PCA$loadings)
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5
## [1,] 0.02571749 0.0103030 0.54920315 0.83035424 0.09011214
## [2,] 0.31363671 0.1962522 0.03496694 0.11100610 0.92172233
## [3,] 0.11357645 0.4232236 0.74281860 0.45950037 0.21227840
## [4,] 0.17370824 0.8402178 0.38093324 0.28825123 0.18884013
## [5,] 0.92622700 0.2762151 0.01623603 0.06292952 0.24816334
Note that some of our vectors may have flipped signs between the two analyses. (Of course, as these are vectors the signs are arbitrary.) To compare the values between our two analyses, I will reset the signs so that the highest magnitude coefficient of each column is positive:
foo<function(x) sign(x[which(abs(x)==max(abs(x)))])
plot(matrix(rep(1,5),5,1)%*%apply(pPCA$Evec,2,foo)*pPCA$Evec,
matrix(rep(1,5),5,1)%*%apply(PCA$loadings,2,foo)*PCA$loadings,
cex=1.5,pch=21,bg="grey",xlab=
expression(paste("pPCA eigenvector coefficients (",lambda,"= 0)")),
ylab="PCA \"loadings\" from princomp")
Note that these will not be equivalent except when we have set λ to zero & our tree is ultrametric!
That's all I have to say about this.
No comments:
Post a Comment