Tuesday, March 14, 2017

Loadings in phyl.pca vs. "loadings" in princomp

A phytools user asked a question today about the function for phylogenetic PCA, phyl.pca, as follows:

“When running phyl.pca it estimates a very small value for λ (7.023741e-05) for my dataset, considering that I also run a PCA using the princomp function for the sake of comparison. The score values under phyl.pca and princomp are basically identical, but when it comes to the loading values there are significant differences.”

One's instinct might be to suspect that this may have something to do with the phylogeny (even though with λ=0 the phylogeny has essentially been ignored), or some problem with implementation in phyl.pca - and under normal circumstances these might be completely reasonable suspicions.

However, in this case the reality is somewhat simpler. It turns out that the values that are reported as 'loadings' by princomp are not loadings at all* - but eigenvectors! [*Of course, as this is essentially an issue of semantics, one could argue that convention is irrelevant & one should be able to call whatever one sees fit a loading - but that is beyond the scope of this post.] In phyl.pca, by contrast, loadings are the (phylogenetic) correlations between the principal components and the original variables.

Here, let me should you what I mean. For the following exercise I will use an updated version of phyl.pca, but the only modification is that this one permits the user to supply a value of λ rather than estimating the ML or REML value of λ via numerical optimazation.

library(phytools)
tree<-pbtree(n=26,scale=1,tip.label=LETTERS)
tree
## 
## Phylogenetic tree with 26 tips and 25 internal nodes.
## 
## Tip labels:
##  A, B, C, D, E, F, ...
## 
## Rooted; includes branch lengths.
X<-fastBM(tree,nsim=5)
X
##           [,1]        [,2]        [,3]       [,4]       [,5]
## A  0.419211564 -0.07072903 -0.67729412 -1.4275547 -0.8956189
## B  0.423491819 -0.07230281  0.04353057 -1.8236536 -1.5941026
## C  0.674572520 -0.97438998 -0.44582192 -1.2064721 -1.1968349
## D  0.144589136 -0.78968124  0.23939251 -0.1604712  1.2397129
## E  0.037492764 -0.33373486 -1.21897435  0.1363836  1.2665389
## F -0.049170578 -0.24004146 -1.25930976 -0.4204259  1.4010466
## G  0.270753737 -0.16233552 -1.44227716 -0.6901865  1.1583580
## H  0.307947940 -0.17681354  0.80860301  0.4294457  0.3390226
## I  0.121981063 -1.54597206 -0.60546618 -0.6466339  0.9344554
## J -0.420320783  0.21857516 -0.14983841  0.3039046  1.1689236
## K -0.250202949 -0.15091852 -0.26843245  1.0483096  1.4964830
## L  0.470691820 -1.64133733  0.01950930  0.9664008 -1.6606181
## M -0.859279349 -0.79613244  0.21614994 -0.6497791  0.9240529
## N -0.531125300 -0.83481129  0.84552758 -0.8766522  1.6286559
## O -0.362032884 -0.48093998  0.35350781  0.1044871  1.0196886
## P -0.271775777  0.11416429 -0.04792096  0.5975930  1.5266678
## Q -0.241609452  0.15047775  0.10962659  0.4692869  1.4156464
## R  0.250219313 -0.19964343  0.70567779 -0.8044968  1.4153869
## S  0.652031092 -0.57999496  0.41286288  0.2249271  1.5921390
## T  0.259519957 -0.52699138  0.47710290  0.4022593  1.5436005
## U  1.355538164 -0.97106032  0.90405187  0.9945186  2.4721401
## V  1.490160952  0.73566840  0.01132505 -0.6147261  2.8499558
## W  0.778375058  0.59260693 -0.13285445 -0.7414536  3.3113367
## X  0.542971500  0.73316942 -1.09947325 -2.6681790  3.1622164
## Y -0.259842325  0.62505613 -1.01765120 -1.3903978  3.3034612
## Z  0.001442499  0.56993438 -0.88314656 -1.4145177  2.7669227
pPCA<-phyl.pca(tree,X,method="lambda",opt="fixed",lambda=0)
PCA<-princomp(X)
pPCA
## Phylogenetic pca
## Standard deviations:
##       PC1       PC2       PC3       PC4       PC5 
## 1.4478179 1.0234962 0.6017127 0.5160807 0.4481210 
## Loads:
##              PC1         PC2          PC3         PC4         PC5
## [1,]  0.06844119  0.01938318 -0.607431464  0.78769136  0.07422559
## [2,]  0.69994804  0.30961766  0.032431852  0.08830584 -0.63667881
## [3,] -0.23741151 -0.62539675 -0.645314608 -0.34237575 -0.13734114
## [4,] -0.26739801 -0.91432777  0.243703565  0.15816580 -0.08997325
## [5,]  0.97499062 -0.20554298 -0.007102939 -0.02361245  0.08085409
## lambda:
## [1] 0
PCA
## Call:
## princomp(x = X)
## 
## Standard deviations:
##    Comp.1    Comp.2    Comp.3    Comp.4    Comp.5 
## 1.4197022 1.0036205 0.5900279 0.5060587 0.4394188 
## 
##  5  variables and  26 observations.

Superficially, these 'loadings' seem to be quite difference, even though by setting λ to zero (on an ultrametric phylogeny) we are essentially telling phyl.pca to do a regular, rather than phylogenetic, principal components analysis. However, if we dig just a bit deeper we'll see that the values reported as Loads in the print method of princomp are actually the eigenvectors of our principal components orthogonalization.

pPCA$Evec
##              PC1        PC2         PC3         PC4         PC5
## [1,]  0.02571749  0.0103030 -0.54920315  0.83035424  0.09011214
## [2,]  0.31363671  0.1962522  0.03496694  0.11100610 -0.92172233
## [3,] -0.11357645 -0.4232236 -0.74281860 -0.45950037 -0.21227840
## [4,] -0.17370824 -0.8402178  0.38093324  0.28825123 -0.18884013
## [5,]  0.92622700 -0.2762151 -0.01623603 -0.06292952  0.24816334
unclass(PCA$loadings)
##           Comp.1     Comp.2      Comp.3      Comp.4      Comp.5
## [1,] -0.02571749  0.0103030 -0.54920315  0.83035424  0.09011214
## [2,] -0.31363671  0.1962522  0.03496694  0.11100610 -0.92172233
## [3,]  0.11357645 -0.4232236 -0.74281860 -0.45950037 -0.21227840
## [4,]  0.17370824 -0.8402178  0.38093324  0.28825123 -0.18884013
## [5,] -0.92622700 -0.2762151 -0.01623603 -0.06292952  0.24816334

Note that some of our vectors may have flipped signs between the two analyses. (Of course, as these are vectors the signs are arbitrary.) To compare the values between our two analyses, I will reset the signs so that the highest magnitude coefficient of each column is positive:

foo<-function(x) sign(x[which(abs(x)==max(abs(x)))])
plot(matrix(rep(1,5),5,1)%*%apply(pPCA$Evec,2,foo)*pPCA$Evec,
    matrix(rep(1,5),5,1)%*%apply(PCA$loadings,2,foo)*PCA$loadings,
    cex=1.5,pch=21,bg="grey",xlab=
    expression(paste("pPCA eigenvector coefficients (",lambda,"= 0)")),
    ylab="PCA \"loadings\" from princomp")

plot of chunk unnamed-chunk-3

Note that these will not be equivalent except when we have set λ to zero & our tree is ultrametric!

That's all I have to say about this.

No comments:

Post a Comment

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.