Wednesday, August 21, 2019

Computing principal components scores for new data with phyl.pca

A couple of days I received the following inquiry about phylogenetic PCA as implemented in the phytools function phyl.pca:

“Also, another less pressing question I had was what to do about multiple samples from one species. I think the phylo.pca doesn't like 0 branch lengths (I think because of the inverted matrix error), so I've been putting in "placeholder” small branch lengths and/or averaging individuals of the same species. I'm not sure if this was more acceptable or if there was a more standard practice, though.“

Firstly, one should virtually never include interspecific data by adding terminal branches of zero (or very short) length to the tree. I won't get into why here - just don't do it please.

It is possible, however, to compute PC scores for one set of data based on a rotation computed for another set. With respect to the phylogenetic PCA, what I would thus recommend it to compute the rotation using the species means data & then if one wants to obtain scores for individuals, use this rotation to compute those scores.

I previously documented how to do this here, but today I also added a new S3 method called scores( ) that automates this calculation for a "phyl.pca" option. I just pushed this update to GitHub so it can be installed using devtools in the typical way.

Here's a quick demo of how it works:

library(phytools)
## latest phytools version from GitHub
packageVersion("phytools") 
## [1] '0.7.3'
tree
## 
## Phylogenetic tree with 26 tips and 25 internal nodes.
## 
## Tip labels:
##  Y, K, R, O, J, W, ...
## 
## Rooted; includes branch lengths.
## individual data
Xi
##           [,1]        [,2]       [,3]       [,4]        [,5]
## A1  1.20787154  0.47914789  0.8623047  1.2495477 -1.42947991
## A2  2.12320487  0.47402360  0.9550751  1.6972576 -1.91713104
## B1 -3.36948268  3.77947415 -0.8104234  5.5267960 -4.22373306
## B2 -3.89240291  3.54207211 -1.3588802  5.1915551 -3.65014197
## B3 -3.63879657  3.37713991 -1.9685069  4.3689452 -4.14255235
## B4 -4.17859299  3.73085941 -1.3349099  5.3970606 -3.86395132
## B5 -2.77632771  3.40620475 -2.0374127  5.4595384 -3.94879047
## C1  2.66252966 -1.69501406  5.0620859  1.4238734 -1.00437382
## C2  2.27931674 -2.71852958  4.8953938  1.0881464 -1.39655915
## C3  3.19271894 -1.67392645  4.3500289  2.0463243 -1.44478241
## C4  1.93553298 -2.86843955  4.8429547  1.6322762 -1.23128013
## C5  2.41170225 -2.05714667  4.7793417  2.0967727 -0.59099047
## D1 -3.06618045 -0.77675201  0.7938031  1.5116715  0.94196412
## D2 -3.36196611 -0.80817106  0.7938168  0.9432299  1.13672644
## E1  0.59350436  1.38165922 -0.3316782  1.9536058 -2.77096246
## E2  0.27218954  0.56687414  0.3876198  1.8826100 -1.69035681
## E3 -0.52332659  0.59848482  0.2504620  2.5939968 -1.65883260
## F1  0.22273476 -2.45198070  2.7746535  0.3861395  1.05274671
## G1  0.29489467 -2.73762657  2.6821865 -0.8493884  1.69457218
## G2 -0.65288642 -3.54374917  2.1397764 -1.0217453  0.94721876
## G3 -0.97408415 -3.32511230  2.7265237 -1.1669954  1.91906697
## G4 -0.02536859 -2.11640698  2.8554528 -1.2779744  1.76777639
## H1 -4.01209732 -0.23426385 -2.0376439 -0.3891521  1.00892916
## H2 -3.15960646  0.08868799 -2.0180995 -0.7439570  0.79702482
## H3 -3.70773942  0.14920207 -2.3082605 -1.0694013 -0.10941989
## H4 -3.46345662  0.22065346 -1.5988715 -1.0741026  0.11413459
## H5 -3.04286821 -0.24156083 -1.2535244 -1.1774794  1.38878052
## I1  0.99431344 -0.39255396  2.7078109  3.0684490 -2.19073296
## J1  1.48012052 -3.89868107  5.3744037  0.4072786 -1.20200149
## J2  0.48200711 -3.35309585  5.2460605  1.0674284 -1.45483287
## J3  1.83140764 -2.74096320  5.3074037  0.3876488 -1.27536042
## J4  1.11622382 -4.10303090  5.5272531  1.2245890 -1.04267349
## K1 -0.59879568 -2.64773568  3.8750676  0.8384714  0.27605077
## K2  0.21024899 -2.11642761  3.6165256  2.7382009  0.69900866
## K3 -1.09273604 -2.74959089  4.3711782  0.8696402  0.64300187
## K4 -0.04152818 -2.61839939  3.7803252  1.6808116  0.29258721
## K5 -0.72210266 -2.33429710  4.1279836  2.4090130  0.78003183
## L1  0.03509973 -2.05695368  1.9254794 -0.2567052  0.68410460
## L2 -1.05658416 -0.88037564  2.6933563  0.5511140  0.86919585
## L3 -0.51878917 -1.93693089  1.6358342  0.3728830  2.03744417
## L4 -0.67541145 -1.70725799  1.2117356  0.2004287  0.01558763
## L5 -1.02782819 -1.63031446  1.0975700 -0.4001964  1.63772015
## M1  2.20607975 -1.60221108  1.4350309 -1.0468959  0.24758727
## M2  2.62837719 -1.04389879  1.3517971  0.5392242  1.07560283
## N1 -2.93057050 -2.17509303  2.6040320  4.7347668  0.15589588
## N2 -3.15345861 -2.59937773  3.3869134  4.2462922  1.39983864
## N3 -3.60232496 -2.71874020  2.8319589  4.4270392  0.24840489
## N4 -4.25563290 -2.64746706  2.8833009  3.6301153 -0.27392402
## O1 -5.28683088 -3.70228410  2.3844570  1.3629796  2.39517794
## O2 -5.38104846 -4.87728898  2.8560162  2.0700145  1.50657055
## O3 -5.48787982 -4.25493642  2.8122042  1.3717118  2.28957850
## P1  3.09231576 -5.14787727  5.6295546 -0.8423818  1.52657614
## Q1 -3.82118671 -3.11733681  4.6756523  5.0033811  0.87620972
## Q2 -4.23022707 -4.21706371  3.7660017  5.0281017  0.93682444
## R1 -2.36400839 -2.27834062  1.9794830  1.4314134  0.60331716
## R2 -2.46014028 -1.08918230  1.6520861  0.9182345 -0.96725226
## R3 -2.73368752 -1.43893280  2.0945741  2.2804404 -0.69332985
## R4 -1.08796774 -2.09122908  1.1532129  1.8427559  0.13805477
## R5 -1.89238447 -2.26101902  1.9490731  1.3904670 -0.78965765
## S1 -3.12227407 -1.54462933  1.4331181 -0.3039058  1.86762871
## S2 -3.54825239 -1.57142641  0.2298638  1.6937560  1.14127352
## S3 -2.68354538 -1.00599847  0.5909795  0.9248922  1.83179640
## S4 -2.87604229 -1.72251468  0.3321336  1.0708315  1.71451830
## T1 -2.57118182 -1.90281553  1.4933224  3.0562073 -2.09442201
## U1  0.66466684  1.65714974  1.0003654  3.9438689 -0.86037147
## U2  0.45740087  2.04635918  1.3612159  4.2285696 -0.80167998
## U3  1.14681687  1.59757388  0.8111320  3.4097557 -1.34731730
## U4  0.52640220  1.64986943  1.7871073  4.3412065 -2.19397881
## V1  0.75658437 -1.87706996  4.4759921  1.5414931 -1.62431717
## W1  2.68267252 -3.19689644  7.4425853  3.7556724 -1.68887877
## W2  2.76196919 -3.74195183  6.1999013  2.9743391 -1.78427181
## X1  0.91234712 -0.72483560  3.6801507  4.4381717 -2.46478786
## X2 -0.05238740 -0.80465206  3.2210595  2.7922503 -3.38940095
## X3  0.45835800 -0.53257849  3.5476527  3.3640370 -2.73478436
## X4  1.49751687 -0.73809794  3.9058654  4.1334758 -3.66075267
## X5  0.28602440 -0.49840370  2.7803620  4.1733508 -2.10614716
## Y1  1.25534713 -5.10787364  4.3453981 -2.7789615  2.23696428
## Y2  3.18711033 -4.93714442  4.0112931 -2.4233436  2.50171151
## Y3  1.57399176 -4.46222484  4.3241397 -2.0585551  2.36389952
## Z1  2.57286210 -2.39561802  5.4534820  2.4485488 -1.16858447
## Z2  3.40395933 -2.58146747  5.9747592  0.9585838 -1.53438227
## Z3  2.55456063 -1.98275084  6.0612324  1.5865925 -2.48843615
## Z4  2.87982996 -0.97677109  5.7028841  1.2531958 -1.32016016
## Z5  2.60403104 -2.01564983  5.3533706  1.4698986 -1.86993584
## species means
Xm<-matrix(NA,Ntip(tree),ncol(Xi),
    dimnames=list(tree$tip.label,
    colnames(Xi)))
for(i in 1:Ntip(tree)){
    ii<-grep(tree$tip.label[i],rownames(Xi))
    Xm[i,]<-colMeans(Xi[ii,,drop=FALSE])
}
Xm
##         [,1]         [,2]       [,3]        [,4]       [,5]
## Y  2.0054831 -4.835747633  4.2269436 -2.42028676  2.3675251
## K -0.4489827 -2.493290133  3.9542160  1.70722742  0.5381361
## R -2.1076377 -1.831740764  1.7656858  1.57266223 -0.3417736
## O -5.3852531 -4.278169835  2.6842258  1.60156865  2.0637757
## J  1.2274398 -3.523942753  5.3637802  0.77173620 -1.2437171
## W  2.7223209 -3.469424136  6.8212433  3.36500574 -1.7365753
## P  3.0923158 -5.147877265  5.6295546 -0.84238176  1.5265761
## N -3.4854967 -2.535169504  2.9265513  4.25955340  0.3825538
## Q -4.0257069 -3.667200260  4.2208270  5.01574141  0.9065171
## G -0.3393611 -2.930723754  2.6009849 -1.07902587  1.5821586
## L -0.6487026 -1.642366534  1.7127951  0.09350482  1.0488105
## X  0.6203718 -0.659713558  3.4270181  3.78025711 -2.8711746
## A  1.6655382  0.476585745  0.9086899  1.47340267 -1.6733055
## S -3.0575285 -1.461142224  0.6465237  0.84639345  1.6388042
## D -3.2140733 -0.792461533  0.7938099  1.22745074  1.0393453
## H -3.4771536 -0.003456233 -1.8432800 -0.89081847  0.6398898
## E  0.1141224  0.849006057  0.1021345  2.14340420 -2.0400506
## U  0.6988217  1.737738058  1.2399551  3.98085017 -1.3008369
## I  0.9943134 -0.392553964  2.7078109  3.06844904 -2.1907330
## B -3.5711206  3.567150066 -1.5020266  5.18877905 -3.9658338
## Z  2.8030486 -1.990451450  5.7091456  1.54336389 -1.6762998
## F  0.2227348 -2.451980697  2.7746535  0.38613948  1.0527467
## M  2.4172285 -1.323054933  1.3934140 -0.25383584  0.6615951
## T -2.5711818 -1.902815534  1.4933224  3.05620725 -2.0944220
## V  0.7565844 -1.877069964  4.4759921  1.54149311 -1.6243172
## C  2.4963601 -2.202611261  4.7859610  1.65747860 -1.1335972
## phylogenetic PCA with species means
pca<-phyl.pca(tree,Xm)
## scores for species means
scores(pca)
##          PC1        PC2        PC3          PC4         PC5
## Y  4.8355633 -0.3294605  4.9204846 -1.214253071  0.48417330
## K  1.1271032 -1.8905319  1.1887193  0.146415414  0.52679413
## R -1.3776735 -0.5685501  1.0264633 -0.240509269 -0.23031699
## O -3.0957196 -3.2959470  4.5902049  0.079157313  0.17733263
## J  3.4199678 -2.3877114  0.5020696 -1.548803070 -0.50276096
## W  4.4388588 -4.0399357 -1.8032739 -1.402434823  0.98914395
## P  5.8491031 -1.8008209  3.2645258 -1.562917383  1.10626368
## N -2.6273140 -3.3123513  0.6706146  0.405660969  1.13010196
## Q -2.4989968 -5.2056862  1.2006453  0.378034447  1.59048227
## G  1.5317465  0.1939072  3.6138101 -0.224632080 -0.07356055
## L  0.3058007  0.7250770  2.1747536  0.317350313  0.15884578
## X  0.6098092 -1.3901896 -3.2212485 -0.563561025  0.04254944
## A  0.8665509  2.2910213 -1.9286367 -0.130411521  0.12191960
## S -2.3410551  0.4294088  2.8283221  0.915182583  0.25544998
## D -2.5983693  0.3485202  1.9828177  1.179837056 -0.11900397
## H -3.5720692  3.4015096  2.7538485  0.649728549 -1.19057929
## E -1.0349547  2.1898149 -2.1836836  0.006607482 -0.14215189
## U -0.5987227  1.1359716 -3.3898069  1.467907338  1.06531799
## I  0.7875657 -0.3033870 -2.6109169 -0.309751715  0.26272155
## B -6.0691018  1.7853416 -5.1025947  1.320105136 -1.02195377
## Z  4.4303034 -1.8337284 -1.3996878 -0.735809021 -0.14132283
## F  1.4969275 -0.1922748  2.0672492 -0.093322338  0.59191066
## M  2.5702312  2.2961278  1.0451950 -0.281481952  0.96142293
## T -2.5011852 -1.2398761 -0.6973528 -1.171601072 -0.35424555
## V  2.1926843 -1.5966486 -0.7306491 -0.654874153 -0.59622447
## C  3.7181929 -1.3800392 -0.8978589 -0.808252464  0.43215101
## scores for individuals
scores(pca,newdata=Xi)
##            PC1         PC2         PC3         PC4         PC5
## A1  0.58137458  2.26833022 -1.50992755  0.05254930 -0.05285582
## A2  1.15172725  2.31371230 -2.34734580 -0.31337234  0.29669502
## B1 -5.69194457  1.27868306 -5.63929102  1.53355448 -1.13271412
## B2 -6.21729020  1.56489018 -4.80403093  1.57346089 -1.01085811
## B3 -6.09486112  2.37746282 -4.62585228  0.88981672 -1.46686067
## B4 -6.52217871  1.42718979 -5.04909125  1.66474417 -1.17136154
## B5 -5.81923440  2.27848201 -5.39470803  0.93894942 -0.32797441
## C1  4.04819458 -1.22316584 -0.98201155 -0.31625536  0.20825572
## C2  3.81380896 -1.50051504 -0.44363895 -1.31721174 -0.04694296
## C3  3.83684934 -0.78430503 -1.74656305 -0.80025429  0.76853764
## C4  3.35984901 -1.86986239 -0.46865739 -1.25673076  0.29950120
## C5  3.53226258 -1.52234773 -0.84842353 -0.35081017  0.93140344
## D1 -2.58766894  0.28198061  1.71060049  1.13387065  0.05883451
## D2 -2.60906967  0.41505973  2.25503481  1.22580347 -0.29684245
## E1 -0.91950745  2.94693733 -2.90222653 -0.22396817 -0.48265722
## E2 -0.63006722  2.04596141 -1.75443025  0.03041785 -0.05889452
## E3 -1.55528957  1.57654590 -1.89439396  0.21337277  0.11509608
## F1  1.49692754 -0.19227481  2.06724918 -0.09332234  0.59191066
## G1  1.98319119  0.33592552  3.26622081 -0.07905380  0.30794692
## G2  1.01866478  0.14291709  3.58606835 -1.12319680 -0.21212503
## G3  1.18807446 -0.22799820  4.24271622 -0.19775773 -0.12889419
## G4  1.93705568  0.52478448  3.36023520  0.50148001 -0.26116988
## H1 -4.21463202  3.06152517  2.98057354  0.74927758 -0.77191141
## H2 -3.46355581  3.61819193  2.63532881  0.71375796 -0.86001124
## H3 -4.00177798  3.77949585  2.41281338  0.23074439 -1.70697275
## H4 -3.42305167  3.37912219  2.41521653  0.58097673 -1.71788761
## H5 -2.75732844  3.16921261  3.32531008  0.97388609 -0.89611342
## I1  0.78756568 -0.30338695 -2.61091689 -0.30975171  0.26272155
## J1  3.77009981 -2.29606193  0.82355733 -1.87217506 -0.50930671
## J2  2.65643636 -2.62752255  0.36504020 -1.43004519 -0.74223991
## J3  3.93220043 -1.66531301  0.15517195 -1.12167970 -0.70897156
## J4  3.32113450 -2.96194826  0.66450880 -1.77131234 -0.05052566
## K1  1.25159512 -1.57925153  1.63321774 -0.17957517 -0.15125090
## K2  1.09822059 -1.70247634  0.34879191  0.38728142  1.51661199
## K3  1.15682815 -2.16043574  2.02139355  0.18036386 -0.24669436
## K4  1.34821280 -1.66402160  0.98964668 -0.20557961  0.62257941
## K5  0.78065922 -2.34647404  0.95054657  0.54958657  0.89272452
## L1  1.06414744  0.78680684  2.11315269 -0.27273027  0.06153136
## L2  0.29615528 -0.04188865  1.54620281  1.18255466 -0.30825301
## L3  0.36272587  0.61785755  2.72579932  0.62588492  1.02569981
## L4 -0.08817270  0.98616467  1.53609670 -0.44725043 -0.18624600
## L5 -0.10585249  1.27644467  2.95251641  0.49829269  0.20149675
## M1  2.68309527  2.41206799  1.42442617 -0.73468540  0.22034228
## M2  2.45736721  2.18018762  0.66596382  0.17172150  1.70250357
## N1 -2.56948782 -2.95017073 -0.05817857  0.39753741  1.52137482
## N2 -2.04353648 -3.52769425  1.21271646  1.00911592  1.69237269
## N3 -2.82187513 -3.43066649  0.61806938  0.19573987  1.19061494
## N4 -3.07435651 -3.34087382  0.90985106  0.02025067  0.11604538
## O1 -3.11192122 -2.70534277  4.64768619  0.57463847  0.18529130
## O2 -3.16038176 -3.87005819  4.24883516 -0.59826113  0.27313704
## O3 -3.01485587 -3.31243993  4.87409349  0.26109460  0.07356954
## P1  5.84910311 -1.80082091  3.26452583 -1.56291738  1.10626368
## Q1 -2.13304010 -5.24928996  0.85558856  0.87580561  1.36810917
## Q2 -2.86495354 -5.16208245  1.54570204 -0.11973672  1.81285536
## R1 -1.31674630 -0.90476921  1.96308744  0.04823315  0.15262143
## R2 -1.58422603 -0.05391324  0.78230593 -0.05693333 -1.28256572
## R3 -1.97457519 -1.18864009  0.41523442  0.11054128 -0.43355407
## R4 -0.95479129  0.03372347  0.99863380 -0.49640797  0.85483205
## R5 -1.05802876 -0.72915132  0.97305468 -0.80797947 -0.44291866
## S1 -1.56525324  0.29512563  3.63825338  1.14104633 -0.56828429
## S2 -3.25461074  0.14469106  2.26743678  0.57200356  0.46551160
## S3 -2.12348163  0.75013909  2.58217027  1.29353192  0.45436537
## S4 -2.42087490  0.52767940  2.82542790  0.65414852  0.67020723
## T1 -2.50118518 -1.23987611 -0.69735278 -1.17160107 -0.35424555
## U1 -0.70258767  1.29596288 -3.03986714  1.58536349  1.36163333
## U2 -0.78716461  0.98955203 -3.29104221  2.07335134  1.27860604
## U3 -0.28356663  1.79696944 -3.15636432  1.09327885  1.02709589
## U4 -0.62157182  0.46140212 -4.07195378  1.11963568  0.59393669
## V1  2.19268431 -1.59664865 -0.73064913 -0.65487415 -0.59622447
## W1  4.59323403 -4.57051267 -2.13032675 -0.95022639  0.98692254
## W2  4.28448359 -3.50935871 -1.47622110 -1.85464325  0.99136536
## X1  0.78289693 -1.77122312 -3.41011353 -0.30871842  0.73233760
## X2  0.28395177 -1.12040455 -2.70465518 -0.98695764 -1.02311779
## X3  0.69020005 -1.30387401 -2.91689514 -0.35978315 -0.27484074
## X4  1.37014904 -1.63790746 -4.16308027 -1.03919153  0.05889125
## X5 -0.07815197 -1.11753896 -2.91149857 -0.12315438  0.71947689
## Y1  4.44779172 -0.63479302  5.39272325 -1.36041582 -0.07126708
## Y2  5.65042965  0.20766608  4.69310049 -1.47307135  1.11650405
## Y3  4.40846844 -0.56125449  4.67562993 -0.80927204  0.40728292
## Z1  3.87645446 -2.25958453 -1.32575109 -0.72808203  0.78927693
## Z2  5.27985339 -1.79508555 -0.91340898 -1.16277358 -0.10879099
## Z3  4.35122084 -2.21607210 -1.86618529 -1.03597318 -0.76345277
## Z4  4.54419334 -1.27276565 -1.50322641  0.18997248 -0.36284858
## Z5  4.09979495 -1.62513421 -1.38986713 -0.94218880 -0.26079876

Cool. An interesting property of these data is that when we use the rotation from the species means the mean PC scores from the individual data are the same as the original PC scores from the means:

pca<-phyl.pca(tree,Xm)
S<-scores(pca,newdata=Xi)
Sa<-matrix(NA,26,5)
rownames(Sa)<-LETTERS
for(i in 1:26){
    ii<-grep(LETTERS[i],rownames(S))
    Sa[i,]<-if(length(ii)>1) colMeans(S[ii,]) else S[ii,]
}
plot(Sa,scores(pca)[rownames(Sa),],cex=1.5,pch=19,
    col=make.transparent("darkblue",0.5),
    xlab="mean scores",ylab="scores from means")

plot of chunk unnamed-chunk-2

The data for this exercise were simulated using the packages clusterGeneration & phytools as follows:

library(clusterGeneration)
tree<-pbtree(n=26,scale=1,tip.label=sample(LETTERS))
k<-5
V<-genPositiveDefMat(k,covMethod="unifcorrmat")$Sigma
X<-sim.corrs(phytools:::lambdaTree(tree,0.7),V)[LETTERS,]
N<-sample(1:5,nrow(X),replace=TRUE)
Xi<-apply(X,2,sampleFrom,xvar=0.2,n=N)
spp<-as.factor(rownames(Xi))
rownames(Xi)<-paste(rownames(Xi),
    unlist(sapply(N,function(n) 1:n)),
    sep="")

That's it.

3 comments:

  1. Hi Liam,

    To replicate this process with our own data, the labels of individuals is essential, right?

    I need individuals of the taxa Homo to be labelled Homo1, Homo2, Homo3, right?

    And can we do this for only a subset of taxa and not all in our tree?

    Thanks!

    ReplyDelete
    Replies
    1. Hi Borja. No, the labels don't matter at all. You use the species means data to get the rotation, but then you can rotate any data using that rotation.

      Delete
  2. This comment has been removed by the author.

    ReplyDelete

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.