A couple of days I received the following inquiry about phylogenetic PCA as implemented
in the phytools function phyl.pca
:
“Also, another less pressing question I had was what to do about multiple samples from one species. I think the phylo.pca doesn't like 0 branch lengths (I think because of the inverted matrix error), so I've been putting in "placeholder” small branch lengths and/or averaging individuals of the same species. I'm not sure if this was more acceptable or if there was a more standard practice, though.“
Firstly, one should virtually never include interspecific data by adding terminal branches of zero (or very short) length to the tree. I won't get into why here - just don't do it please.
It is possible, however, to compute PC scores for one set of data based on a rotation computed for another set. With respect to the phylogenetic PCA, what I would thus recommend it to compute the rotation using the species means data & then if one wants to obtain scores for individuals, use this rotation to compute those scores.
I previously documented how to do this
here,
but today I also added a new S3 method called scores( )
that automates this
calculation for a "phyl.pca"
option. I just
pushed
this update to GitHub so it can be installed using devtools in the typical way.
Here's a quick demo of how it works:
library(phytools)
## latest phytools version from GitHub
packageVersion("phytools")
## [1] '0.7.3'
tree
##
## Phylogenetic tree with 26 tips and 25 internal nodes.
##
## Tip labels:
## Y, K, R, O, J, W, ...
##
## Rooted; includes branch lengths.
## individual data
Xi
## [,1] [,2] [,3] [,4] [,5]
## A1 1.20787154 0.47914789 0.8623047 1.2495477 -1.42947991
## A2 2.12320487 0.47402360 0.9550751 1.6972576 -1.91713104
## B1 -3.36948268 3.77947415 -0.8104234 5.5267960 -4.22373306
## B2 -3.89240291 3.54207211 -1.3588802 5.1915551 -3.65014197
## B3 -3.63879657 3.37713991 -1.9685069 4.3689452 -4.14255235
## B4 -4.17859299 3.73085941 -1.3349099 5.3970606 -3.86395132
## B5 -2.77632771 3.40620475 -2.0374127 5.4595384 -3.94879047
## C1 2.66252966 -1.69501406 5.0620859 1.4238734 -1.00437382
## C2 2.27931674 -2.71852958 4.8953938 1.0881464 -1.39655915
## C3 3.19271894 -1.67392645 4.3500289 2.0463243 -1.44478241
## C4 1.93553298 -2.86843955 4.8429547 1.6322762 -1.23128013
## C5 2.41170225 -2.05714667 4.7793417 2.0967727 -0.59099047
## D1 -3.06618045 -0.77675201 0.7938031 1.5116715 0.94196412
## D2 -3.36196611 -0.80817106 0.7938168 0.9432299 1.13672644
## E1 0.59350436 1.38165922 -0.3316782 1.9536058 -2.77096246
## E2 0.27218954 0.56687414 0.3876198 1.8826100 -1.69035681
## E3 -0.52332659 0.59848482 0.2504620 2.5939968 -1.65883260
## F1 0.22273476 -2.45198070 2.7746535 0.3861395 1.05274671
## G1 0.29489467 -2.73762657 2.6821865 -0.8493884 1.69457218
## G2 -0.65288642 -3.54374917 2.1397764 -1.0217453 0.94721876
## G3 -0.97408415 -3.32511230 2.7265237 -1.1669954 1.91906697
## G4 -0.02536859 -2.11640698 2.8554528 -1.2779744 1.76777639
## H1 -4.01209732 -0.23426385 -2.0376439 -0.3891521 1.00892916
## H2 -3.15960646 0.08868799 -2.0180995 -0.7439570 0.79702482
## H3 -3.70773942 0.14920207 -2.3082605 -1.0694013 -0.10941989
## H4 -3.46345662 0.22065346 -1.5988715 -1.0741026 0.11413459
## H5 -3.04286821 -0.24156083 -1.2535244 -1.1774794 1.38878052
## I1 0.99431344 -0.39255396 2.7078109 3.0684490 -2.19073296
## J1 1.48012052 -3.89868107 5.3744037 0.4072786 -1.20200149
## J2 0.48200711 -3.35309585 5.2460605 1.0674284 -1.45483287
## J3 1.83140764 -2.74096320 5.3074037 0.3876488 -1.27536042
## J4 1.11622382 -4.10303090 5.5272531 1.2245890 -1.04267349
## K1 -0.59879568 -2.64773568 3.8750676 0.8384714 0.27605077
## K2 0.21024899 -2.11642761 3.6165256 2.7382009 0.69900866
## K3 -1.09273604 -2.74959089 4.3711782 0.8696402 0.64300187
## K4 -0.04152818 -2.61839939 3.7803252 1.6808116 0.29258721
## K5 -0.72210266 -2.33429710 4.1279836 2.4090130 0.78003183
## L1 0.03509973 -2.05695368 1.9254794 -0.2567052 0.68410460
## L2 -1.05658416 -0.88037564 2.6933563 0.5511140 0.86919585
## L3 -0.51878917 -1.93693089 1.6358342 0.3728830 2.03744417
## L4 -0.67541145 -1.70725799 1.2117356 0.2004287 0.01558763
## L5 -1.02782819 -1.63031446 1.0975700 -0.4001964 1.63772015
## M1 2.20607975 -1.60221108 1.4350309 -1.0468959 0.24758727
## M2 2.62837719 -1.04389879 1.3517971 0.5392242 1.07560283
## N1 -2.93057050 -2.17509303 2.6040320 4.7347668 0.15589588
## N2 -3.15345861 -2.59937773 3.3869134 4.2462922 1.39983864
## N3 -3.60232496 -2.71874020 2.8319589 4.4270392 0.24840489
## N4 -4.25563290 -2.64746706 2.8833009 3.6301153 -0.27392402
## O1 -5.28683088 -3.70228410 2.3844570 1.3629796 2.39517794
## O2 -5.38104846 -4.87728898 2.8560162 2.0700145 1.50657055
## O3 -5.48787982 -4.25493642 2.8122042 1.3717118 2.28957850
## P1 3.09231576 -5.14787727 5.6295546 -0.8423818 1.52657614
## Q1 -3.82118671 -3.11733681 4.6756523 5.0033811 0.87620972
## Q2 -4.23022707 -4.21706371 3.7660017 5.0281017 0.93682444
## R1 -2.36400839 -2.27834062 1.9794830 1.4314134 0.60331716
## R2 -2.46014028 -1.08918230 1.6520861 0.9182345 -0.96725226
## R3 -2.73368752 -1.43893280 2.0945741 2.2804404 -0.69332985
## R4 -1.08796774 -2.09122908 1.1532129 1.8427559 0.13805477
## R5 -1.89238447 -2.26101902 1.9490731 1.3904670 -0.78965765
## S1 -3.12227407 -1.54462933 1.4331181 -0.3039058 1.86762871
## S2 -3.54825239 -1.57142641 0.2298638 1.6937560 1.14127352
## S3 -2.68354538 -1.00599847 0.5909795 0.9248922 1.83179640
## S4 -2.87604229 -1.72251468 0.3321336 1.0708315 1.71451830
## T1 -2.57118182 -1.90281553 1.4933224 3.0562073 -2.09442201
## U1 0.66466684 1.65714974 1.0003654 3.9438689 -0.86037147
## U2 0.45740087 2.04635918 1.3612159 4.2285696 -0.80167998
## U3 1.14681687 1.59757388 0.8111320 3.4097557 -1.34731730
## U4 0.52640220 1.64986943 1.7871073 4.3412065 -2.19397881
## V1 0.75658437 -1.87706996 4.4759921 1.5414931 -1.62431717
## W1 2.68267252 -3.19689644 7.4425853 3.7556724 -1.68887877
## W2 2.76196919 -3.74195183 6.1999013 2.9743391 -1.78427181
## X1 0.91234712 -0.72483560 3.6801507 4.4381717 -2.46478786
## X2 -0.05238740 -0.80465206 3.2210595 2.7922503 -3.38940095
## X3 0.45835800 -0.53257849 3.5476527 3.3640370 -2.73478436
## X4 1.49751687 -0.73809794 3.9058654 4.1334758 -3.66075267
## X5 0.28602440 -0.49840370 2.7803620 4.1733508 -2.10614716
## Y1 1.25534713 -5.10787364 4.3453981 -2.7789615 2.23696428
## Y2 3.18711033 -4.93714442 4.0112931 -2.4233436 2.50171151
## Y3 1.57399176 -4.46222484 4.3241397 -2.0585551 2.36389952
## Z1 2.57286210 -2.39561802 5.4534820 2.4485488 -1.16858447
## Z2 3.40395933 -2.58146747 5.9747592 0.9585838 -1.53438227
## Z3 2.55456063 -1.98275084 6.0612324 1.5865925 -2.48843615
## Z4 2.87982996 -0.97677109 5.7028841 1.2531958 -1.32016016
## Z5 2.60403104 -2.01564983 5.3533706 1.4698986 -1.86993584
## species means
Xm<-matrix(NA,Ntip(tree),ncol(Xi),
dimnames=list(tree$tip.label,
colnames(Xi)))
for(i in 1:Ntip(tree)){
ii<-grep(tree$tip.label[i],rownames(Xi))
Xm[i,]<-colMeans(Xi[ii,,drop=FALSE])
}
Xm
## [,1] [,2] [,3] [,4] [,5]
## Y 2.0054831 -4.835747633 4.2269436 -2.42028676 2.3675251
## K -0.4489827 -2.493290133 3.9542160 1.70722742 0.5381361
## R -2.1076377 -1.831740764 1.7656858 1.57266223 -0.3417736
## O -5.3852531 -4.278169835 2.6842258 1.60156865 2.0637757
## J 1.2274398 -3.523942753 5.3637802 0.77173620 -1.2437171
## W 2.7223209 -3.469424136 6.8212433 3.36500574 -1.7365753
## P 3.0923158 -5.147877265 5.6295546 -0.84238176 1.5265761
## N -3.4854967 -2.535169504 2.9265513 4.25955340 0.3825538
## Q -4.0257069 -3.667200260 4.2208270 5.01574141 0.9065171
## G -0.3393611 -2.930723754 2.6009849 -1.07902587 1.5821586
## L -0.6487026 -1.642366534 1.7127951 0.09350482 1.0488105
## X 0.6203718 -0.659713558 3.4270181 3.78025711 -2.8711746
## A 1.6655382 0.476585745 0.9086899 1.47340267 -1.6733055
## S -3.0575285 -1.461142224 0.6465237 0.84639345 1.6388042
## D -3.2140733 -0.792461533 0.7938099 1.22745074 1.0393453
## H -3.4771536 -0.003456233 -1.8432800 -0.89081847 0.6398898
## E 0.1141224 0.849006057 0.1021345 2.14340420 -2.0400506
## U 0.6988217 1.737738058 1.2399551 3.98085017 -1.3008369
## I 0.9943134 -0.392553964 2.7078109 3.06844904 -2.1907330
## B -3.5711206 3.567150066 -1.5020266 5.18877905 -3.9658338
## Z 2.8030486 -1.990451450 5.7091456 1.54336389 -1.6762998
## F 0.2227348 -2.451980697 2.7746535 0.38613948 1.0527467
## M 2.4172285 -1.323054933 1.3934140 -0.25383584 0.6615951
## T -2.5711818 -1.902815534 1.4933224 3.05620725 -2.0944220
## V 0.7565844 -1.877069964 4.4759921 1.54149311 -1.6243172
## C 2.4963601 -2.202611261 4.7859610 1.65747860 -1.1335972
## phylogenetic PCA with species means
pca<-phyl.pca(tree,Xm)
## scores for species means
scores(pca)
## PC1 PC2 PC3 PC4 PC5
## Y 4.8355633 -0.3294605 4.9204846 -1.214253071 0.48417330
## K 1.1271032 -1.8905319 1.1887193 0.146415414 0.52679413
## R -1.3776735 -0.5685501 1.0264633 -0.240509269 -0.23031699
## O -3.0957196 -3.2959470 4.5902049 0.079157313 0.17733263
## J 3.4199678 -2.3877114 0.5020696 -1.548803070 -0.50276096
## W 4.4388588 -4.0399357 -1.8032739 -1.402434823 0.98914395
## P 5.8491031 -1.8008209 3.2645258 -1.562917383 1.10626368
## N -2.6273140 -3.3123513 0.6706146 0.405660969 1.13010196
## Q -2.4989968 -5.2056862 1.2006453 0.378034447 1.59048227
## G 1.5317465 0.1939072 3.6138101 -0.224632080 -0.07356055
## L 0.3058007 0.7250770 2.1747536 0.317350313 0.15884578
## X 0.6098092 -1.3901896 -3.2212485 -0.563561025 0.04254944
## A 0.8665509 2.2910213 -1.9286367 -0.130411521 0.12191960
## S -2.3410551 0.4294088 2.8283221 0.915182583 0.25544998
## D -2.5983693 0.3485202 1.9828177 1.179837056 -0.11900397
## H -3.5720692 3.4015096 2.7538485 0.649728549 -1.19057929
## E -1.0349547 2.1898149 -2.1836836 0.006607482 -0.14215189
## U -0.5987227 1.1359716 -3.3898069 1.467907338 1.06531799
## I 0.7875657 -0.3033870 -2.6109169 -0.309751715 0.26272155
## B -6.0691018 1.7853416 -5.1025947 1.320105136 -1.02195377
## Z 4.4303034 -1.8337284 -1.3996878 -0.735809021 -0.14132283
## F 1.4969275 -0.1922748 2.0672492 -0.093322338 0.59191066
## M 2.5702312 2.2961278 1.0451950 -0.281481952 0.96142293
## T -2.5011852 -1.2398761 -0.6973528 -1.171601072 -0.35424555
## V 2.1926843 -1.5966486 -0.7306491 -0.654874153 -0.59622447
## C 3.7181929 -1.3800392 -0.8978589 -0.808252464 0.43215101
## scores for individuals
scores(pca,newdata=Xi)
## PC1 PC2 PC3 PC4 PC5
## A1 0.58137458 2.26833022 -1.50992755 0.05254930 -0.05285582
## A2 1.15172725 2.31371230 -2.34734580 -0.31337234 0.29669502
## B1 -5.69194457 1.27868306 -5.63929102 1.53355448 -1.13271412
## B2 -6.21729020 1.56489018 -4.80403093 1.57346089 -1.01085811
## B3 -6.09486112 2.37746282 -4.62585228 0.88981672 -1.46686067
## B4 -6.52217871 1.42718979 -5.04909125 1.66474417 -1.17136154
## B5 -5.81923440 2.27848201 -5.39470803 0.93894942 -0.32797441
## C1 4.04819458 -1.22316584 -0.98201155 -0.31625536 0.20825572
## C2 3.81380896 -1.50051504 -0.44363895 -1.31721174 -0.04694296
## C3 3.83684934 -0.78430503 -1.74656305 -0.80025429 0.76853764
## C4 3.35984901 -1.86986239 -0.46865739 -1.25673076 0.29950120
## C5 3.53226258 -1.52234773 -0.84842353 -0.35081017 0.93140344
## D1 -2.58766894 0.28198061 1.71060049 1.13387065 0.05883451
## D2 -2.60906967 0.41505973 2.25503481 1.22580347 -0.29684245
## E1 -0.91950745 2.94693733 -2.90222653 -0.22396817 -0.48265722
## E2 -0.63006722 2.04596141 -1.75443025 0.03041785 -0.05889452
## E3 -1.55528957 1.57654590 -1.89439396 0.21337277 0.11509608
## F1 1.49692754 -0.19227481 2.06724918 -0.09332234 0.59191066
## G1 1.98319119 0.33592552 3.26622081 -0.07905380 0.30794692
## G2 1.01866478 0.14291709 3.58606835 -1.12319680 -0.21212503
## G3 1.18807446 -0.22799820 4.24271622 -0.19775773 -0.12889419
## G4 1.93705568 0.52478448 3.36023520 0.50148001 -0.26116988
## H1 -4.21463202 3.06152517 2.98057354 0.74927758 -0.77191141
## H2 -3.46355581 3.61819193 2.63532881 0.71375796 -0.86001124
## H3 -4.00177798 3.77949585 2.41281338 0.23074439 -1.70697275
## H4 -3.42305167 3.37912219 2.41521653 0.58097673 -1.71788761
## H5 -2.75732844 3.16921261 3.32531008 0.97388609 -0.89611342
## I1 0.78756568 -0.30338695 -2.61091689 -0.30975171 0.26272155
## J1 3.77009981 -2.29606193 0.82355733 -1.87217506 -0.50930671
## J2 2.65643636 -2.62752255 0.36504020 -1.43004519 -0.74223991
## J3 3.93220043 -1.66531301 0.15517195 -1.12167970 -0.70897156
## J4 3.32113450 -2.96194826 0.66450880 -1.77131234 -0.05052566
## K1 1.25159512 -1.57925153 1.63321774 -0.17957517 -0.15125090
## K2 1.09822059 -1.70247634 0.34879191 0.38728142 1.51661199
## K3 1.15682815 -2.16043574 2.02139355 0.18036386 -0.24669436
## K4 1.34821280 -1.66402160 0.98964668 -0.20557961 0.62257941
## K5 0.78065922 -2.34647404 0.95054657 0.54958657 0.89272452
## L1 1.06414744 0.78680684 2.11315269 -0.27273027 0.06153136
## L2 0.29615528 -0.04188865 1.54620281 1.18255466 -0.30825301
## L3 0.36272587 0.61785755 2.72579932 0.62588492 1.02569981
## L4 -0.08817270 0.98616467 1.53609670 -0.44725043 -0.18624600
## L5 -0.10585249 1.27644467 2.95251641 0.49829269 0.20149675
## M1 2.68309527 2.41206799 1.42442617 -0.73468540 0.22034228
## M2 2.45736721 2.18018762 0.66596382 0.17172150 1.70250357
## N1 -2.56948782 -2.95017073 -0.05817857 0.39753741 1.52137482
## N2 -2.04353648 -3.52769425 1.21271646 1.00911592 1.69237269
## N3 -2.82187513 -3.43066649 0.61806938 0.19573987 1.19061494
## N4 -3.07435651 -3.34087382 0.90985106 0.02025067 0.11604538
## O1 -3.11192122 -2.70534277 4.64768619 0.57463847 0.18529130
## O2 -3.16038176 -3.87005819 4.24883516 -0.59826113 0.27313704
## O3 -3.01485587 -3.31243993 4.87409349 0.26109460 0.07356954
## P1 5.84910311 -1.80082091 3.26452583 -1.56291738 1.10626368
## Q1 -2.13304010 -5.24928996 0.85558856 0.87580561 1.36810917
## Q2 -2.86495354 -5.16208245 1.54570204 -0.11973672 1.81285536
## R1 -1.31674630 -0.90476921 1.96308744 0.04823315 0.15262143
## R2 -1.58422603 -0.05391324 0.78230593 -0.05693333 -1.28256572
## R3 -1.97457519 -1.18864009 0.41523442 0.11054128 -0.43355407
## R4 -0.95479129 0.03372347 0.99863380 -0.49640797 0.85483205
## R5 -1.05802876 -0.72915132 0.97305468 -0.80797947 -0.44291866
## S1 -1.56525324 0.29512563 3.63825338 1.14104633 -0.56828429
## S2 -3.25461074 0.14469106 2.26743678 0.57200356 0.46551160
## S3 -2.12348163 0.75013909 2.58217027 1.29353192 0.45436537
## S4 -2.42087490 0.52767940 2.82542790 0.65414852 0.67020723
## T1 -2.50118518 -1.23987611 -0.69735278 -1.17160107 -0.35424555
## U1 -0.70258767 1.29596288 -3.03986714 1.58536349 1.36163333
## U2 -0.78716461 0.98955203 -3.29104221 2.07335134 1.27860604
## U3 -0.28356663 1.79696944 -3.15636432 1.09327885 1.02709589
## U4 -0.62157182 0.46140212 -4.07195378 1.11963568 0.59393669
## V1 2.19268431 -1.59664865 -0.73064913 -0.65487415 -0.59622447
## W1 4.59323403 -4.57051267 -2.13032675 -0.95022639 0.98692254
## W2 4.28448359 -3.50935871 -1.47622110 -1.85464325 0.99136536
## X1 0.78289693 -1.77122312 -3.41011353 -0.30871842 0.73233760
## X2 0.28395177 -1.12040455 -2.70465518 -0.98695764 -1.02311779
## X3 0.69020005 -1.30387401 -2.91689514 -0.35978315 -0.27484074
## X4 1.37014904 -1.63790746 -4.16308027 -1.03919153 0.05889125
## X5 -0.07815197 -1.11753896 -2.91149857 -0.12315438 0.71947689
## Y1 4.44779172 -0.63479302 5.39272325 -1.36041582 -0.07126708
## Y2 5.65042965 0.20766608 4.69310049 -1.47307135 1.11650405
## Y3 4.40846844 -0.56125449 4.67562993 -0.80927204 0.40728292
## Z1 3.87645446 -2.25958453 -1.32575109 -0.72808203 0.78927693
## Z2 5.27985339 -1.79508555 -0.91340898 -1.16277358 -0.10879099
## Z3 4.35122084 -2.21607210 -1.86618529 -1.03597318 -0.76345277
## Z4 4.54419334 -1.27276565 -1.50322641 0.18997248 -0.36284858
## Z5 4.09979495 -1.62513421 -1.38986713 -0.94218880 -0.26079876
Cool. An interesting property of these data is that when we use the rotation from the species means the mean PC scores from the individual data are the same as the original PC scores from the means:
pca<-phyl.pca(tree,Xm)
S<-scores(pca,newdata=Xi)
Sa<-matrix(NA,26,5)
rownames(Sa)<-LETTERS
for(i in 1:26){
ii<-grep(LETTERS[i],rownames(S))
Sa[i,]<-if(length(ii)>1) colMeans(S[ii,]) else S[ii,]
}
plot(Sa,scores(pca)[rownames(Sa),],cex=1.5,pch=19,
col=make.transparent("darkblue",0.5),
xlab="mean scores",ylab="scores from means")
The data for this exercise were simulated using the packages clusterGeneration & phytools as follows:
library(clusterGeneration)
tree<-pbtree(n=26,scale=1,tip.label=sample(LETTERS))
k<-5
V<-genPositiveDefMat(k,covMethod="unifcorrmat")$Sigma
X<-sim.corrs(phytools:::lambdaTree(tree,0.7),V)[LETTERS,]
N<-sample(1:5,nrow(X),replace=TRUE)
Xi<-apply(X,2,sampleFrom,xvar=0.2,n=N)
spp<-as.factor(rownames(Xi))
rownames(Xi)<-paste(rownames(Xi),
unlist(sapply(N,function(n) 1:n)),
sep="")
That's it.
Hi Liam,
ReplyDeleteTo replicate this process with our own data, the labels of individuals is essential, right?
I need individuals of the taxa Homo to be labelled Homo1, Homo2, Homo3, right?
And can we do this for only a subset of taxa and not all in our tree?
Thanks!
Hi Borja. No, the labels don't matter at all. You use the species means data to get the rotation, but then you can rotate any data using that rotation.
DeleteThis comment has been removed by the author.
ReplyDelete