This is a demo to illustrate how to compute principal components scores from phylogenetic PCA on data separate from those used in the PCA itself.
A circumstance in which we might want to compute scores like this would be if we had undertaken phylogenetic PCA with species means, but then wanted to compute PC scores, using the phyl.pca
rotation, on all our individual observations.
Since all we’re doing is applying a rotation obtained from one analysis to a different set of data, however, it’s not restricted to that use case. For example, we might have conducted interspecies phylogenetic PCA using species means of wild Canidae – but then desire to apply that rotation to the same variables measured for a set of domestic dog breeds.
This demo will mostly work using the current CRAN version of phytools, but I added one very small update to the scores
method, so it is best to update phytools from GitHub if following along at home.
library(phytools)
packageVersion("phytools")
## [1] '1.9.3'
For this example, I will use real interspecific species means data for Anolis lizards from Mahler et al. (2010).
data(anoletree)
data(anole.data)
anole_pca<-phyl.pca(anoletree,anole.data)
anole_pca
## Phylogenetic pca
## Standard deviations:
## PC1 PC2 PC3 PC4 PC5 PC6
## 0.33222579 0.09207406 0.05012082 0.04318123 0.02008655 0.01507125
## Loads:
## PC1 PC2 PC3 PC4 PC5 PC6
## SVL -0.9712234 0.16067288 0.01972570 0.14782215 -0.06211906 0.06935433
## HL -0.9645111 0.16955087 -0.01203113 0.17994634 0.08064324 -0.04406887
## HLL -0.9814164 -0.02674808 0.10315533 -0.13790763 0.06887922 0.04126248
## FLL -0.9712265 0.17585694 0.10697935 -0.09105747 -0.06075142 -0.04864769
## LAM -0.7810052 0.37429334 -0.47398703 -0.15871456 0.00217418 0.00875408
## TL -0.9014509 -0.42528918 -0.07614571 0.01709649 -0.01750404 -0.01088743
Here’s a phylomorphospace projection of just the two PC axes from this analysis.
cols<-setNames(palette()[6:1],sort(unique(getStates(anoletree))))
par(mar=c(5.1,4.1,0.6,0.6))
phylomorphospace(anoletree,scores(anole_pca,dim=c(1,2)),
ftype="off",node.by.map=TRUE,bty="n",colors=cols,
node.size=c(0,1.2))
grid()
legend("topleft",names(cols),pch=21,pt.bg=cols,horiz=TRUE,
bty="n",pt.cex=1.5)
Unfortunately, I don’t have the individual-level data for this study, so to approximate what those might look like, I intend to simulate them using the phytools function sampleFrom
. I’ll simulate 600 observations in total, with varied sample size among species.
nsamples<-600
xvar<-0.01*apply(anole.data,2,var)
X<-matrix(NA,nsamples,ncol(anole.data))
spp<-vector()
for(i in 1:nsamples){
spp[i]<-sample(rownames(anole.data),1)
xbar<-t(anole.data)[,spp[i]]
X[i,]<-sampleFrom(xbar,xvar,n=rep(1,length(xbar)))
}
dimnames(X)<-list(spp,names(xbar))
X<-X[order(rownames(X)),]
head(X,20)
## SVL HL HLL FLL LAM TL
## ahli 4.091386 2.872331 3.983178 3.340385 2.834592 4.523518
## ahli 4.015600 2.836911 3.941395 3.415485 2.856380 4.542946
## ahli 3.993270 2.779235 3.886209 3.278822 2.878964 4.492043
## alayoni 3.795383 2.692120 3.243108 2.826904 3.107198 4.094175
## alayoni 3.815878 2.754703 3.301772 2.696563 3.081010 4.065341
## alayoni 3.798848 2.786575 3.281835 2.747086 3.061369 4.073011
## alayoni 3.845830 2.748305 3.330293 2.779768 3.034388 4.044423
## alfaroi 3.584145 2.432165 3.332006 2.576825 2.740419 4.447896
## alfaroi 3.476461 2.397698 3.390393 2.491981 2.770973 4.364756
## alfaroi 3.503768 2.355913 3.311736 2.379074 2.732485 4.373644
## alfaroi 3.565466 2.406537 3.328327 2.491937 2.707476 4.406833
## aliniger 4.064182 2.946228 3.648898 3.170323 3.207929 4.474463
## aliniger 4.002409 2.954163 3.603949 3.151589 3.147569 4.563649
## aliniger 4.028034 2.968776 3.661624 3.118830 3.125292 4.525245
## aliniger 4.020764 2.907979 3.667090 3.139346 3.140301 4.512028
## aliniger 3.955780 2.973505 3.586377 3.118829 3.148250 4.632081
## aliniger 4.022304 2.948944 3.653434 3.176354 3.183440 4.679663
## aliniger 4.059123 2.961300 3.582261 3.185081 3.166183 4.585893
## allisoni 4.378027 3.332326 3.920664 3.422667 3.221771 5.033425
## allisoni 4.325455 3.253286 3.877334 3.440805 3.249216 5.047095
Now, computing scores for these new data is about as simple as could be. We just call the generic method scores
, but set newdata
to our individual-level data matrix!
newdata_pc<-scores(anole_pca,newdata=X,dim=c(1,2))
head(newdata_pc,20)
## PC1 PC2
## ahli -0.10243429 0.071269183
## ahli -0.08762052 0.060907452
## ahli 0.05113888 0.052206256
## alayoni 0.80409917 0.282435151
## alayoni 0.82354904 0.275255830
## alayoni 0.80408016 0.281863729
## alayoni 0.78291847 0.304404727
## alfaroi 0.97349775 -0.331620406
## alfaroi 1.07546514 -0.315255859
## alfaroi 1.17233151 -0.370716761
## alfaroi 1.05806161 -0.346367848
## aliniger 0.05346551 0.222714309
## aliniger 0.07399053 0.112221392
## aliniger 0.06987509 0.132644583
## aliniger 0.08807446 0.137735356
## aliniger 0.07507745 0.041757879
## aliniger -0.02840616 0.039777413
## aliniger 0.02863586 0.127160622
## allisoni -0.73168217 0.009060608
## allisoni -0.68050056 -0.016986354
Tada!
For fun, let’s graph our phylomorphospace and overlay convex hulls from the within-species data.
cols<-setNames(RColorBrewer::brewer.pal(n=6,name="Accent"),
names(cols))
par(mar=c(5.1,4.1,0.6,0.6),cex.axis=0.8)
new_anole.data<-aggregate(X,by=list(rownames(X)),FUN=mean)
spp<-new_anole.data$Group.1
new_anole.data<-as.matrix(new_anole.data[,2:7])
rownames(new_anole.data)<-spp
phylomorphospace(anoletree,
scores(phyl.pca(anoletree,new_anole.data),dim=c(1,2)),
ftype="off",node.by.map=TRUE,bty="n",colors=cols,
node.size=c(0,0),las=1)
grid()
COLS<-cols[getStates(anoletree,"tips")]
for(i in 1:Ntip(anoletree)){
label<-anoletree$tip.label[i]
ii<-which(rownames(X)==label)
if(length(ii)>1){
hull<-chull(newdata_pc[ii,])
polygon(newdata_pc[ii,][hull,],col=COLS[i],border=FALSE)
}
else {
xy<-scores(anole_pca,dim=c(1,2))[label,]
points(x=xy[1],y=xy[2],pch=16,col=COLS[i])
}
}
Cool.
No comments:
Post a Comment
Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.