Sunday, May 21, 2023

Applying a phyl.pca rotation to a new dataset

This is a demo to illustrate how to compute principal components scores from phylogenetic PCA on data separate from those used in the PCA itself.

A circumstance in which we might want to compute scores like this would be if we had undertaken phylogenetic PCA with species means, but then wanted to compute PC scores, using the phyl.pca rotation, on all our individual observations.

Since all we’re doing is applying a rotation obtained from one analysis to a different set of data, however, it’s not restricted to that use case. For example, we might have conducted interspecies phylogenetic PCA using species means of wild Canidae – but then desire to apply that rotation to the same variables measured for a set of domestic dog breeds.

This demo will mostly work using the current CRAN version of phytools, but I added one very small update to the scores method, so it is best to update phytools from GitHub if following along at home.

library(phytools)
packageVersion("phytools")
## [1] '1.9.3'

For this example, I will use real interspecific species means data for Anolis lizards from Mahler et al. (2010).

data(anoletree)
data(anole.data)
anole_pca<-phyl.pca(anoletree,anole.data)
anole_pca
## Phylogenetic pca
## Standard deviations:
##        PC1        PC2        PC3        PC4        PC5        PC6 
## 0.33222579 0.09207406 0.05012082 0.04318123 0.02008655 0.01507125 
## Loads:
##            PC1         PC2         PC3         PC4         PC5         PC6
## SVL -0.9712234  0.16067288  0.01972570  0.14782215 -0.06211906  0.06935433
## HL  -0.9645111  0.16955087 -0.01203113  0.17994634  0.08064324 -0.04406887
## HLL -0.9814164 -0.02674808  0.10315533 -0.13790763  0.06887922  0.04126248
## FLL -0.9712265  0.17585694  0.10697935 -0.09105747 -0.06075142 -0.04864769
## LAM -0.7810052  0.37429334 -0.47398703 -0.15871456  0.00217418  0.00875408
## TL  -0.9014509 -0.42528918 -0.07614571  0.01709649 -0.01750404 -0.01088743

Here’s a phylomorphospace projection of just the two PC axes from this analysis.

cols<-setNames(palette()[6:1],sort(unique(getStates(anoletree))))
par(mar=c(5.1,4.1,0.6,0.6))
phylomorphospace(anoletree,scores(anole_pca,dim=c(1,2)),
  ftype="off",node.by.map=TRUE,bty="n",colors=cols,
  node.size=c(0,1.2))
grid()
legend("topleft",names(cols),pch=21,pt.bg=cols,horiz=TRUE,
  bty="n",pt.cex=1.5)

plot of chunk unnamed-chunk-3

Unfortunately, I don’t have the individual-level data for this study, so to approximate what those might look like, I intend to simulate them using the phytools function sampleFrom. I’ll simulate 600 observations in total, with varied sample size among species.

nsamples<-600
xvar<-0.01*apply(anole.data,2,var)
X<-matrix(NA,nsamples,ncol(anole.data))
spp<-vector()
for(i in 1:nsamples){
  spp[i]<-sample(rownames(anole.data),1)
  xbar<-t(anole.data)[,spp[i]]
  X[i,]<-sampleFrom(xbar,xvar,n=rep(1,length(xbar)))
}
dimnames(X)<-list(spp,names(xbar))
X<-X[order(rownames(X)),]
head(X,20)
##               SVL       HL      HLL      FLL      LAM       TL
## ahli     4.091386 2.872331 3.983178 3.340385 2.834592 4.523518
## ahli     4.015600 2.836911 3.941395 3.415485 2.856380 4.542946
## ahli     3.993270 2.779235 3.886209 3.278822 2.878964 4.492043
## alayoni  3.795383 2.692120 3.243108 2.826904 3.107198 4.094175
## alayoni  3.815878 2.754703 3.301772 2.696563 3.081010 4.065341
## alayoni  3.798848 2.786575 3.281835 2.747086 3.061369 4.073011
## alayoni  3.845830 2.748305 3.330293 2.779768 3.034388 4.044423
## alfaroi  3.584145 2.432165 3.332006 2.576825 2.740419 4.447896
## alfaroi  3.476461 2.397698 3.390393 2.491981 2.770973 4.364756
## alfaroi  3.503768 2.355913 3.311736 2.379074 2.732485 4.373644
## alfaroi  3.565466 2.406537 3.328327 2.491937 2.707476 4.406833
## aliniger 4.064182 2.946228 3.648898 3.170323 3.207929 4.474463
## aliniger 4.002409 2.954163 3.603949 3.151589 3.147569 4.563649
## aliniger 4.028034 2.968776 3.661624 3.118830 3.125292 4.525245
## aliniger 4.020764 2.907979 3.667090 3.139346 3.140301 4.512028
## aliniger 3.955780 2.973505 3.586377 3.118829 3.148250 4.632081
## aliniger 4.022304 2.948944 3.653434 3.176354 3.183440 4.679663
## aliniger 4.059123 2.961300 3.582261 3.185081 3.166183 4.585893
## allisoni 4.378027 3.332326 3.920664 3.422667 3.221771 5.033425
## allisoni 4.325455 3.253286 3.877334 3.440805 3.249216 5.047095

Now, computing scores for these new data is about as simple as could be. We just call the generic method scores, but set newdata to our individual-level data matrix!

newdata_pc<-scores(anole_pca,newdata=X,dim=c(1,2))
head(newdata_pc,20)
##                  PC1          PC2
## ahli     -0.10243429  0.071269183
## ahli     -0.08762052  0.060907452
## ahli      0.05113888  0.052206256
## alayoni   0.80409917  0.282435151
## alayoni   0.82354904  0.275255830
## alayoni   0.80408016  0.281863729
## alayoni   0.78291847  0.304404727
## alfaroi   0.97349775 -0.331620406
## alfaroi   1.07546514 -0.315255859
## alfaroi   1.17233151 -0.370716761
## alfaroi   1.05806161 -0.346367848
## aliniger  0.05346551  0.222714309
## aliniger  0.07399053  0.112221392
## aliniger  0.06987509  0.132644583
## aliniger  0.08807446  0.137735356
## aliniger  0.07507745  0.041757879
## aliniger -0.02840616  0.039777413
## aliniger  0.02863586  0.127160622
## allisoni -0.73168217  0.009060608
## allisoni -0.68050056 -0.016986354

Tada!

For fun, let’s graph our phylomorphospace and overlay convex hulls from the within-species data.

cols<-setNames(RColorBrewer::brewer.pal(n=6,name="Accent"),
  names(cols))
par(mar=c(5.1,4.1,0.6,0.6),cex.axis=0.8)
new_anole.data<-aggregate(X,by=list(rownames(X)),FUN=mean)
spp<-new_anole.data$Group.1
new_anole.data<-as.matrix(new_anole.data[,2:7])
rownames(new_anole.data)<-spp
phylomorphospace(anoletree,
  scores(phyl.pca(anoletree,new_anole.data),dim=c(1,2)),
  ftype="off",node.by.map=TRUE,bty="n",colors=cols,
  node.size=c(0,0),las=1)
grid()
COLS<-cols[getStates(anoletree,"tips")]
for(i in 1:Ntip(anoletree)){
  label<-anoletree$tip.label[i]
  ii<-which(rownames(X)==label)
  if(length(ii)>1){
    hull<-chull(newdata_pc[ii,])
    polygon(newdata_pc[ii,][hull,],col=COLS[i],border=FALSE)
  }
  else {
    xy<-scores(anole_pca,dim=c(1,2))[label,]
    points(x=xy[1],y=xy[2],pch=16,col=COLS[i])
  }
}

plot of chunk unnamed-chunk-6

Cool.

No comments:

Post a Comment

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.