Phylogenetic Tools for Comparative Biology: Minimal function for phylogenetic PCA using contrasts

This is working towards something else, but I thought I’d post a minimal function to do phylogenetic PCA (following Revell 2009) on the variance-covariance matrix using phylogenetically independent contrasts.

First, here’s the function.

phyl_pca<-function(tree,X,model="BM"){
  pX<-apply(X,2,pic,phy=tree)
  vcv<-t(pX)%*%pX/(Ntip(tree)-1)
  a<-apply(X,2,function(x,tree) rep(ace(x,tree,
    method="pic")$ace[1],Ntip(tree)),tree=tree)
  eig<-eigen(vcv)
  Eval<-diag(eig$values)
  colnames(Eval)<-rownames(Eval)<-paste("PC",1:nrow(Eval),sep="")
  Evec<-eig$vectors
  rownames(Evec)<-colnames(X)
  colnames(Evec)<-colnames(Eval)
  S<-(X-a)%*%Evec
  pic_corr<-function(x,y,tree){ 
    px<-pic(x,tree)
    py<-pic(y,tree)
    mean(px*py)/sqrt(mean(px^2)*mean(py^2))
  }
  L<-apply(S,2,function(x,y,tree) apply(y,2,pic_corr,
    x=x,tree=tree),y=X,tree=tree)
  dimnames(L)<-dimnames(Evec)
  object<-list(Eval=Eval,Evec=Evec,S=S,L=L,V=vcv,
    a=a[1,,drop=FALSE],mode="cov",call=match.call())
  class(object)<-"phyl.pca"
  object
}

Now, let’s use simulated data to establish that it gives the same result as phyl.pca in phytools.

library(phytools)
tree<-pbtree(n=26,tip.label=LETTERS)
X<-fastBM(tree,nsim=5)
ppca_old<-phyl.pca(tree,X)
ppca_old

## Phylogenetic pca
## Standard deviations:
##     PC1     PC2     PC3     PC4     PC5 
## 1.30539 1.14825 0.81916 0.69441 0.54724 
## Loads:
##            PC1       PC2      PC3       PC4      PC5
## [1,] -0.447049 -0.647115 -0.13345  0.517316 -0.30978
## [2,] -0.188977 -0.422767  0.86683  0.081185  0.16607
## [3,] -0.339527  0.870107  0.24518  0.177473 -0.18981
## [4,] -0.024847 -0.337630  0.25756 -0.684094 -0.59251
## [5,]  0.972326  0.030731  0.11605  0.168092 -0.10916

ppca_new<-phyl_pca(tree,X)
ppca_new

## Phylogenetic pca
## Standard deviations:
##     PC1     PC2     PC3     PC4     PC5 
## 1.30539 1.14825 0.81916 0.69441 0.54724 
## Loads:
##            PC1       PC2      PC3       PC4      PC5
## [1,] -0.447049 -0.647115 -0.13345  0.517316 -0.30978
## [2,] -0.188977 -0.422767  0.86683  0.081185  0.16607
## [3,] -0.339527  0.870107  0.24518  0.177473 -0.18981
## [4,] -0.024847 -0.337630  0.25756 -0.684094 -0.59251
## [5,]  0.972326  0.030731  0.11605  0.168092 -0.10916

Straight away, we see that we’re getting the same loadings (evoluationary correlations between our five PCs and the original traits). If we inspected the scores

head(scores(ppca_old))

##        PC1     PC2       PC3       PC4      PC5
## A -0.94060 0.64396  1.248968  0.864546  0.62631
## B -0.60360 1.64493  0.917616  0.851302 -0.15091
## C -1.70219 0.91533  1.900368 -0.162792  0.29883
## D -0.78124 1.51660 -0.560986  0.011550 -1.10221
## E -1.18014 2.19275 -0.055891 -0.020694 -0.71089
## F -1.89954 3.63193 -0.560573 -0.059178  0.51366

head(scores(ppca_new))

##        PC1     PC2       PC3       PC4      PC5
## A -0.94060 0.64396  1.248968  0.864546  0.62631
## B -0.60360 1.64493  0.917616  0.851302 -0.15091
## C -1.70219 0.91533  1.900368 -0.162792  0.29883
## D -0.78124 1.51660 -0.560986  0.011550 -1.10221
## E -1.18014 2.19275 -0.055891 -0.020694 -0.71089
## F -1.89954 3.63193 -0.560573 -0.059178  0.51366

We see the same thing.

Our plotting methods work too.

par(mfrow=c(1,2))
plot(ppca_new)
biplot(ppca_new)

plot of chunk unnamed-chunk-6

So what’s the advantage of using contrasts – if we just get the same result? Well, for one contrasts are much faster.

This becomes evident when we simulate a much larger tree & more characters.

tree<-pbtree(n=2000)
X<-fastBM(tree,nsim=10)
system.time(ppca1<-phyl.pca(tree,X))

##    user  system elapsed 
##    1.92    0.03    5.25

system.time(ppca2<-phyl_pca(tree,X))

##    user  system elapsed 
##    0.05    0.00    0.09

Are the results the same? (I’ll change options()$digits here just so that we can see all our loadings from each analysis in one blog column. I promise I’m not hiding any differences!)

options(digits=2)
ppca1

## Phylogenetic pca
## Standard deviations:
##  PC1  PC2  PC3  PC4  PC5  PC6  PC7  PC8  PC9 PC10 
## 1.05 1.03 1.01 1.00 0.98 0.98 0.97 0.96 0.96 0.94 
## Loads:
##         PC1   PC2    PC3      PC4    PC5    PC6    PC7    PC8   PC9   PC10
##  [1,]  0.42  0.03  0.146  0.35447  0.492  0.167  0.208  0.048 -0.25 -0.547
##  [2,]  0.27  0.37  0.409 -0.37599 -0.284  0.495  0.374 -0.093 -0.02  0.102
##  [3,] -0.17  0.63 -0.016 -0.00034 -0.085  0.074 -0.342  0.391  0.38 -0.389
##  [4,] -0.23  0.53  0.033 -0.30188  0.422 -0.286 -0.026  0.040 -0.52  0.212
##  [5,] -0.32  0.20 -0.246  0.17329  0.482  0.272  0.309 -0.330  0.46  0.208
##  [6,]  0.18 -0.10  0.076 -0.53549  0.220 -0.061 -0.407 -0.570  0.18 -0.293
##  [7,] -0.20 -0.25 -0.199 -0.52941  0.059 -0.223  0.549  0.303  0.14 -0.334
##  [8,] -0.31 -0.36 -0.062 -0.22299  0.226  0.669 -0.298  0.295 -0.21  0.038
##  [9,]  0.70  0.08 -0.525 -0.17564  0.150  0.034 -0.081  0.263  0.12  0.284
## [10,]  0.11 -0.20  0.676 -0.02729  0.358 -0.196 -0.060  0.330  0.37  0.280

ppca2

## Phylogenetic pca
## Standard deviations:
##  PC1  PC2  PC3  PC4  PC5  PC6  PC7  PC8  PC9 PC10 
## 1.05 1.03 1.01 1.00 0.98 0.98 0.97 0.96 0.96 0.94 
## Loads:
##         PC1   PC2    PC3      PC4    PC5    PC6    PC7    PC8   PC9   PC10
##  [1,] -0.42  0.03 -0.146  0.35447  0.492  0.167 -0.208  0.048 -0.25  0.547
##  [2,] -0.27  0.37 -0.409 -0.37599 -0.284  0.495 -0.374 -0.093 -0.02 -0.102
##  [3,]  0.17  0.63  0.016 -0.00034 -0.085  0.074  0.342  0.391  0.38  0.389
##  [4,]  0.23  0.53 -0.033 -0.30188  0.422 -0.286  0.026  0.040 -0.52 -0.212
##  [5,]  0.32  0.20  0.246  0.17329  0.482  0.272 -0.309 -0.330  0.46 -0.208
##  [6,] -0.18 -0.10 -0.076 -0.53549  0.220 -0.061  0.407 -0.570  0.18  0.293
##  [7,]  0.20 -0.25  0.199 -0.52941  0.059 -0.223 -0.549  0.303  0.14  0.334
##  [8,]  0.31 -0.36  0.062 -0.22299  0.226  0.669  0.298  0.295 -0.21 -0.038
##  [9,] -0.70  0.08  0.525 -0.17564  0.150  0.034  0.081  0.263  0.12 -0.284
## [10,] -0.11 -0.20 -0.676 -0.02729  0.358 -0.196  0.060  0.330  0.37 -0.280

Neat. How big can we go?

tree<-pbtree(n=10000)
X<-fastBM(tree,nsim=65)

system.time(ppca<-phyl_pca(tree,X))

##    user  system elapsed 
##   10.28    0.16   20.19

biplot(ppca)

plot of chunk unnamed-chunk-13

It works! More later….

1 comment:

ddifraiaApril 24, 2024 at 12:04 PM
Very interesting! Just a fast question, does similar approaches also apply for a phylogenetic Partial Least Squares regression ? Can I simply perform a normal Partial Least Squares on the matrix of PICs ?
Thanks a lot,
Domenico

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.

Phylogenetic Tools for Comparative Biology

Pages

Friday, April 28, 2023

Minimal function for phylogenetic PCA using contrasts

1 comment: