Friday, April 28, 2023

Minimal function for phylogenetic PCA using contrasts

This is working towards something else, but I thought I’d post a minimal function to do phylogenetic PCA (following Revell 2009) on the variance-covariance matrix using phylogenetically independent contrasts.

First, here’s the function.

phyl_pca<-function(tree,X,model="BM"){
  pX<-apply(X,2,pic,phy=tree)
  vcv<-t(pX)%*%pX/(Ntip(tree)-1)
  a<-apply(X,2,function(x,tree) rep(ace(x,tree,
    method="pic")$ace[1],Ntip(tree)),tree=tree)
  eig<-eigen(vcv)
  Eval<-diag(eig$values)
  colnames(Eval)<-rownames(Eval)<-paste("PC",1:nrow(Eval),sep="")
  Evec<-eig$vectors
  rownames(Evec)<-colnames(X)
  colnames(Evec)<-colnames(Eval)
  S<-(X-a)%*%Evec
  pic_corr<-function(x,y,tree){ 
    px<-pic(x,tree)
    py<-pic(y,tree)
    mean(px*py)/sqrt(mean(px^2)*mean(py^2))
  }
  L<-apply(S,2,function(x,y,tree) apply(y,2,pic_corr,
    x=x,tree=tree),y=X,tree=tree)
  dimnames(L)<-dimnames(Evec)
  object<-list(Eval=Eval,Evec=Evec,S=S,L=L,V=vcv,
    a=a[1,,drop=FALSE],mode="cov",call=match.call())
  class(object)<-"phyl.pca"
  object
}

Now, let’s use simulated data to establish that it gives the same result as phyl.pca in phytools.

library(phytools)
tree<-pbtree(n=26,tip.label=LETTERS)
X<-fastBM(tree,nsim=5)
ppca_old<-phyl.pca(tree,X)
ppca_old
## Phylogenetic pca
## Standard deviations:
##     PC1     PC2     PC3     PC4     PC5 
## 1.30539 1.14825 0.81916 0.69441 0.54724 
## Loads:
##            PC1       PC2      PC3       PC4      PC5
## [1,] -0.447049 -0.647115 -0.13345  0.517316 -0.30978
## [2,] -0.188977 -0.422767  0.86683  0.081185  0.16607
## [3,] -0.339527  0.870107  0.24518  0.177473 -0.18981
## [4,] -0.024847 -0.337630  0.25756 -0.684094 -0.59251
## [5,]  0.972326  0.030731  0.11605  0.168092 -0.10916
ppca_new<-phyl_pca(tree,X)
ppca_new
## Phylogenetic pca
## Standard deviations:
##     PC1     PC2     PC3     PC4     PC5 
## 1.30539 1.14825 0.81916 0.69441 0.54724 
## Loads:
##            PC1       PC2      PC3       PC4      PC5
## [1,] -0.447049 -0.647115 -0.13345  0.517316 -0.30978
## [2,] -0.188977 -0.422767  0.86683  0.081185  0.16607
## [3,] -0.339527  0.870107  0.24518  0.177473 -0.18981
## [4,] -0.024847 -0.337630  0.25756 -0.684094 -0.59251
## [5,]  0.972326  0.030731  0.11605  0.168092 -0.10916

Straight away, we see that we’re getting the same loadings (evoluationary correlations between our five PCs and the original traits). If we inspected the scores

head(scores(ppca_old))
##        PC1     PC2       PC3       PC4      PC5
## A -0.94060 0.64396  1.248968  0.864546  0.62631
## B -0.60360 1.64493  0.917616  0.851302 -0.15091
## C -1.70219 0.91533  1.900368 -0.162792  0.29883
## D -0.78124 1.51660 -0.560986  0.011550 -1.10221
## E -1.18014 2.19275 -0.055891 -0.020694 -0.71089
## F -1.89954 3.63193 -0.560573 -0.059178  0.51366
head(scores(ppca_new))
##        PC1     PC2       PC3       PC4      PC5
## A -0.94060 0.64396  1.248968  0.864546  0.62631
## B -0.60360 1.64493  0.917616  0.851302 -0.15091
## C -1.70219 0.91533  1.900368 -0.162792  0.29883
## D -0.78124 1.51660 -0.560986  0.011550 -1.10221
## E -1.18014 2.19275 -0.055891 -0.020694 -0.71089
## F -1.89954 3.63193 -0.560573 -0.059178  0.51366

We see the same thing.

Our plotting methods work too.

par(mfrow=c(1,2))
plot(ppca_new)
biplot(ppca_new)

plot of chunk unnamed-chunk-6

So what’s the advantage of using contrasts – if we just get the same result? Well, for one contrasts are much faster.

This becomes evident when we simulate a much larger tree & more characters.

tree<-pbtree(n=2000)
X<-fastBM(tree,nsim=10)
system.time(ppca1<-phyl.pca(tree,X))
##    user  system elapsed 
##    1.92    0.03    5.25
system.time(ppca2<-phyl_pca(tree,X))
##    user  system elapsed 
##    0.05    0.00    0.09

Are the results the same? (I’ll change options()$digits here just so that we can see all our loadings from each analysis in one blog column. I promise I’m not hiding any differences!)

options(digits=2)
ppca1
## Phylogenetic pca
## Standard deviations:
##  PC1  PC2  PC3  PC4  PC5  PC6  PC7  PC8  PC9 PC10 
## 1.05 1.03 1.01 1.00 0.98 0.98 0.97 0.96 0.96 0.94 
## Loads:
##         PC1   PC2    PC3      PC4    PC5    PC6    PC7    PC8   PC9   PC10
##  [1,]  0.42  0.03  0.146  0.35447  0.492  0.167  0.208  0.048 -0.25 -0.547
##  [2,]  0.27  0.37  0.409 -0.37599 -0.284  0.495  0.374 -0.093 -0.02  0.102
##  [3,] -0.17  0.63 -0.016 -0.00034 -0.085  0.074 -0.342  0.391  0.38 -0.389
##  [4,] -0.23  0.53  0.033 -0.30188  0.422 -0.286 -0.026  0.040 -0.52  0.212
##  [5,] -0.32  0.20 -0.246  0.17329  0.482  0.272  0.309 -0.330  0.46  0.208
##  [6,]  0.18 -0.10  0.076 -0.53549  0.220 -0.061 -0.407 -0.570  0.18 -0.293
##  [7,] -0.20 -0.25 -0.199 -0.52941  0.059 -0.223  0.549  0.303  0.14 -0.334
##  [8,] -0.31 -0.36 -0.062 -0.22299  0.226  0.669 -0.298  0.295 -0.21  0.038
##  [9,]  0.70  0.08 -0.525 -0.17564  0.150  0.034 -0.081  0.263  0.12  0.284
## [10,]  0.11 -0.20  0.676 -0.02729  0.358 -0.196 -0.060  0.330  0.37  0.280
ppca2
## Phylogenetic pca
## Standard deviations:
##  PC1  PC2  PC3  PC4  PC5  PC6  PC7  PC8  PC9 PC10 
## 1.05 1.03 1.01 1.00 0.98 0.98 0.97 0.96 0.96 0.94 
## Loads:
##         PC1   PC2    PC3      PC4    PC5    PC6    PC7    PC8   PC9   PC10
##  [1,] -0.42  0.03 -0.146  0.35447  0.492  0.167 -0.208  0.048 -0.25  0.547
##  [2,] -0.27  0.37 -0.409 -0.37599 -0.284  0.495 -0.374 -0.093 -0.02 -0.102
##  [3,]  0.17  0.63  0.016 -0.00034 -0.085  0.074  0.342  0.391  0.38  0.389
##  [4,]  0.23  0.53 -0.033 -0.30188  0.422 -0.286  0.026  0.040 -0.52 -0.212
##  [5,]  0.32  0.20  0.246  0.17329  0.482  0.272 -0.309 -0.330  0.46 -0.208
##  [6,] -0.18 -0.10 -0.076 -0.53549  0.220 -0.061  0.407 -0.570  0.18  0.293
##  [7,]  0.20 -0.25  0.199 -0.52941  0.059 -0.223 -0.549  0.303  0.14  0.334
##  [8,]  0.31 -0.36  0.062 -0.22299  0.226  0.669  0.298  0.295 -0.21 -0.038
##  [9,] -0.70  0.08  0.525 -0.17564  0.150  0.034  0.081  0.263  0.12 -0.284
## [10,] -0.11 -0.20 -0.676 -0.02729  0.358 -0.196  0.060  0.330  0.37 -0.280

Neat. How big can we go?

tree<-pbtree(n=10000)
X<-fastBM(tree,nsim=65)
system.time(ppca<-phyl_pca(tree,X))
##    user  system elapsed 
##   10.28    0.16   20.19
biplot(ppca)

plot of chunk unnamed-chunk-13

It works! More later….

1 comment:

  1. Very interesting! Just a fast question, does similar approaches also apply for a phylogenetic Partial Least Squares regression ? Can I simply perform a normal Partial Least Squares on the matrix of PICs ?
    Thanks a lot,
    Domenico

    ReplyDelete

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.