Sunday, April 30, 2023

One more post about phylogenetic PCA using contrasts (now in phytools)

In response to a request made by a colleague, over the past couple of days (1, 2) I have posted about how to run phylogenetic PCA (Revell 2009) using independent contrasts.

The main advantage of this is computational – computing time scales (more or less) linearly with the number of tips and characters in the tree, rather than as a function of the square of their product!

I’ve now added this method to the phytools phyl.pca function and it can be used by setting opt="REML".

Here’s a short demo illustrating that the same result is obtained by both methods; however, REML (restricted maximum likelihood, using contrasts) is much faster!

## load phytools
library(phytools)
packageVersion("phytools")
## [1] '1.8.3'
## simulate random tree
tree<-pbtree(n=2000)
## simulate random covariance matrix with 5 traits
vcv<-clusterGeneration::genPositiveDefMat(5,
  "unifcorrmat")$Sigma
vcv
##           [,1]       [,2]       [,3]       [,4]       [,5]
## [1,]  6.760533 -3.3495574  1.2445179 -1.3343917  3.1449044
## [2,] -3.349557  3.3724537 -0.7844665 -0.6422442 -1.5204610
## [3,]  1.244518 -0.7844665  6.3012759 -1.2517792 -0.1997295
## [4,] -1.334392 -0.6422442 -1.2517792  7.4835345 -0.4380401
## [5,]  3.144904 -1.5204610 -0.1997295 -0.4380401  7.9871238
## simulate correlated Brownian motion evolution on tree
X<-sim.corrs(tree,vcv)
head(X)
##             [,1]      [,2]      [,3]       [,4]       [,5]
## t1582 -11.657108 11.286925  5.002298 -4.2415780 -3.4336035
## t1583  -9.420194 11.286074  4.318832 -5.2646588 -1.2929148
## t1049  -6.670022  2.950044 -3.658749  0.6068788  2.7321396
## t1050  -7.793054  4.308016 -2.846464  1.3407722  2.4834178
## t87    -8.897969  7.695878 -5.457121  3.1196218 -0.3617004
## t481  -10.132127  5.399918 -4.016399  0.6995367  5.8017801
system.time(ppca_ml<-phyl.pca(tree,X,opt="ML"))
##    user  system elapsed 
##    3.80    0.00    9.43
ppca_ml
## Phylogenetic pca
## Standard deviations:
##      PC1      PC2      PC3      PC4      PC5 
## 3.557452 2.842953 2.495626 2.054320 1.048796 
## Loads:
##             PC1         PC2        PC3         PC4         PC5
## [1,]  0.8907404  0.01388548 -0.1277305  0.37641298  0.21997047
## [2,] -0.6909064 -0.23328601  0.3511870 -0.37774591  0.44966843
## [3,]  0.3295439 -0.44459222 -0.7053649 -0.44275359  0.01296906
## [4,] -0.2895291  0.85085837 -0.4302646 -0.04555237  0.07078378
## [5,]  0.7223363  0.37823215  0.3820813 -0.43425661 -0.02461367
system.time(ppca_reml<-phyl.pca(tree,X,opt="REML"))
##    user  system elapsed 
##    0.02    0.00    0.17
ppca_reml
## Phylogenetic pca
## Standard deviations:
##      PC1      PC2      PC3      PC4      PC5 
## 3.557452 2.842953 2.495626 2.054320 1.048796 
## Loads:
##             PC1         PC2        PC3         PC4         PC5
## [1,]  0.8907404  0.01388548 -0.1277305  0.37641298 -0.21997047
## [2,] -0.6909064 -0.23328601  0.3511870 -0.37774591 -0.44966843
## [3,]  0.3295439 -0.44459222 -0.7053649 -0.44275359 -0.01296906
## [4,] -0.2895291  0.85085837 -0.4302646 -0.04555237 -0.07078378
## [5,]  0.7223363  0.37823215  0.3820813 -0.43425661  0.02461367
## lambda:
## [1] 1

The difference becomes even more exaggerated when we try to estimate Pagel’s λ – I’ll show this, but I won’t even attempt to do it with a 2,000 taxon phylogeny! Here, I simulate a 500 taxon phylogeny instead.

## simulate random tree
tree<-pbtree(n=500)
## simulate random covariance matrix with 5 traits
vcv<-clusterGeneration::genPositiveDefMat(5,
  "unifcorrmat")$Sigma
## simulate correlated evolution with lambda = 0.8
X<-sim.corrs(phytools:::lambdaTree(tree,lambda=0.8),
  vcv)

Just to illustrate that it can be done, let’s extract our components from the evolutionary correlation, rather than covariance, matrix.

system.time(ppca_ml<-phyl.pca(tree,X,opt="ML",
  method="lambda",mode="corr"))
##    user  system elapsed 
##   17.00    0.13   62.59
ppca_ml
## Phylogenetic pca
## Standard deviations:
##       PC1       PC2       PC3       PC4       PC5 
## 1.5177517 1.1314072 1.0345800 0.5460447 0.2186937 
## Loads:
##               PC1       PC2         PC3         PC4         PC5
## [1,]  0.004048179 0.5509678  0.82666960 -0.09187201  0.06778677
## [2,] -0.866987173 0.2818751 -0.13173788  0.38282677  0.07048717
## [3,] -0.914124937 0.3435776  0.02329003 -0.15335228 -0.14923376
## [4,]  0.216445299 0.7376206 -0.60601697 -0.19429407  0.06372188
## [5,]  0.818177267 0.4846994  0.04265330  0.28618335 -0.10923494
## lambda:
## [1] 0.8072539
system.time(ppca_reml<-phyl.pca(tree,X,opt="REML",
  method="lambda",mode="corr"))
##    user  system elapsed 
##    0.09    0.00    0.51
ppca_reml
## Phylogenetic pca
## Standard deviations:
##       PC1       PC2       PC3       PC4       PC5 
## 1.5176528 1.1315916 1.0344657 0.5461465 0.2187122 
## Loads:
##               PC1       PC2         PC3         PC4         PC5
## [1,]  0.003738619 0.5511287  0.82656819 -0.09183023  0.06778957
## [2,] -0.866993887 0.2816814 -0.13181325  0.38292998  0.07047754
## [3,] -0.914154054 0.3434866  0.02321828 -0.15337890 -0.14924855
## [4,]  0.216154604 0.7377489 -0.60595160 -0.19432734  0.06374407
## [5,]  0.818032542 0.4849287  0.04258120  0.28621609 -0.10924331
## lambda:
## [1] 0.8094609

That’s pretty striking. (As noted previously, REML & ML give us slightly different estimates of λ. This is not unexpected & in a subsequent post I’ll show that the REML estimates are unbiased.)

Lastly, let’s do this using an empirical dataset: values for various morphological traits in Anolis lizards.

data(anoletree)
data(anole.data)
anole.pca_reml<-phyl.pca(anoletree,anole.data,opt="REML")
anole.pca_reml
## Phylogenetic pca
## Standard deviations:
##        PC1        PC2        PC3        PC4        PC5        PC6 
## 0.33222579 0.09207406 0.05012082 0.04318123 0.02008655 0.01507125 
## Loads:
##            PC1         PC2         PC3         PC4         PC5         PC6
## SVL -0.9712234  0.16067288  0.01972570  0.14782215  0.06211906  0.06935433
## HL  -0.9645111  0.16955087 -0.01203113  0.17994634 -0.08064324 -0.04406887
## HLL -0.9814164 -0.02674808  0.10315533 -0.13790763 -0.06887922  0.04126248
## FLL -0.9712265  0.17585694  0.10697935 -0.09105747  0.06075142 -0.04864769
## LAM -0.7810052  0.37429334 -0.47398703 -0.15871456 -0.00217418  0.00875408
## TL  -0.9014509 -0.42528918 -0.07614571  0.01709649  0.01750404 -0.01088743
## lambda:
## [1] 1

PC1 is negative size – so let’s flip its sign & plot it on our tree. We’ll use the trick I posted the day before yesterday to create an arc style plot. (Minimally, so I don’t get ragged on when I tweet about this post.)

anoletree<-reorder(as.phylo(anoletree))
pc1<-setNames(-scores(anole.pca_reml)[,1],
  rownames(scores(anole.pca_reml)))
aa<-fastAnc(anoletree,pc1)
aa<-setNames(c(aa[1],aa),1:(length(aa)+1)+Ntip(anoletree))
anoletree$root.edge<-2*max(nodeHeights(anoletree))
anoletree<-rootedge.to.singleton(anoletree)
obj<-contMap(anoletree,pc1,method="user",anc.states=aa,
  plot=FALSE)
obj<-setMap(obj,viridisLite::viridis(n=10,direction=-1))
obj$tree<-paintBranches(obj$tree,edge=obj$tree$edge[1,2],
  state="0")
cols<-setNames(c("transparent",obj$cols),
  c("0",names(obj$cols)))
plot(obj$tree,colors=cols,lwd=2,type="fan",part=0.5,ftype="i",
  fsize=0.6)
add.color.bar(max(nodeHeights(anoletree)),obj$cols,subtitle="",
  title="PC1 from phylogenetic PCA (body size)",lims=round(obj$lims,2),
  x=-0.5*max(nodeHeights(anoletree)),y=0,prompt=FALSE,lwd=6,
  outline=FALSE)