In response to a request made by a colleague, over the past couple of days (1, 2) I have posted about how to run phylogenetic PCA (Revell 2009) using independent contrasts.
The main advantage of this is computational – computing time scales (more or less) linearly with the number of tips and characters in the tree, rather than as a function of the square of their product!
I’ve now added this method to the phytools phyl.pca
function and it can be used by setting opt="REML"
.
Here’s a short demo illustrating that the same result is obtained by both methods; however, REML (restricted maximum likelihood, using contrasts) is much faster!
## load phytools
library(phytools)
packageVersion("phytools")
## [1] '1.8.3'
## simulate random tree
tree<-pbtree(n=2000)
## simulate random covariance matrix with 5 traits
vcv<-clusterGeneration::genPositiveDefMat(5,
"unifcorrmat")$Sigma
vcv
## [,1] [,2] [,3] [,4] [,5]
## [1,] 6.760533 -3.3495574 1.2445179 -1.3343917 3.1449044
## [2,] -3.349557 3.3724537 -0.7844665 -0.6422442 -1.5204610
## [3,] 1.244518 -0.7844665 6.3012759 -1.2517792 -0.1997295
## [4,] -1.334392 -0.6422442 -1.2517792 7.4835345 -0.4380401
## [5,] 3.144904 -1.5204610 -0.1997295 -0.4380401 7.9871238
## simulate correlated Brownian motion evolution on tree
X<-sim.corrs(tree,vcv)
head(X)
## [,1] [,2] [,3] [,4] [,5]
## t1582 -11.657108 11.286925 5.002298 -4.2415780 -3.4336035
## t1583 -9.420194 11.286074 4.318832 -5.2646588 -1.2929148
## t1049 -6.670022 2.950044 -3.658749 0.6068788 2.7321396
## t1050 -7.793054 4.308016 -2.846464 1.3407722 2.4834178
## t87 -8.897969 7.695878 -5.457121 3.1196218 -0.3617004
## t481 -10.132127 5.399918 -4.016399 0.6995367 5.8017801
system.time(ppca_ml<-phyl.pca(tree,X,opt="ML"))
## user system elapsed
## 3.80 0.00 9.43
ppca_ml
## Phylogenetic pca
## Standard deviations:
## PC1 PC2 PC3 PC4 PC5
## 3.557452 2.842953 2.495626 2.054320 1.048796
## Loads:
## PC1 PC2 PC3 PC4 PC5
## [1,] 0.8907404 0.01388548 -0.1277305 0.37641298 0.21997047
## [2,] -0.6909064 -0.23328601 0.3511870 -0.37774591 0.44966843
## [3,] 0.3295439 -0.44459222 -0.7053649 -0.44275359 0.01296906
## [4,] -0.2895291 0.85085837 -0.4302646 -0.04555237 0.07078378
## [5,] 0.7223363 0.37823215 0.3820813 -0.43425661 -0.02461367
system.time(ppca_reml<-phyl.pca(tree,X,opt="REML"))
## user system elapsed
## 0.02 0.00 0.17
ppca_reml
## Phylogenetic pca
## Standard deviations:
## PC1 PC2 PC3 PC4 PC5
## 3.557452 2.842953 2.495626 2.054320 1.048796
## Loads:
## PC1 PC2 PC3 PC4 PC5
## [1,] 0.8907404 0.01388548 -0.1277305 0.37641298 -0.21997047
## [2,] -0.6909064 -0.23328601 0.3511870 -0.37774591 -0.44966843
## [3,] 0.3295439 -0.44459222 -0.7053649 -0.44275359 -0.01296906
## [4,] -0.2895291 0.85085837 -0.4302646 -0.04555237 -0.07078378
## [5,] 0.7223363 0.37823215 0.3820813 -0.43425661 0.02461367
## lambda:
## [1] 1
The difference becomes even more exaggerated when we try to estimate Pagel’s λ – I’ll show this, but I won’t even attempt to do it with a 2,000 taxon phylogeny! Here, I simulate a 500 taxon phylogeny instead.
## simulate random tree
tree<-pbtree(n=500)
## simulate random covariance matrix with 5 traits
vcv<-clusterGeneration::genPositiveDefMat(5,
"unifcorrmat")$Sigma
## simulate correlated evolution with lambda = 0.8
X<-sim.corrs(phytools:::lambdaTree(tree,lambda=0.8),
vcv)
Just to illustrate that it can be done, let’s extract our components from the evolutionary correlation, rather than covariance, matrix.
system.time(ppca_ml<-phyl.pca(tree,X,opt="ML",
method="lambda",mode="corr"))
## user system elapsed
## 17.00 0.13 62.59
ppca_ml
## Phylogenetic pca
## Standard deviations:
## PC1 PC2 PC3 PC4 PC5
## 1.5177517 1.1314072 1.0345800 0.5460447 0.2186937
## Loads:
## PC1 PC2 PC3 PC4 PC5
## [1,] 0.004048179 0.5509678 0.82666960 -0.09187201 0.06778677
## [2,] -0.866987173 0.2818751 -0.13173788 0.38282677 0.07048717
## [3,] -0.914124937 0.3435776 0.02329003 -0.15335228 -0.14923376
## [4,] 0.216445299 0.7376206 -0.60601697 -0.19429407 0.06372188
## [5,] 0.818177267 0.4846994 0.04265330 0.28618335 -0.10923494
## lambda:
## [1] 0.8072539
system.time(ppca_reml<-phyl.pca(tree,X,opt="REML",
method="lambda",mode="corr"))
## user system elapsed
## 0.09 0.00 0.51
ppca_reml
## Phylogenetic pca
## Standard deviations:
## PC1 PC2 PC3 PC4 PC5
## 1.5176528 1.1315916 1.0344657 0.5461465 0.2187122
## Loads:
## PC1 PC2 PC3 PC4 PC5
## [1,] 0.003738619 0.5511287 0.82656819 -0.09183023 0.06778957
## [2,] -0.866993887 0.2816814 -0.13181325 0.38292998 0.07047754
## [3,] -0.914154054 0.3434866 0.02321828 -0.15337890 -0.14924855
## [4,] 0.216154604 0.7377489 -0.60595160 -0.19432734 0.06374407
## [5,] 0.818032542 0.4849287 0.04258120 0.28621609 -0.10924331
## lambda:
## [1] 0.8094609
That’s pretty striking. (As noted previously, REML & ML give us slightly different estimates of λ. This is not unexpected & in a subsequent post I’ll show that the REML estimates are unbiased.)
Lastly, let’s do this using an empirical dataset: values for various morphological traits in Anolis lizards.
data(anoletree)
data(anole.data)
anole.pca_reml<-phyl.pca(anoletree,anole.data,opt="REML")
anole.pca_reml
## Phylogenetic pca
## Standard deviations:
## PC1 PC2 PC3 PC4 PC5 PC6
## 0.33222579 0.09207406 0.05012082 0.04318123 0.02008655 0.01507125
## Loads:
## PC1 PC2 PC3 PC4 PC5 PC6
## SVL -0.9712234 0.16067288 0.01972570 0.14782215 0.06211906 0.06935433
## HL -0.9645111 0.16955087 -0.01203113 0.17994634 -0.08064324 -0.04406887
## HLL -0.9814164 -0.02674808 0.10315533 -0.13790763 -0.06887922 0.04126248
## FLL -0.9712265 0.17585694 0.10697935 -0.09105747 0.06075142 -0.04864769
## LAM -0.7810052 0.37429334 -0.47398703 -0.15871456 -0.00217418 0.00875408
## TL -0.9014509 -0.42528918 -0.07614571 0.01709649 0.01750404 -0.01088743
## lambda:
## [1] 1
PC1 is negative size – so let’s flip its sign & plot it on our tree. We’ll use the trick I posted the day before yesterday to create an arc style plot. (Minimally, so I don’t get ragged on when I tweet about this post.)
anoletree<-reorder(as.phylo(anoletree))
pc1<-setNames(-scores(anole.pca_reml)[,1],
rownames(scores(anole.pca_reml)))
aa<-fastAnc(anoletree,pc1)
aa<-setNames(c(aa[1],aa),1:(length(aa)+1)+Ntip(anoletree))
anoletree$root.edge<-2*max(nodeHeights(anoletree))
anoletree<-rootedge.to.singleton(anoletree)
obj<-contMap(anoletree,pc1,method="user",anc.states=aa,
plot=FALSE)
obj<-setMap(obj,viridisLite::viridis(n=10,direction=-1))
obj$tree<-paintBranches(obj$tree,edge=obj$tree$edge[1,2],
state="0")
cols<-setNames(c("transparent",obj$cols),
c("0",names(obj$cols)))
plot(obj$tree,colors=cols,lwd=2,type="fan",part=0.5,ftype="i",
fsize=0.6)
add.color.bar(max(nodeHeights(anoletree)),obj$cols,subtitle="",
title="PC1 from phylogenetic PCA (body size)",lims=round(obj$lims,2),
x=-0.5*max(nodeHeights(anoletree)),y=0,prompt=FALSE,lwd=6,
outline=FALSE)