A friend & colleague recently contacted me about computing principal components scores for individual data from a phylogenetic principal components rotation conducted using means. Not only that, but he wanted to do this based on the correlation matrix (rather than the covariance matrix), and using an optimized Pagel's λ transformation.
Firstly, if we were just using the covariance matrix, this would be relatively straightforward. We would first compute the PCs and then we could just multiply the (recentered) individual data by the matrix of eigenvectors from our phylogenetic PCA to get the scores. The only complication arises when we conduct the recentering, because we need to recenter using the vector of phylogenetic means from the species data (i.e., the reconstructed values for each trait at the root of the tree), rather than using a simple arithmetic mean of the species values for each trait.
If we want to use the correlation matrix for our PCA then things get a bit more complicated. This is because before computing our scores for individuals in the rotated space, we need to transform each trait to have a (phylogenetic) variance of 1.0, as well as recentering on the reconstructed ancestral states for the trait. This we can do by computing the (phylogenetic) variance of each trait, and then dividing the individual values by the square root of this variance.
For the following, Xm
contains the species means data,
Xi
the data for individuals, and the rest should be fairly
obvious!
First, I'm going to compute the scores for the species means data
using phyl.pca
and then manually (to show that they're the
same), and then I will use the same algorithm to compute the scores
for individuals.
## load library
library(phytools)
## visualize the data
dotTree(tree,Xm)
## compute PCA
pca<-phyl.pca(tree,Xm,method="lambda",mode="corr")
pca
## Phylogenetic pca
## Standard deviations:
## PC1 PC2 PC3 PC4 PC5
## 1.5425533 1.1834761 0.8015111 0.6782314 0.3427766
## Loads:
## PC1 PC2 PC3 PC4 PC5
## [1,] 8.605186e-01 -0.1021593 0.04453666 0.47415186 -0.14922368
## [2,] -8.475339e-01 -0.1633525 -0.11016975 0.47117137 0.14443841
## [3,] -9.399335e-01 0.2144096 0.01911199 0.01798584 -0.26431948
## [4,] -7.748565e-06 -0.8393609 -0.52923628 -0.10504365 -0.06594048
## [5,] -1.928464e-01 -0.7829413 0.58978198 -0.04263434 -0.01235669
## lambda:
## [1] 0.6640637
## obtain among species & among trait phylogenetic covariance matrices
## with our fitted lambda
obj<-phyl.vcv(Xm[tree$tip.label,],vcv(tree),pca$lambda)
V<-obj$R
C<-obj$C
n<-nrow(Xm)
m<-ncol(Xm)
## transform the data such that the columns have phylogenetic
## variances of 1.0
Y<-Xm/matrix(rep(sqrt(diag(V)),n),n,m,byrow=TRUE)
## check (should be 1.0s)
diag(phyl.vcv(as.matrix(Y)[tree$tip.label,],C,1)$R)
## [1] 1 1 1 1 1
## invert among species covariance matrix
invC<-solve(C)[rownames(Y),rownames(Y)]
## compute ancestral states to recenter
a<-matrix(colSums(invC%*%as.matrix(Y))/sum(invC),m,1)
A<-matrix(rep(a,n),n,m,byrow=TRUE)
Y<-Y-A
## check (should be 0.0s - to numerical precision)
phyl.vcv(Y[tree$tip.label,],C,1)$alpha[,1]
## [1] 2.234332e-18 -2.968470e-16 4.187777e-16 -3.115298e-16 -3.855819e-16
## as a check, compute scores on the original species means data
Sm<-Y%*%pca$Evec
plot(pca$S,Sm,pch=21,bg=make.transparent("blue",0.25),
cex=1.5,xlab="PC scores from phyl.pca",
ylab="PC scores from this exercise")
OK, that seemed to work. Now all we need to do to compute our scores for individuals is perform exactly the same transformation of the individual data, and then rotate it using the eigenvectors of our among-species PCA:
## compute scores using the phyl.pca rotation on individual data
n<-nrow(Xi)
Yi<-Xi/matrix(rep(sqrt(diag(V)),n),n,m,byrow=TRUE)
A<-matrix(rep(a,n),n,m,byrow=TRUE)
Yi<-as.matrix(Yi)-A
Si<-Yi%*%pca$Evec
## done
For the above demo I didn't use my colleagues' data, of course. The data that I used were simulated as follows:
library(clusterGeneration)
tree<-pbtree(n=26,scale=1,tip.label=sample(LETTERS))
k<-5
V<-genPositiveDefMat(k,covMethod="unifcorrmat")$Sigma
X<-sim.corrs(phytools:::lambdaTree(tree,0.7),V)[LETTERS,]
N<-sample(1:5,nrow(X),replace=TRUE)
Xi<-apply(X,2,sampleFrom,xvar=0.2,n=N)
spp<-as.factor(rownames(Xi))
rownames(Xi)<-paste(rownames(Xi),unlist(sapply(N,function(n) 1:n)),
sep="")
foo<-function(x,spp){
coef<-coefficients(lm(x~spp))
setNames(coef[1]+c(0,coef[2:length(coef)]),
levels(spp))
}
Xm<-apply(Xi,2,foo,spp=spp)
I'll leave it to the reader to figure out how that worked!
Hello Liam,
ReplyDeleteI'm using the function phylpca and I have to select the method between "BM" and "lambda". However, I do not understand what "BM" refers exactly (Blomberg's ?) as also "lambda" refers to a Brownian Motion so I do not understand what does "BM" mean exactly.