Wednesday, March 30, 2022

Combining phylogenetic PCA with the factoextra package

A couple of days ago, Roberto Toro posted the following tweet:

I responded, and then we wrote back and forth, and I think I've now developed a solution whereby a "phyl.pca" object produced by phytools::phyl.pca can be converted to a "princomp" object and plotted using the factoextra package.

The update can be seen on the phytools GitHub page, and obtained by updating phytools from GitHub using devtools.

The following demo requires that phytools be updated (to version 1.0.2 or above) & that factoextra is installed from CRAN.

First, I'll load my packages & data.

## load packages
library(phytools)
packageVersion("phytools")
## [1] '1.0.2'
library(factoextra)
## load example data
data(anoletree)
data(anole.data)

Next, I'll pull out the factor “ecomorph” from the tree object, anoletree. This is only possible because anoletree is a "simmap" class object, but I'll use this factor to plot my PC scores by group – something that factoextra facilitates. All the other steps of this demo do not require a grouping factor – I do this only to help demonstrate the full capabilities of factoextra!

ecomorph<-getStates(anoletree,"tips")
head(ecomorph,10)
##            ahli         allogus     rubribarbus           imias          sagrei 
##            "TG"            "TG"            "TG"            "TG"            "TG" 
##         bremeri quadriocellifer      ophiolepis         mestrei           jubar 
##            "TG"            "TG"            "GB"            "TG"            "TG"

Now I'll do my phylogenetic PCA (using phyl.pca) and then convert my "phyl.pca" object to a "princomp" object using as.princomp, the function I just added to phytools.

pca<-phyl.pca(anoletree,anole.data)
pca
## Phylogenetic pca
## Standard deviations:
##        PC1        PC2        PC3        PC4        PC5        PC6 
## 0.33222579 0.09207406 0.05012082 0.04318123 0.02008655 0.01507125 
## Loads:
##            PC1         PC2         PC3         PC4         PC5         PC6
## SVL -0.9712234  0.16067288  0.01972570  0.14782215 -0.06211906  0.06935433
## HL  -0.9645111  0.16955087 -0.01203113  0.17994634  0.08064324 -0.04406887
## HLL -0.9814164 -0.02674808  0.10315533 -0.13790763  0.06887922  0.04126248
## FLL -0.9712265  0.17585694  0.10697935 -0.09105747 -0.06075142 -0.04864769
## LAM -0.7810052  0.37429334 -0.47398703 -0.15871456  0.00217418  0.00875408
## TL  -0.9014509 -0.42528918 -0.07614571  0.01709649 -0.01750404 -0.01088743
obj<-as.princomp(pca)
obj
## Call:
## phyl.pca(tree = anoletree, Y = anole.data)
## 
## Standard deviations:
##     Comp.1     Comp.2     Comp.3     Comp.4     Comp.5     Comp.6 
## 0.33222579 0.09207406 0.05012082 0.04318123 0.02008655 0.01507125 
## 
##  6  variables and  observations.

Try a simple factoextra visualization.

fviz_screeplot(obj,addlabels=TRUE)

plot of chunk unnamed-chunk-4

Cool!

Finally, I'll use a different factoextra plotting method to show the PC scores in two different dimensions separated by my discrete factor, ecomorph.

fviz_pca_ind(obj,label="none",habillage=ecomorph,
    addEllipses=TRUE)

plot of chunk unnamed-chunk-5

That's it.

No comments:

Post a Comment

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.