A colleague recently asked me about the phytools function phyl.cca
for phylogenetic canonical correlation analysis. Canonical correlation analysis is a technique whereby, given two different matrices of continuous variables (X and Y), one can identify a set of derived variables (the canonical variables), each a linear combination of X and Y, that have the maximum correlation. Each canonical variable is orthogonal to the preceding one. Accounting for phylogeny is done in the typical way such that the correlation to be maximized is the evolutionary correlation, and each canonical variable is evolutionarily orthogonal to the preceding one. Typically, we might use canonical correlation analysis to analyze a pair of data matrices of different types – for instance, in which X contained a set of environmental trait, and Y a set of morphological features (and so on).
phyl.cca
actually replaces a much older stand-alone binary software (PCCA) that I described in a paper with Alexis Harrison way back in 2008. I offered to provide my colleague with a complete workflow illustrating the function. The dataset & tree that I’ll use for this come from the website of my recent book with Luke Harmon. I’ll use the files from Chapter 2 of the book BarbetTree.nex
and Barbetdata.csv
, which I have downloaded to my local working directory. These data are originally from an article by Gonzalez-Voyer et al. (2013) and consist of a phylogeny and trait dataset for Asian barbets (Megalaimidae).
After loading our packages, we can proceed to read in both files as follows.
## load packages
library(phytools)
library(geiger)
## read tree
barbet_tree<-read.nexus(file="BarbetTree.nex")
barbet_tree
##
## Phylogenetic tree with 42 tips and 41 internal nodes.
##
## Tip labels:
## M_asiatica_asiatica, M_lineata_hodgsoni, M_australis_australis, M_haemacephala_intermedia, M_eximia, M_viridis, ...
##
## Rooted; includes branch lengths.
(Here I used the function read.nexus
from the ape package to read in the tree. Had it been in simple Newick format, I would have used read.tree
instead.)
## read data
barbet_data<-read.csv(file="Barbetdata.csv",row.names=1)
head(barbet_data)
## wing Lnalt patch colour Frequency Length Lnote
## Calorhamphus_fuliginosus_fuliginosus 4.388257 5.298317 2.000000 1.666667 20.468894 3.1207387 0.260
## Calorhamphus_fuliginosus_hayi 4.427239 5.298317 2.000000 1.666667 22.483670 3.1371727 0.230
## M_armillaris 4.532599 7.170120 6.333333 4.000000 -7.135924 -11.9001412 0.030
## M_asiatica_asiatica 4.611152 6.802395 7.333333 5.000000 -10.153448 0.3671016 0.025
## M_asiatica_davisoni 4.605170 7.003065 6.666667 3.333333 -9.700025 0.4492719 0.030
## M_australis_duvaucelli 4.282206 6.620073 9.000000 4.000000 1.206501 -2.5198843 0.030
As you might be able to guess by looking at it, these data consist of several variables for morphology (wing
, patch
, and colour
), several variables for song (Frequency
, Length
, and Lnote
), and one variable for ecology (Lnalt
). What I’ll plan to do is subdivide our dataset into one matrix for morphology and a second for song, and then run phylogenetic canonical correlation analysis on these two matrices.
First, though, I need to do a little additional bookkeeping. I remember from our book that this barbet tree contains several species that are not represented in our the data. We’ll prune these taxa out before proceeding further.
chk<-name.check(barbet_tree,barbet_data)
summary(chk)
## 9 taxa are present in the tree but not the data:
## M_asiatica_chersonesus,
## M_australis_australis,
## M_haemacephala_celestinoi,
## M_haemacephala_delica,
## M_haemacephala_intermedia,
## M_haemacephala_rosea,
## ....
##
## To see complete list of mis-matched taxa, print object.
pruned.barbet_tree<-drop.tip(barbet_tree,chk$tree_not_data)
name.check(pruned.barbet_tree,barbet_data)
## [1] "OK"
Now let’s make our two matrices for the canonical analysis. For good measure, I’ll throw in the variable Lnalt
into my “morphological” data matrix.
barbet_morph<-barbet_data[,c("wing","patch","colour","Lnalt")]
head(barbet_morph)
## wing patch colour Lnalt
## Calorhamphus_fuliginosus_fuliginosus 4.388257 2.000000 1.666667 5.298317
## Calorhamphus_fuliginosus_hayi 4.427239 2.000000 1.666667 5.298317
## M_armillaris 4.532599 6.333333 4.000000 7.170120
## M_asiatica_asiatica 4.611152 7.333333 5.000000 6.802395
## M_asiatica_davisoni 4.605170 6.666667 3.333333 7.003065
## M_australis_duvaucelli 4.282206 9.000000 4.000000 6.620073
barbet_song<-barbet_data[,c("Frequency","Length","Lnote")]
head(barbet_song)
## Frequency Length Lnote
## Calorhamphus_fuliginosus_fuliginosus 20.468894 3.1207387 0.260
## Calorhamphus_fuliginosus_hayi 22.483670 3.1371727 0.230
## M_armillaris -7.135924 -11.9001412 0.030
## M_asiatica_asiatica -10.153448 0.3671016 0.025
## M_asiatica_davisoni -9.700025 0.4492719 0.030
## M_australis_duvaucelli 1.206501 -2.5198843 0.030
Now, we’re ready for our analysis using phyl.cca
. This works as follows.
barbet_cca<-phyl.cca(pruned.barbet_tree,barbet_morph,
barbet_song)
barbet_cca
##
## Object of class "phyl.cca" from a phylogenetic canonical
## correlation analysis.
##
## Summary of results:
## correlation X^2 P-value
## CC:1 0.576934 19.512138 0.076896
## CC:2 0.424662 8.179298 0.225260
## CC:3 0.298394 2.611137 0.271018
##
## Assumed or estimated value of lambda:
## lambda logL
## 1.0000 -256.5318
##
## Canonical x coefficients:
## CA1 CA2 CA3
## wing 15.777525 -18.997694 -4.784766
## patch -0.512648 0.048314 -0.331544
## colour 0.480505 -0.772164 -1.282279
## Lnalt 3.927334 1.871524 -2.759275
##
## Canonical y coefficients:
## CA1 CA2 CA3
## Frequency -0.118156 0.580255 -0.096996
## Length 0.213158 0.139234 0.646897
## Lnote 31.778271 -5.674240 -21.987228
That’s really all there is to it! We can see from the results that perhaps our first canonical correlation is marginally significant – but none of the others. If we had included the argument fixed=FALSE
in our phyl.cca
call we would have also estimated a value for Pagel’s \(\lambda\) which would have been used in subsequent calculations. In this case, I know the ML value of \(\lambda\) is very close to 1.0, so it doesn’t make any difference.
For fun, here is a phylo.heatmap
plot of all of our variables.
phylo.heatmap(pruned.barbet_tree,barbet_data,fsize=c(0.6,0.7,0.9),
standardize=TRUE,pts=FALSE,colors=viridisLite::viridis(n=20),
split=c(0.6,0.4))
No comments:
Post a Comment
Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.