Monday, September 16, 2019

Computing phylogenetic signal from a set of quantitative traits in an efficient way

Today I received the following inquiry (simplified a bit below):

“I'm working with a large… dataset. I'm trying to find a way to apply the phylosig function across each column in my dataset so that I don't have to calculate K or λ one-by-one. Ideally I'd like to save the output value with the column header (the trait name). It seems like it should be pretty straightforward, but I have struggled to make this work, I think because the rownames must be assigned to each extracted data vector in order to match to the tree. I and some other R-users have several approaches (e.g., loops to extract individual vectors, convert to matrix and add rownames, various apply functions, etc). We haven't been successful.”

To show how to do this I'm going to use some real data from a study I published with Luke Mahler, Graham Reynolds, and Graham Slater a few years ago.

library(phytools)
Anolis.tree<-read.tree(file=
    "https://datadryad.org/bitstream/handle/10255/dryad.80249/Revell-etal.tree.tre")
Anolis.data<-read.csv(file=
    "https://datadryad.org/bitstream/handle/10255/dryad.80250/Revell-etal.data.csv",
    row.names=1)

We technically don't need to do this (as phylosig should take care of this for us), but just to avoid any error messages I'm going to 'clean up' my data frame so that the taxa in the data & the tree match exactly:

library(geiger)
chk<-name.check(Anolis.tree,Anolis.data)
chk
## $tree_not_data
## character(0)
## 
## $data_not_tree
## [1] "roosevelti"
Anolis.data<-Anolis.data[-which(rownames(Anolis.data)%in%
    chk$data_not_tree),]
name.check(Anolis.tree,Anolis.data)
## [1] "OK"

Just for fun, let's plot one of our characters: overall body size (SVL in mm).

plotTree.barplot(Anolis.tree,exp(Anolis.data[,1,drop=FALSE]),
    args.plotTree=list(ftype="off"),
    args.barplot=list(xlab="SVL (mm)",space=0.5))

plot of chunk unnamed-chunk-3

Now all the characters together:

phylo.heatmap(Anolis.tree,Anolis.data,standardize=TRUE,
    fsize=c(0.5,0.7,1))

plot of chunk unnamed-chunk-4

Ooh. That's cool. Note that the pattern looks pretty similar across all the traits because they are all highly correlated with body size!

Now let's compute our phylogenetic signals first using Blomberg et al.'s (2003) K:

K<-apply(Anolis.data,2,phylosig,tree=Anolis.tree)
K
##                AVG.SVL                 AVG.hl                 AVG.hw 
##               1.553678               1.609666               1.593687 
##                 AVG.hh                AVG.ljl           AVG.outlever 
##               1.633474               1.536071               1.565743 
## AVG.jugal.to.symphysis              AVG.femur              AVG.tibia 
##               1.570188               1.508540               1.520925 
##                AVG.met            AVG.ltoe.IV   AVG.toe.IV.lam.width 
##               1.585221               1.525271               1.450562 
##            AVG.humerus             AVG.radius           AVG.lfing.IV 
##               1.651481               1.647935               1.615006 
##  AVG.fing.IV.lam.width            AVG.pelv.ht            AVG.pelv.wd 
##               1.456251               1.487610               1.566271 
##           Foot.Lam.num           Hand.Lam.num             Avg.lnSVL2 
##               1.616464               1.652438               1.519654 
##              Avg.ln.t1 
##               1.200938

It really is that easy. For Pagel's λ the function will return both a fitted λ value, as well as a log-likelihood. The result is that our apply( ) call produces a list which could require some post-processing. Fortunately, we can use an R base function like simplify2array to get our results in a more compact format:

lambda<-t(simplify2array(apply(Anolis.data,2,phylosig,
    tree=Anolis.tree,method="lambda")))
lambda
##                        lambda   logL     
## AVG.SVL                1.016502 -3.810016
## AVG.hl                 1.018967 -7.919063
## AVG.hw                 1.01674  -18.27221
## AVG.hh                 1.007532 -23.00704
## AVG.ljl                1.016123 -10.03831
## AVG.outlever           1.018473 -9.490815
## AVG.jugal.to.symphysis 1.021371 -6.919674
## AVG.femur              1.013122 -11.72327
## AVG.tibia              1.021411 -7.14409 
## AVG.met                1.021765 -5.985474
## AVG.ltoe.IV            1.021493 -13.6596 
## AVG.toe.IV.lam.width   1.001934 -29.00784
## AVG.humerus            1.011935 -11.56159
## AVG.radius             1.019787 -12.34023
## AVG.lfing.IV           1.00934  -17.31632
## AVG.fing.IV.lam.width  1.008937 -32.1535 
## AVG.pelv.ht            1.017735 -22.90027
## AVG.pelv.wd            1.021306 -9.937602
## Foot.Lam.num           1.020469 55.17366 
## Hand.Lam.num           1.020239 44.33899 
## Avg.lnSVL2             1.016559 -6.030245
## Avg.ln.t1              1.014146 -24.90667

Pretty straightforward, no?

Note that if we had just send a column of the data frame to phylosig( ) as follows:

phylosig(Anolis.tree,Anolis.data[,"AVG.met"])
## [1] "x has no names; assuming x is in the same order as tree$tip.label"
## [1] 0.2514011

that wouldn't do because the order of the rows in our data frame do not correspond to the order of the tip labels in the tree. We can resolve this by using the function setNames:

phylosig(Anolis.tree,setNames(Anolis.data[,"AVG.met"],
    rownames(Anolis.data)))
## [1] 1.585221

but I like our apply( ) calls much better.