Today I received the following inquiry (simplified a bit below):
“I'm working with a large… dataset. I'm trying to find a way to apply the
phylosig
function across each column in my dataset so that I don't
have to calculate K or λ one-by-one. Ideally I'd like to save the output
value with the column header (the trait name). It seems like it should be
pretty straightforward, but I have struggled to make this work, I think because
the rownames must be assigned to each extracted data vector in order to match
to the tree. I and some other R-users have several approaches (e.g., loops to
extract individual vectors, convert to matrix and add rownames, various apply
functions, etc). We haven't been successful.”
To show how to do this I'm going to use some real data from a study I published with Luke Mahler, Graham Reynolds, and Graham Slater a few years ago.
library(phytools)
Anolis.tree<-read.tree(file=
"https://datadryad.org/bitstream/handle/10255/dryad.80249/Revell-etal.tree.tre")
Anolis.data<-read.csv(file=
"https://datadryad.org/bitstream/handle/10255/dryad.80250/Revell-etal.data.csv",
row.names=1)
We technically don't need to do this (as phylosig
should take care
of this for us), but just to avoid any error messages I'm going to 'clean up' my
data frame so that the taxa in the data & the tree match exactly:
library(geiger)
chk<-name.check(Anolis.tree,Anolis.data)
chk
## $tree_not_data
## character(0)
##
## $data_not_tree
## [1] "roosevelti"
Anolis.data<-Anolis.data[-which(rownames(Anolis.data)%in%
chk$data_not_tree),]
name.check(Anolis.tree,Anolis.data)
## [1] "OK"
Just for fun, let's plot one of our characters: overall body size (SVL in mm).
plotTree.barplot(Anolis.tree,exp(Anolis.data[,1,drop=FALSE]),
args.plotTree=list(ftype="off"),
args.barplot=list(xlab="SVL (mm)",space=0.5))
Now all the characters together:
phylo.heatmap(Anolis.tree,Anolis.data,standardize=TRUE,
fsize=c(0.5,0.7,1))
Ooh. That's cool. Note that the pattern looks pretty similar across all the traits because they are all highly correlated with body size!
Now let's compute our phylogenetic signals first using Blomberg et al.'s (2003) K:
K<-apply(Anolis.data,2,phylosig,tree=Anolis.tree)
K
## AVG.SVL AVG.hl AVG.hw
## 1.553678 1.609666 1.593687
## AVG.hh AVG.ljl AVG.outlever
## 1.633474 1.536071 1.565743
## AVG.jugal.to.symphysis AVG.femur AVG.tibia
## 1.570188 1.508540 1.520925
## AVG.met AVG.ltoe.IV AVG.toe.IV.lam.width
## 1.585221 1.525271 1.450562
## AVG.humerus AVG.radius AVG.lfing.IV
## 1.651481 1.647935 1.615006
## AVG.fing.IV.lam.width AVG.pelv.ht AVG.pelv.wd
## 1.456251 1.487610 1.566271
## Foot.Lam.num Hand.Lam.num Avg.lnSVL2
## 1.616464 1.652438 1.519654
## Avg.ln.t1
## 1.200938
It really is that easy. For Pagel's λ the function will return both a fitted
λ value, as well as a log-likelihood. The result is that our apply( )
call produces a list which could require some post-processing. Fortunately, we can use
an R base function like simplify2array
to get our results in a more
compact format:
lambda<-t(simplify2array(apply(Anolis.data,2,phylosig,
tree=Anolis.tree,method="lambda")))
lambda
## lambda logL
## AVG.SVL 1.016502 -3.810016
## AVG.hl 1.018967 -7.919063
## AVG.hw 1.01674 -18.27221
## AVG.hh 1.007532 -23.00704
## AVG.ljl 1.016123 -10.03831
## AVG.outlever 1.018473 -9.490815
## AVG.jugal.to.symphysis 1.021371 -6.919674
## AVG.femur 1.013122 -11.72327
## AVG.tibia 1.021411 -7.14409
## AVG.met 1.021765 -5.985474
## AVG.ltoe.IV 1.021493 -13.6596
## AVG.toe.IV.lam.width 1.001934 -29.00784
## AVG.humerus 1.011935 -11.56159
## AVG.radius 1.019787 -12.34023
## AVG.lfing.IV 1.00934 -17.31632
## AVG.fing.IV.lam.width 1.008937 -32.1535
## AVG.pelv.ht 1.017735 -22.90027
## AVG.pelv.wd 1.021306 -9.937602
## Foot.Lam.num 1.020469 55.17366
## Hand.Lam.num 1.020239 44.33899
## Avg.lnSVL2 1.016559 -6.030245
## Avg.ln.t1 1.014146 -24.90667
Pretty straightforward, no?
Note that if we had just send a column of the data frame to phylosig( )
as follows:
phylosig(Anolis.tree,Anolis.data[,"AVG.met"])
## [1] "x has no names; assuming x is in the same order as tree$tip.label"
## [1] 0.2514011
that wouldn't do because the order of the rows in our data frame do not correspond to the order
of the tip labels in the tree. We can resolve this by using the function setNames
:
phylosig(Anolis.tree,setNames(Anolis.data[,"AVG.met"],
rownames(Anolis.data)))
## [1] 1.585221
but I like our apply( )
calls much better.