Over the past couple of days I’ve shown how phytools could now be used to fit a discrete character evolution model with \(\Gamma\)-distributed rate heterogeneity among edges (yesterday morning); and how this fitted model can be applied to reconstruct ancestral states at internal nodes (this morning).
Here, I’m going to take use an empirical case study to illustrate how it matters.
For the case study, I’m going to use data & a phylogeny subsampled from an excellent dataset mapping bi- and tri-cellular pollen grains across a tree of flowering plants (Williams et al. 2014). (We also used these data in our book with Princeton University Press, so you can download the files from the book website.)
I’ll start off by loading phytools after updating it from GitHub as follows.
library(phytools)
## Loading required package: ape
## Loading required package: maps
packageVersion("phytools")
## [1] '2.0.9'
Now, I’ll load my dataset & tree from file.
pollen_data<-read.csv(
file="http://www.phytools.org/Rbook/7/pollen-data.csv",
row.names=1)
head(pollen_data)
## V1
## Nuphar_lutea 3
## Nuphar_pumila 2
## Nuphar_advena 2
## Austrobaileya_scandens 2
## Schisandra_propinqua 2
## Kadsura_heteroclita 2
pollen<-setNames(as.factor(pollen_data$V1),rownames(pollen_data))
pollen_tree<-read.tree(
file="http://www.phytools.org/Rbook/7/pollen-tree.phy")
pollen_tree
##
## Phylogenetic tree with 511 tips and 510 internal nodes.
##
## Tip labels:
## Nuphar_lutea, Nuphar_pumila, Nuphar_advena, Austrobaileya_scandens, Schisandra_propinqua, Kadsura_heteroclita, ...
##
## Rooted; includes branch lengths.
Our trait has two levels, so we can start by fitting a standard extended Mk model with different backward & forward transition rates using fitMk
as follows.
pollen_mk<-fitMk(pollen_tree,pollen,model="ARD",
pi="fitzjohn",rand_start=TRUE)
pollen_mk
## Object of class "fitMk".
##
## Fitted (or set) value of Q:
## 2 3
## 2 -0.002997 0.002997
## 3 0.004727 -0.004727
##
## Fitted (or set) value of pi:
## 2 3
## 0.860551 0.139449
## due to treating the root prior as (a) nuisance.
##
## Log-likelihood: -215.278634
##
## Optimization method used was "nlminb"
##
## R thinks it has found the ML solution.
Now let’s do the same for our new \(\Gamma\) rate heterogeneous model using the new phytools function fitgammaMk
pollen_gamma<-fitgammaMk(pollen_tree,pollen,
model="ARD",pi="fitzjohn",
rand_start=TRUE,min.alpha=0.01)
pollen_gamma
## Object of class "fitgammaMk".
##
## Fitted (or set) value of Q:
## 2 3
## 2 -0.007473 0.007473
## 3 0.009101 -0.009101
##
## Fitted (or set) value of alpha rate heterogeneity
## (with 4 rate categories): 0.015511
##
## Fitted (or set) value of pi:
## 2 3
## 0.88223 0.11777
## due to treating the root prior as (a) nuisance.
##
## Log-likelihood: -207.155259
##
## Optimization method used was "nlminb"
##
## R thinks it has found the ML solution.
Right away, it’s easy for me to ascertain that our \(\Gamma\) model is better-supported than our standard Mk model – but let’s run an anova
function call on the two objects to check.
anova(pollen_mk,pollen_gamma)
## log(L) d.f. AIC weight
## pollen_mk -215.2786 2 434.5573 0.0008053927
## pollen_gamma -207.1553 3 420.3105 0.9991946073
This indicates that almost all of the weight of evidence falls on our \(\Gamma\) variable-rate model.
As I showed this morning, it’s already straightforward to obtain ancestral states under our fitted model.
Here, I’m going to do it twice: once for our Mk model and then a second time for our \(\Gamma\) model. When I create my graph showing these marginal estimates mapped onto the tree, I’ll increase the node size of any node where neither state has a marginal scaled likelihood > 0.95.
pollen_mk.anc<-ancr(pollen_mk)
cex<-apply(pollen_mk.anc$ace,1,
function(x) if(any(x>0.95)) 0.1 else 0.3)
cols<-setNames(viridisLite::viridis(n=2),2:3)
plot(pollen_mk.anc,type="arc",arc_height=0.2,
args.nodelabels=list(cex=cex,piecol=cols))
mtext("a) standard Mk model",adj=0.05,line=-1)