Saturday, November 11, 2023

Marginal ancestral state reconstruction using ancr in phytools 2.0

A new version of phytools is on CRAN (phytools 2.0-3).

This is linked to the impending publication of a new phytools article in the open-access journal PeerJ. (You can already read a pre-print here.)

Rather than simply list the updates & new features of this phytools version (as I often do on this blog), I thought it could be more constructive to make one or two posts illustrating R phylogenetics workflows that are possible with this phytools version – but may not have been possible when we (say) published our Phylogenetic Comparative Methods in R book just last year!

For this post, I decided to focus on ancestral character estimation for a discrete character.

We may as well start off by loading phytools and checking our version number….

library(phytools)
packageVersion("phytools")
## [1] '2.0.3'

For the example, I’ll use a dataset & phylogeny subsampled from a Benun Sutton & Wilson (2019). These data consist of a phylogeny of bony fishes and data for spawning mode & paternal care. The published data were actually quite a bit larger – but this dataset has been randomly subsampled for illustrative purposes only. The Benun Sutton & Wilson phylogeny in turn derives from Betancur et al. (2017).

data(bonyfish.tree)
bonyfish.tree<-ladderize(bonyfish.tree)
bonyfish.tree
## 
## Phylogenetic tree with 90 tips and 89 internal nodes.
## 
## Tip labels:
##   Xenomystus_nigri, Chirocentrus_dorab, Talismania_bifurcata, Alepocephalus_tenebrosus, Misgurnus_bipartitus, Opsariichthys_bidens, ...
## 
## Rooted; includes branch lengths.
data(bonyfish.data)
head(bonyfish.data)
##                          spawning_mode paternal_care
## Xenomystus_nigri                  pair          male
## Chirocentrus_dorab               group          none
## Talismania_bifurcata             group          none
## Alepocephalus_tenebrosus         group          none
## Misgurnus_bipartitus              pair          none
## Opsariichthys_bidens              pair          none

I’m going to focus on the binary character of paternal care: males either provision it (coded "paternal") or they don’t (code "none"). We should extract this character as a vector with names – which any readers of our book will appreciate I like to do using stats::setNames.

paternal_care<-setNames(bonyfish.data$paternal_care,
  rownames(bonyfish.data))
head(paternal_care)
##         Xenomystus_nigri       Chirocentrus_dorab     Talismania_bifurcata 
##                     male                     none                     none 
## Alepocephalus_tenebrosus     Misgurnus_bipartitus     Opsariichthys_bidens 
##                     none                     none                     none 
## Levels: male none

In phytools 2.0, ancestral state estimation is performed using the S3 method ancr.

ancr is designed to take a variety of different objects, primarily fitted models or sets of models, as input.

One of these object types is a fitted Mk model deriving from phytools::fitMk. This seems appropriate to our data & phylogeny at hand.

Let’s start, then, by fitting an equal-rates model ("ER") and a different rates model ("ARD"), and comparing them.

fish_er<-fitMk(bonyfish.tree,paternal_care,model="ER",
  pi="fitzjohn")
fish_er
## Object of class "fitMk".
## 
## Fitted (or set) value of Q:
##           male      none
## male -0.001959  0.001959
## none  0.001959 -0.001959
## 
## Fitted (or set) value of pi:
##     male     none 
## 0.098523 0.901477 
## due to treating the root prior as (a) nuisance.
## 
## Log-likelihood: -33.150827 
## 
## Optimization method used was "nlminb"
## 
## R thinks it has found the ML solution.
fish_ard<-fitMk(bonyfish.tree,paternal_care,model="ARD",
  pi="fitzjohn")
fish_ard
## Object of class "fitMk".
## 
## Fitted (or set) value of Q:
##          male      none
## male 0.000000  0.000000
## none 0.002229 -0.002229
## 
## Fitted (or set) value of pi:
## male none 
##    0    1 
## due to treating the root prior as (a) nuisance.
## 
## Log-likelihood: -31.542749 
## 
## Optimization method used was "nlminb"
## 
## R thinks it has found the ML solution.

The best-fitting model in each case is quite different. Under "ER", of course, back-and-forth rates of change between the two states are forced to assume the same value. When they’re permitted to take different values, by contrast, paternal care can evolve from no-care – but never the reverse.

Let’s compare them using the normal machinery of likelihood – in this case, AIC.

anova(fish_er,fish_ard)
##             log(L) d.f.      AIC    weight
## fish_er  -33.15083    1 68.30165 0.3524976
## fish_ard -31.54275    2 67.08550 0.6475024

Though this tells us that the majority of weight of evidence falls with the different-rates model – a substantial portion (>35%) also supports the equal-rates model.

Let’s compute marginal ancestral states under each of these two models as follows.

fish_er.anc<-ancr(fish_er)
fish_er.anc
## Marginal ancestral state estimates:
##        male     none
## 91 0.011804 0.988196
## 92 0.002168 0.997832
## 93 0.001939 0.998061
## 94 0.002191 0.997809
## 95 0.003462 0.996538
## 96 0.004425 0.995575
## ...
## 
## Log-likelihood = -33.150827
fish_ard.anc<-ancr(fish_ard)
fish_ard.anc
## Marginal ancestral state estimates:
##    male none
## 91    0    1
## 92    0    1
## 93    0    1
## 94    0    1
## 95    0    1
## 96    0    1
## ...
## 
## Log-likelihood = -31.542749

Interestingly, here we can quite clearly see the effect of fixing an irreversible trait evolution model in ancestral character estimation…. All nodes in which at least one descendant lacks paternal care is unambiguously reconstructed into the "none" condition. In my opinion, this overstates the strength of evidence about ancestral states, not least because the evidence for our model was so uncertain!

Lastly (and, so far as I know, uniquely), phytools 2.0 allows us to model-average across a set of models. We do that simply by passing our anova result to ancr. We can do that as follows.

fish.avg_anc<-ancr(anova(fish_er,fish_ard))
##             log(L) d.f.      AIC    weight
## fish_er  -33.15083    1 68.30165 0.3524976
## fish_ard -31.54275    2 67.08550 0.6475024
fish.avg_anc
## Marginal ancestral state estimates:
##        male     none
## 91 0.004161 0.995839
## 92 0.000764 0.999236
## 93 0.000683 0.999317
## 94 0.000772 0.999228
## 95 0.001220 0.998780
## 96 0.001560 0.998440
## ...
## 
## Log-likelihood = -32.777644

Neat.

phytools 2.0 also features a handy plot method for the "ancr" object class.

Let’s apply it to each one of our marginal reconstructions in turn to see how they differ one from the other.

In this code, I also illustrate how to size node labels by the state ambiguity: any node in which neither state has 95% support is sized larger than less ambiguous nodes.

cols<-viridisLite::viridis(n=2)
er_cex<-apply(fish_er.anc$ace,1,function(x) if(any(x>0.95)) 0.2 else 0.6)
plot(fish_er.anc,args.nodelabels=list(cex=er_cex,piecol=cols),
  mar=c(0.1,0.1,1.1,0.1))
mtext("a) ER model marginal ancestral states",line=0,adj=0)

plot of chunk unnamed-chunk-13

ard_cex<-apply(fish_ard.anc$ace,1,function(x) if(any(x>0.95)) 0.2 else 0.6)
plot(fish_ard.anc,args.nodelabels=list(cex=ard_cex,piecol=cols),
  mar=c(0.1,0.1,1.1,0.1))
mtext("b) ARD model marginal ancestral states",line=0,adj=0)

plot of chunk unnamed-chunk-14

avg_cex<-apply(fish.avg_anc$ace,1,function(x) if(any(x>0.95)) 0.2 else 0.6)
plot(fish.avg_anc,args.nodelabels=list(cex=avg_cex,piecol=cols),
  mar=c(0.1,0.1,1.1,0.1))
mtext("c) model-averaged marginal ancestral states",line=0,adj=0)

plot of chunk unnamed-chunk-15

Although a lot of the reconstructions agree between trees, we see several points of disagreement. Figure c) resolves these in proportion to the weight of evidence supporting each model.

Cool.

1 comment:

  1. Hello, I really admire all your work, right now i'm trying to do an exercise about Ancestral States Reconstruction but i'm trying to plot facing trees with both states and i need to put the legend of the states in the bottom right of the plot but I can´t find the way to do it, is there a way you can help me?

    ReplyDelete

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.