Thursday, March 23, 2023

Marginal ancestral state reconstruction under a hidden-rates model using new phytools generic method

This morning I posted about a new method (I called it anceb) for marginal ancestral character estimation.

I decided I didn’t like that name, so I’ve switched to ancr (for ancestral character reconstruction). I get that this is quite close to ace in ape and asr in diversitree (both of which do different flavors of the same thing), but I just found anceb to clunky and annoying to speak aloud!

The idea of this (generic) method is that it takes a fitted discrete character evolution (e.g., from fitMk or fitHRM) and then computes the marginal reconstructions under that model. Marginal ancestral states are also referred to as scaled likelihoods, and it’s valid to interpret them as the probabilities that each node is in each of the different states, conditioning on the fitted model. (This is why they are also called “empirical Bayes” posterior probabilities – because they are posterior probabilities but in which we condition, normally, on the ML value of our transition model, Q.)

To illustrate this method, I’m going to apply it to a trait evolution scenario for the so-called “hidden-rates-model” of Beaulieu et al. (2013). For this, I’ll use dataset of parity mode in liolaemid lizards from Esquerré et al. (2018). Previously I found that the best-supported model for these data was a class of hidden-rates-model that I refer to as the ‘umbral’ model in which we have hidden states of the character (say 0* and 1* for an observed character with two levels 0 & 1), but in which evolution always had to proceed 0* ⇆ 0 ⇆ 1 ⇆ 1*. The idea is to capture “cold” and “hot” conditions for the state, in which when in the cold condition the character can’t evolve at all without first transitioning back to the hot condition.

To fit this model & reconstruct ancestral states we’ll start off by loading our updated phytools package as follows.

library(phytools)
packageVersion("phytools")
## [1] '1.6.1'

Next, we’ll read our data from file.

liolaemid.tree<-read.nexus(
  file="http://www.phytools.org/Rbook/7/Liolaemidae.MCC.nex")
liolaemid.tree<-untangle(ladderize(liolaemid.tree),"read.tree")
liolaemid.tree
## 
## Phylogenetic tree with 258 tips and 257 internal nodes.
## 
## Tip labels:
##   Liolaemus_albiceps, Liolaemus_irregularis, Liolaemus_ornatus, Liolaemus_calchaqui, Liolaemus_crepuscularis, Liolaemus_lavillai, ...
## 
## Rooted; includes branch lengths.
liolaemid.data<-read.csv(
  file="http://www.phytools.org/Rbook/7/Liolaemidae.data.csv",
  row.names=1,stringsAsFactors=TRUE)
parity_mode<-setNames(liolaemid.data$parity_mode,
    rownames(liolaemid.data))
levels(parity_mode)<-c("oviparous","viviparous")
head(parity_mode)
## Ctenoblepharys_adspersa       Liolaemus_abaucan      Liolaemus_albiceps       Liolaemus_andinus 
##               oviparous               oviparous              viviparous              viviparous 
##     Liolaemus_annectens      Liolaemus_anomalus 
##              viviparous               oviparous 
## Levels: oviparous viviparous

So far so good. Now, we’ll fit our model using phytools::fitHRM. Be forewarned: this takes a while!

hrm.umbral<-fitHRM(liolaemid.tree,parity_mode,umbral=TRUE,
    pi="fitzjohn",opt.method="optimParallel",rand_start=TRUE)
## 
## This is the design matrix of the fitted model.
## Does it make sense?
## 
##             oviparous oviparous* viviparous viviparous*
## oviparous           0          1          2           0
## oviparous*          3          0          0           0
## viviparous          4          0          0           5
## viviparous*         0          0          6           0
## 
## log-likelihood from current iteration: -65.947 
##  --- Best log-likelihood so far: -65.947 ---
## log-likelihood from current iteration: -62.448 
##  --- Best log-likelihood so far: -62.448 ---
## log-likelihood from current iteration: -65.947 
##  --- Best log-likelihood so far: -62.448 ---
## log-likelihood from current iteration: -63.3995 
##  --- Best log-likelihood so far: -62.448 ---
## log-likelihood from current iteration: -60.6924 
##  --- Best log-likelihood so far: -60.6924 ---
## log-likelihood from current iteration: -62.448 
##  --- Best log-likelihood so far: -60.6924 ---
## log-likelihood from current iteration: -62.448 
##  --- Best log-likelihood so far: -60.6924 ---
## log-likelihood from current iteration: -60.6925 
##  --- Best log-likelihood so far: -60.6924 ---
## log-likelihood from current iteration: -62.448 
##  --- Best log-likelihood so far: -60.6924 ---
## log-likelihood from current iteration: -62.448 
##  --- Best log-likelihood so far: -60.6924 ---
hrm.umbral
## Object of class "fitHRM".
## 
## Observed states: [ oviparous, viviparous ]
## Number of rate categories per state: [ 2, 2 ]
## 
## Fitted (or set) value of Q:
##             oviparous oviparous* viviparous viviparous*
## oviparous   -1.970884   0.039883   1.931001    0.000000
## oviparous*   0.037897  -0.037897   0.000000    0.000000
## viviparous   2.696768   0.000000  -3.111231    0.414463
## viviparous*  0.000000   0.000000   0.000000    0.000000
## 
## Fitted (or set) value of pi:
##   oviparous  oviparous*  viviparous viviparous* 
##    0.034068    0.939401    0.026531    0.000000 
## due to treating the root prior as (a) nuisance.
## 
## Log-likelihood: -60.692435 
## 
## Optimization method used was "optimParallel"

If we graph our fitted model, we should see that it corresponds nicely to the process we hypothesized.

plot(hrm.umbral,spacer=0.3,offset=0.03,
  mar=rep(0.1,4))

plot of chunk unnamed-chunk-9

Having fit this model to our data, we just need to pass the fitted object directly to our new ancr (formerly, briefly anceb) as follows. The only argument I’m going to specify is tips=TRUE which will return the posterior probabilities that each of the leaves of the tree are in each of the two unobserved conditions of the character, given its observed state.

parity_ancr<-ancr(hrm.umbral,tips=TRUE)
parity_ancr
## Marginal ancestral state estimates:
##                         oviparous oviparous* viviparous viviparous*
## Liolaemus_albiceps              0          0   0.038590    0.961410
## Liolaemus_irregularis           0          0   0.038590    0.961410
## Liolaemus_ornatus               0          0   0.062484    0.937516
## Liolaemus_calchaqui             0          0   0.071095    0.928905
## Liolaemus_crepuscularis         0          0   0.071095    0.928905
## Liolaemus_lavillai              0          0   0.108063    0.891937
## ...
## 
## Log-likelihood = -60.692435

Lastly, let’s graph these probabilities onto the tree.

h<-max(nodeHeights(liolaemid.tree))
plotTree(liolaemid.tree,ftype="off",lwd=1,
  ylim=c(0,1.05*h),direction="upwards")
pp<-get("last_plot.phylo",envir=.PlotPhyloEnv)
cols<-setNames(c("#f0ead6","#fffff2","darkred",palette()[2]),
  colnames(parity_ancr$ace))
for(i in 1:Ntip(liolaemid.tree)){
  tcp<-c(0,cumsum(parity_ancr$ace[liolaemid.tree$tip.label[i],]))
  tcp<-tcp*0.05*h
  for(j in 2:length(tcp)){
    polygon(rep(pp$xx[i],4)+c(-0.5,-0.5,0.5,0.5),
      h+c(tcp[j-1],tcp[j],tcp[j],tcp[j-1]),
      border=FALSE,col=cols[j-1])
  }
}
legend(x=0,y=0.5*h,colnames(parity_ancr$ace),pch=15,
  col=cols,pt.cex=1.5,cex=0.8,bty="n")
par(fg="transparent")
nodelabels(pie=parity_ancr$ace[1:Nnode(liolaemid.tree)+
  Ntip(liolaemid.tree),],piecol=cols,cex=0.3)

plot of chunk unnamed-chunk-11

par(fg="black")

In our model, the “cold” conditions are the two asterisked states, shown in light red (for viviparity) and light eggshell (for oviparity), respectively. In general, we find that our evolutionary process tends to be in those two conditions in parts of the tree in which the character is changing little or not at all, which makes sense. Cool!

No comments:

Post a Comment

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.