Sunday, December 29, 2024

"Global" vs. "local" marginal ancestral state reconstruction of discrete phenotypic traits

The standard procedure for ancestral state reconstruction of discrete phenotypic traits under the Mk trait evolution model is as follows.

First, we fit one or more (extended) Mk models to our trait data, typically using maximum likelihood to estimate the model parameters – which are the set of instantaneous transition rates between character levels, normally given in the form of a \(k \times k\) matrix called the Q matrix (for \(k\) levels of our trait).

Second, fixing our Q matrix to its ML value, we next proceed node-by-node through the tree. At each node, we calculate the probability of our tip data under the fitted model given that the node is in state 1, 2, 3, etc., through the \(k\) levels of our trait (that is, the set of marginal likelihoods). The sum of these values at any node is simply the likelihood of our fitted model, Q.

Lastly, we calculate marginal scaled likelihoods by evaluating the ratio of each marginal likelihood to the total likelihood. These values can also be interpreted as conditional probabilities that each node is in each state, in which we’ve conditioned on the ML value of Q being true.

This is the procedure for ancestral state reconstruction implemented in numerous software packages for the endeavor, including phytools, corHMM, and ape, and it has come to unambiguously predominate in ancestral character estimation of discrete phenotypic traits in phylogenetic comparative biology.
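In phytools, for instance, that whole pipeline boils down to something like the following minimal sketch (assuming a "phylo" object tree and a named factor x of tip states, as in the example further below):

## "global" marginal ancestral state reconstruction, in outline
library(phytools)
fit<-fitMk(tree,x,model="ARD")  ## step 1: ML estimate of Q
anc<-ancr(fit)                  ## steps 2 & 3: marginal scaled likelihoods, holding Q fixed
anc$ace                         ## matrix of marginal probabilities for each node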

A key feature of the procedure that I’ve outlined is that a constant value of the transition matrix Q is used for all nodes of the tree, and all character levels at each node.

This feature is thus what Pagel (1999) characterized as “global” ancestral state reconstruction, which he contrasted with an alternative approach denominated “local” estimation.

The question of “local” vs. “global” reconstruction is almost never discussed by phylogenetic comparative biologists. Indeed, in spite of an extensive discussion of marginal vs. joint reconstruction (e.g., see here), we didn’t even mention it in our recent book, and I left it out of the first draft of a detailed pre-print review of the subject of ancestral state reconstruction, presently in revision for the journal Evolutionary Biology.

Rather than coming up with a new explanation of this contrast here, the following is what I wrote in the current draft of my review manuscript:

In addition to marginal vs. joint ancestral state reconstruction, another potentially important distinction applying specifically to marginal estimation that occasionally appears in the literature is between global and local reconstruction (Pagel 1999). In this case, the difference between global and local involves the estimation of Q itself. Under global reconstruction, we estimate our transition matrix Q (using, say, Maximum Likelihood) only once based on the phylogeny and all the data at the tips of the tree, without any particular assumption about the value of our character at any individual node of the phylogeny (apart from the root – see my discussion of the root prior, \(\pi\), above). We then proceed to undertake ancestral state reconstruction for all nodes while holding Q constant at this value (Yang 1995; Koshi & Goldstein 1996; Schluter et al. 1997; Yang 2006; Revell & Harmon 2022). Under local ancestral state reconstruction, on the other hand, a separate value of Q, and a separate value for the total probability of the data, is estimated conditioning on each of the \(k\) levels of our trait at every internal node of the phylogeny (Pagel 1999).

A simple example of a circumstance in which local and global estimation could yield substantively different results can be seen if we imagine a scenario where our best global estimate of Q has a forward transition rate (that is, from 0 to 1) of \(q_{0,1}>0\) (that is, some positive non-zero value), but a backward transition rate (from 1 to 0) of \(q_{1,0}=0\). Under our standard procedure of global ancestral state estimation, any node with even a single tip in condition 0 among its descendants must have been in condition 0 with probability 1.0. With local estimation, on the other hand, we would first assume that our hypothesis that the node was in condition 1 is correct, then we’d estimate a value of Q conditioning on this hypothesis being true and in which a value of \(q_{1,0}>0\) would thus be guaranteed (Pagel 1999). Our marginal reconstruction for the node then becomes the ratio of the probability of our data given that the node is in state 0 (or 1) to the sum of these probabilities, but without assuming that Q takes a constant value across the two cases.
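Stepping back out of the manuscript text for a moment, we can put this contrast roughly in symbols (my own shorthand, not notation from the manuscript or from Pagel). If \(L(Q, v=s)\) denotes the likelihood of our tip data when node \(v\) is fixed in state \(s\), then the global and local marginal scaled likelihoods are

\[ P_{\mathrm{global}}(v=s)=\frac{L(\hat{Q},v=s)}{\sum_{s'}L(\hat{Q},v=s')} \qquad \mathrm{vs.} \qquad P_{\mathrm{local}}(v=s)=\frac{\max_{Q}L(Q,v=s)}{\sum_{s'}\max_{Q}L(Q,v=s')}, \]

in which \(\hat{Q}\) is the single ML estimate of Q obtained from the tip data, and the maximization in the local case is repeated separately for each state at each node.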

Pagel (1999) provides a fairly succinct and convincing justification for local estimation in the following way:

The estimators of the likelihoods… I shall term ‘local’ estimators because the parameters of the model of evolution are found separately for each combination of ancestral states. The local procedure is used because the maximum likelihood approach is to compare the support for different hypotheses when they are assumed to be true. The support for the hypothesis that the ancestral state at a node is zero derives from how well the model can be shown to fit the data when the ancestor in question is fixed at zero. Support for the hypothesis that the state is 1 is likewise found by maximizing the likelihood but this time having fixed the ancestral value at 1. The different possible states at a node become hypotheses for which we seek to estimate the support, given the data and our model of evolution. The local method of estimation differs in calculation and logic from the estimators of ancestral states that Yang et al. (1995) and Koshi and Goldstein (1996) suggest for nucleotide and amino acid data. These authors first estimate \(L(m) \propto P(d|m)\), as defined above. They then partition the overall likelihood into its additive components to find the proportion of the total likelihood attributable to the different character states at a given node, the parameters of the model of evolution having been estimated only once and then not for any particular state, but as the single best set maximized over all possible states. I call these estimates ‘global’ because unlike the local estimators, the global likelihoods are not estimated as being conditional on any particular state at a node or set of nodes, and consequently they do not reflect the best possible fit of the model, given the hypothesis…. The key point is that the parameters of the model of evolution assumed to describe the data may vary, and sometimes greatly, depending on the assignment of a state to a given node. We do not have a way of estimating the support for a state at a node independently of these background parameters. To calculate conditional probabilities from a single model fitted to all possible background states misses this point and as a result fails then to yield the maximum likelihood estimator.

Indeed, this argument of Pagel’s makes a lot of sense! Nonetheless, as far as I know (and readers can correct me if I’m wrong), there is no implementation of “local” ancestral state reconstruction of discrete characters available for R.

On the other hand, it’s not that difficult to envision building one. Why don’t we see what that entails?

First, let’s work through an example. I’ll begin by selecting a model and undertaking ‘global’ ancestral state reconstruction in the typical way. I’ll then show how one might go about obtaining ‘local’ ancestral state estimates using existing software in R.

My tree and data for the example come from Near et al. (2005) and Revell & Collar (2009).

## load phytools
library(phytools)
data(sunfish.tree)
tree<-as.phylo(sunfish.tree)
data(sunfish.data)
x<-setNames(sunfish.data$feeding.mode,
  rownames(sunfish.data))
plotTree.datamatrix(tree,data.frame(feeding_mode=x),
  colors=list(setNames(hcl.colors(n=2),levels(x))),
  header=FALSE)
legend("topleft",c("non-piscivorous","piscivorous"),
  pch=22,pt.bg=hcl.colors(n=2),pt.cex=1.8,cex=0.8,
  bty="n")

[Figure: the sunfish phylogeny with feeding mode (non-piscivorous vs. piscivorous) plotted at the tips.]

To start, why don’t I compare the four standard Mk models for a binary trait, as follows.

er<-fitMk(tree,x,model="ER")
ard<-fitMk(tree,x,model="ARD")
dirr01<-fitMk(tree,x,model=matrix(c(0,0,1,0),2,2))
dirr10<-fitMk(tree,x,model=matrix(c(0,1,0,0),2,2))
anova(er,dirr01,dirr10,ard)
##           log(L) d.f.      AIC    weight
## er     -13.07453    1 28.14906 0.3486479
## dirr01 -12.98820    1 27.97640 0.3800846
## dirr10 -14.20032    1 30.40064 0.1130998
## ard    -12.86494    2 29.72987 0.1581677

As it turns out, our best-supported model (accounting for model complexity) is, just barely, the directional non \(\rightarrow\) pisc model.
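(If we want to examine the ML transition rates under this best-supported model before proceeding, we can simply print or plot the fitted model object; output not shown here.)

## inspect the fitted directional model & its estimated rates
print(dirr01)
plot(dirr01)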

## marginal reconstruction under best model
anc_dirr01<-ancr(dirr01)
anc_dirr01
## Marginal ancestral state estimates:
##    non pisc
## 29   1    0
## 30   1    0
## 31   1    0
## 32   1    0
## 33   1    0
## 34   1    0
## ...
## 
## Log-likelihood = -12.988199

Since our ancestral state estimates are entirely determined under this irreversible model (any node with even one non-piscivorous descendant must itself have been non-piscivorous), let’s also perform ASR using the slightly less well-supported ARD model as follows:

## marginal reconstruction under ARD model
anc_ard<-ancr(ard)
anc_ard
## Marginal ancestral state estimates:
##         non     pisc
## 29 0.554292 0.445708
## 30 0.568076 0.431924
## 31 0.589648 0.410352
## 32 0.931603 0.068397
## 33 0.995898 0.004102
## 34 0.998398 0.001602
## ...
## 
## Log-likelihood = -12.864937

Now we’re ready for “local” ancestral state reconstruction.

Remember, to do this we should proceed node by node, and at each node, we must fix the node to condition 0, then to condition 1 (then to condition 2 and so on, if we have more than two states), each time maximizing the likelihood over Q. The ratio of the probability of our data given that the node is in state 0 to the sum of these probabilities across all states then provides us with the marginal scaled likelihood of that node being in condition 0 (and the same is true for the other states of our character).

To fix each node in each state, what I’ll do is attach a zero-length tip to that node using phytools::bind.tip. After I do, the likelihood of my model becomes the probability of the data conditioning on that node being in that state. (Strictly speaking, it’s proportional to that probability, with a proportionality constant that we could factor out but which will be the same for all of our likelihoods.)

## local estimation (under ARD model)
## matrix to hold the "local" marginal scaled likelihoods for each node
anc_local<-matrix(NA,tree$Nnode,length(levels(x)),
  dimnames=list(1:tree$Nnode+Ntip(tree),
    levels(x)))
for(i in 1:tree$Nnode){
  node<-i+Ntip(tree)
  ## vector of log-likelihoods, one for each state at this node
  logL<-setNames(rep(NA,length(levels(x))),levels(x))
  for(j in levels(x)){
    ## bind a zero-length tip to the node
    tt<-bind.tip(tree,"node",edge.length=0,where=node)
    ## fix the new tip (& thus the node) in state j
    xx<-setNames(c(x,as.factor(j)),c(names(x),"node"))
    ## re-optimize Q conditioning on the node being in state j
    fit<-fitMk(tt,xx,model="ARD")
    logL[j]<-logLik(fit)
  }
  ## convert to marginal scaled likelihoods
  anc_local[i,]<-exp(logL)/sum(exp(logL))
}
head(anc_local)
##          non        pisc
## 29 0.6101902 0.389809798
## 30 0.6115063 0.388493723
## 31 0.6275412 0.372458760
## 32 0.7815525 0.218447454
## 33 0.9843482 0.015651752
## 34 0.9954944 0.004505559

OK, now let’s plot our three sets of marginal reconstructions.

par(mfrow=c(1,3))
plot(anc_dirr01,args.nodelabels=list(piecol=hcl.colors(n=2),
  cex=1.2),args.tiplabels=list(cex=0.8),
  args.plotTree=list(offset=0.5,fsize=1,
    mar=c(1.1,1.1,2.1,1.1)))
mtext("a) best (directional) model (\"global\")",line=0,adj=0)

plot(anc_ard,args.nodelabels=list(piecol=hcl.colors(n=2),
  cex=1.2),args.tiplabels=list(cex=0.8),
  args.plotTree=list(offset=0.5,fsize=1,
    mar=c(1.1,1.1,2.1,1.1)))
mtext("b) ARD model (\"global\")",line=0,adj=0)

plotTree(tree,ftype="i",offset=0.5,
  mar=c(1.1,1.1,2.1,1.1),lwd=1)
par(fg="transparent")
nodelabels(pie=anc_local,cex=1.2,piecol=hcl.colors(n=2))
tiplabels(pie=to.matrix(x[tree$tip.label],levels(x)),
  piecol=hcl.colors(n=2),cex=0.8)
par(fg="black")
mtext("c) ARD model (\"local\")",line=0,adj=0)

[Figure: three sets of marginal reconstructions: a) best (directional) model ("global"); b) ARD model ("global"); c) ARD model ("local").]

In this case, we can see the big difference between panel a) and panels b) & c), but there are also some pretty substantial changes between b) & c) themselves, even though both assume an ARD model (the former with a constant value of Q, the latter with a Q that's optimized conditioning on each state at each node).

Finally, let’s make a function that puts these calculations into practice for computing “local” marginal ancestral states. (I'm going to have this function return an "ancr" object, just for subsequent convenience in printing & plotting.)

Here goes:

## function to do local ASR
anc_local<-function(tree,x,model="ARD",trace=0){
  if(is.character(x)) x<-as.factor(x)
  if(is.factor(x)) x<-to.matrix(x,levels(x))
  k<-ncol(x)
  ace<-matrix(NA,tree$Nnode,ncol(x),
    dimnames=list(1:tree$Nnode+Ntip(tree),colnames(x)))
  xx<-rbind(x,rep(0,ncol(x)))
  rownames(xx)[nrow(xx)]<-"node"
  for(i in 1:tree$Nnode){
    if(trace>=1){
      cat(paste("Working on node",i+Ntip(tree),"...\n"))
      flush.console()
    }
    node<-i+Ntip(tree)
    ## one log-likelihood per state of the trait
    logL<-setNames(rep(NA,k),colnames(x))
    ## attach a zero-length tip ("node") to focal node i
    tt<-bind.tip(tree,"node",edge.length=0,where=node)
    for(j in 1:ncol(x)){
      ## fix the new tip (& thus the node) in state j
      xx["node",j]<-1
      ## re-optimize Q conditioning on this node state
      fit<-fitMk(tt,xx,model=model)
      logL[j]<-logLik(fit)-log(1/k)
      ## reset the state of "node" before the next iteration
      xx["node",]<-0
    }
    if(i==1){
      ## total log-likelihood & parameter count (computed only once)
      lnL<-log(sum((1/k)*exp(logL)))
      attr(lnL,"df")<-max(fit$index.matrix,na.rm=TRUE)
    }
    ace[i,]<-exp(logL)/sum(exp(logL))
  }
  object<-list(
    ace=ace,
    logLik=lnL
  )
  attr(object,"type")<-"marginal" ## c("marginal","local")
  attr(object,"local")<-TRUE
  attr(object,"tree")<-tree
  attr(object,"data")<-x
  class(object)<-"ancr"
  object
}

(It would be handy to parallelize our efforts across nodes. I haven’t done that, but would certainly add it if I were to include this in phytools.)
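For instance, the node-by-node kernel could be farmed out across cores with the base parallel package, along the lines of the rough sketch below (not run). This assumes the same objects xx, k, and model as inside anc_local, requires a system that supports forking (i.e., not Windows), and skips the log(1/k) bookkeeping from above since that cancels out of the ratio anyway.

## hypothetical parallelized version of the node loop, using mclapply
library(parallel)
local_node<-function(node,tree,xx,model,k){
  ## attach a zero-length tip to fix the focal node's state
  tt<-bind.tip(tree,"node",edge.length=0,where=node)
  logL<-rep(NA,k)
  for(j in 1:k){
    xx["node",]<-0
    xx["node",j]<-1
    logL[j]<-logLik(fitMk(tt,xx,model=model))
  }
  ## marginal scaled likelihoods for this node
  exp(logL)/sum(exp(logL))
}
## e.g., spreading the nodes across four cores:
## ace<-do.call(rbind,mclapply(1:tree$Nnode+Ntip(tree),
##   local_node,tree=tree,xx=xx,model="ARD",k=k,mc.cores=4))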

We can try it out as follows. Warning: this will take a while to run!

data(primate.tree)
data(primate.data)
activity<-setNames(primate.data$Activity_pattern,
  rownames(primate.data))
primate_anc<-anc_local(primate.tree,activity,
  model="ARD")
primate_anc
## Marginal ancestral state estimates:
##    Cathemeral  Diurnal Nocturnal
## 91   0.454641 0.036809  0.508549
## 92   0.434699 0.048893  0.516408
## 93   0.098258 0.831309  0.070433
## 94   0.046161 0.951717  0.002122
## 95   0.000950 0.998953  0.000097
## 96   0.000493 0.999490  0.000017
## ...
## 
## Log-likelihood = -24.729028
plot(primate_anc,
  args.plotTree=list(direction="upwards"),
  args.nodelabels=list(piecol=c("goldenrod","lightblue","black")))

[Figure: "local" marginal ancestral state reconstruction of primate activity pattern under the ARD model, plotted upwards.]

This result is quite different from what we'd obtain using the more conventional global estimation (that is, using a fixed value of Q), in which the root state is "nocturnal" with very high probability. This seems to be because conditioning on the root state being "cathemeral," with a high rate of transition away from that state to each of the other two, can explain our data with nearly the same probability as does conditioning on the root state being "nocturnal."
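(For comparison, the corresponding "global" reconstruction under the same ARD model can be obtained in the usual way, e.g., as follows; output not shown.)

## conventional "global" reconstruction of activity pattern, for comparison
primate_ard<-fitMk(primate.tree,activity,model="ARD")
primate_anc_global<-ancr(primate_ard)
plot(primate_anc_global,
  args.plotTree=list(direction="upwards"),
  args.nodelabels=list(piecol=c("goldenrod","lightblue","black")))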

That’s it, folks, though I suspect you’ll hear more about global and local estimation from me in the New Year.

Happy holidays!
