Wednesday, July 12, 2017

Fitting discrete character (Mk) models with a fixed state at the root

Today the question was asked:

“…how do I specify an ancestral state for my root node phytools?? And can I specify one when using the ER model of character evolution? I feel this should be really simple to answer, sorry but if you could direct me to an answer that would be fantastic.”

I'm going to interpret this as meaning “how do I fit a discrete character model in which the state at the global root is known.”

This is not super easy. It is most straightforward to do using fitMk, but can also be accomplished using fitDiscrete of the geiger package.

First I'll demonstrate fitMk.

Here are our tree & data:

library(phytools)
library(RColorBrewer)
states<-sort(unique(x))
plotTree(tree,type="fan",ftype="off")
tiplabels(tip=1:Ntip(tree),pie=to.matrix(x,states),
    piecol=brewer.pal(3,"Accent"),cex=0.6)
add.simmap.legend(colors=setNames(brewer.pal(3,"Accent"),
    states),shape="circle",prompt=FALSE,x=-1,y=-0.8)

plot of chunk unnamed-chunk-1

Now, we're going to use an 'ordered' model - but in which we either use a flat prior probability distribution on the global root, or we fix the root to each of the three possible states:

model<-matrix(c(0,1,0,1,0,1,0,1,0),3,3,dimnames=list(states,
    states))
model
##   a b c
## a 0 1 0
## b 1 0 1
## c 0 1 0
fit.flat<-fitMk(tree,x,model=model)
fit.flat
## Object of class "fitMk".
## 
## Fitted (or set) value of Q:
##           a         b         c
## a -1.204273  1.204273  0.000000
## b  1.204273 -2.408547  1.204273
## c  0.000000  1.204273 -1.204273
## 
## Fitted (or set) value of pi:
##         a         b         c 
## 0.3333333 0.3333333 0.3333333 
## 
## Log-likelihood: -79.285705 
## 
## Optimization method used was "nlminb"
fit.a<-fitMk(tree,x,model=model,pi=setNames(c(1,0,0),states))
fit.a
## Object of class "fitMk".
## 
## Fitted (or set) value of Q:
##           a         b         c
## a -1.266891  1.266891  0.000000
## b  1.266891 -2.533781  1.266891
## c  0.000000  1.266891 -1.266891
## 
## Fitted (or set) value of pi:
## a b c 
## 1 0 0 
## 
## Log-likelihood: -80.385615 
## 
## Optimization method used was "nlminb"
fit.b<-fitMk(tree,x,model=model,pi=setNames(c(0,1,0),states))
fit.b
## Object of class "fitMk".
## 
## Fitted (or set) value of Q:
##           a         b         c
## a -1.148579  1.148579  0.000000
## b  1.148579 -2.297158  1.148579
## c  0.000000  1.148579 -1.148579
## 
## Fitted (or set) value of pi:
## a b c 
## 0 1 0 
## 
## Log-likelihood: -78.905018 
## 
## Optimization method used was "nlminb"
fit.c<-fitMk(tree,x,model=model,pi=setNames(c(0,0,1),states))
fit.c
## Object of class "fitMk".
## 
## Fitted (or set) value of Q:
##           a         b         c
## a -1.251647  1.251647  0.000000
## b  1.251647 -2.503294  1.251647
## c  0.000000  1.251647 -1.251647
## 
## Fitted (or set) value of pi:
## a b c 
## 0 0 1 
## 
## Log-likelihood: -79.059712 
## 
## Optimization method used was "nlminb"

Note that although one might be tempted to compare the likelihoods of these three fitted models, this should not in fact be done. This is because the likelihoods are in each case conditioned on the state at the root state being known - in other words, they are conditioned on different data. That this is not equivalent to constraining a particular parameter to have some value is most evident by comparison between the model in which the root is fixed to state "b" to the model in which the root state is unknown. If the former was a special case of the latter, then it should have a lower (or equal) likelihood, when in fact its likelihood is slightly higher. How come? Well in reality the fixed state likelihood is the probability that all the tips are in their observed states and that the root is in state "b", given the tree, a model, and the model parameters (to be estimated); whereas the 'flat prior' model likelihood is simply the probability of the data at the tips, given the tree & model, but integrating over all possible values for the root node of the phylogeny.

We can do the same analysis using fitDiscrete, but it is not quite as straightforward. First, we have to use the method to generate a likelihood function, and then we can optimize that function using different prior probability distributions for the root of the tree, as follows:

library(geiger)
obj<-fitDiscrete(tree,x,model=model)
## Warning in fitDiscrete(tree, x, model = model): Parameter estimates appear at bounds:
##  q13
##  q31
obj
## GEIGER-fitted comparative model of discrete data
##  fitted Q matrix:
##               a         b         c
##     a -1.184179  1.184179  0.000000
##     b  1.184179 -2.368358  1.184179
##     c  0.000000  1.184179 -1.184179
## 
##  model summary:
##  log-likelihood = -79.073420
##  AIC = 160.146839
##  AICc = 160.180738
##  free parameters = 1
## 
## Convergence diagnostics:
##  optimization iterations = 100
##  failed iterations = 0
##  frequency of best fit = 1.00
## 
##  object summary:
##  'lik' -- likelihood function
##  'bnd' -- bounds for likelihood search
##  'res' -- optimization iteration summary
##  'opt' -- maximum likelihood parameter estimates

(Note that fitDiscrete does not use a flat prior by default.)

lik<-obj$lik
lik
## likelihood function for univariate discrete trait evolution
##  argument names: q12
## 
##  mapping
##      1: a
##      2: b
##      3: c
## 
## definition:
## function (pars, ...) 
## {
##     if (missing(pars)) 
##         stop(paste("The following 'pars' are expected:\n\t", 
##             paste(argn(tmp), collapse = "\n\t", sep = ""), sep = ""))
##     pars = .repars(pars, argn(tmp))
##     tmp(pars, ...)
## }
## <bytecode: 0x00000000086ba470>
## <environment: 0x000000000bcfb828>
fit.flat<-optimize(lik,c(1e-8,1000),root="flat",maximum=TRUE)
fit.flat
## $maximum
## [1] 1.20427
## 
## $objective
## [1] -79.2857
fit.a<-optimize(lik,c(1e-8,1000),root="given",
    root.p=setNames(c(1,0,0),states),maximum=TRUE)
fit.a
## $maximum
## [1] 1.266875
## 
## $objective
## [1] -80.38561
fit.b<-optimize(lik,c(1e-8,1000),root="given",
    root.p=setNames(c(0,1,0),states),maximum=TRUE)
fit.b
## $maximum
## [1] 1.148578
## 
## $objective
## [1] -78.90502
fit.c<-optimize(lik,c(1e-8,1000),root="given",
    root.p=setNames(c(0,0,1),states),maximum=TRUE)
fit.c
## $maximum
## [1] 1.251644
## 
## $objective
## [1] -79.05971

For each fitted model, $maximum is the single fitted model parameter; whereas $objective is the log-likelihood. If we had a model with more than one parameter, we could use optim or some other numerical optimizer to fit the model.

Finally, don't forget that phytools has S3 plotting methods for both fitMk & fitDiscrete. E.g.:

plot(fit.phytools<-fitMk(tree,x,model="ARD"))
title(main="Result from phytools::fitMk(...,model=\"ARD\")")
title(main=paste("\n\n\nlog(L) =",round(logLik(fit.phytools),5)),
    cex.main=1)

plot of chunk unnamed-chunk-5

plot(fit.geiger<-fitDiscrete(tree,x,model="ARD"))
title(main="Result from geiger::fitDiscrete(...,model=\"ARD\")")
title(main=paste("\n\n\nlog(L) =",round(logLik(fit.geiger),5)),
    cex.main=1)

plot of chunk unnamed-chunk-5

(Note that these two fitted models, and their likelihoods, come out as different - but in reality the likelihoods are not directly comparable because the two functions make different assumptions about the prior on the root. If we force geiger's fitDiscrete likelihood function to use a flat prior, we should find that it yields a worse likelihood then we obtained using phytools:::fitMk - if, in fact, fitMk has converged on the MLE, e.g.:

lik<-fit.geiger$lik
q<-fit.phytools$rates
## fitMk paramater estimates
lik(pars=c(q12=q[3],q13=q[5],q21=q[1],q23=q[6],q31=q[2],q32=q[4]),
    root="flat")
## [1] -78.69924
## fitDiscrete estimates, but with flat root prior
lik(pars=unlist(fit.geiger$opt[1:6]),root="flat")
## [1] -78.77036

It would also have been plausible that fitMk had failed to converge on the ML solution as, in general, the optimization routines used by fitDiscrete are more robust. This doesn't seem to the case in this instance.)

The data for this exercise were simulated as follows:

tree<-pbtree(n=120,scale=1)
Q<-matrix(c(-1,1,0,1,-2,1,0,1,-1),3,3,dimnames=list(letters[1:3],
    letters[1:3]))
x<-as.factor(sim.history(tree,Q,anc="b")$states)

That's it.

1 comment:

  1. Dear Liam, many thanks for this very enlightening post!
    While trying to perform some reconstructions, I came accross an unexpected result. Iam trying to test different scenarios for character evolution (equal rates, all-rates-different, and customized matrices with irreversible states).

    Specifically, I have a character that has a defined ancestral condition (established by outgroups not present in the analysis), three different derived conditions, and taxa which lack this particular structure. A small group of taxa which is deeply nested in the tree presents the plesiomorphic condition, and I would like to test if this is a plesiomorphic retention or a re-gain. In the all-rates-different model, it is optimized as a re-gain. So I wanted to compare this with a model where re-gains are impossible, using this matrix:
    - "ancestral" can shift to any state
    -"derived" states can shift to "absent" only, but not to other derived states, nor reverse to the "ancestral" condition
    -"absent" cannot shift to any condition

    This optimizes the character as ancestral. Everything seems fine so far!

    However, the exact same tree is produced if a use a different matrix where all transitions are possible except for re-gains from "absent" structures (i.e.the model allows for reversals from the derived states to the ancestral state, and also transitions among derived states). I find this very strange, because the only difference between this and the ARD model is the impossibility to reverse from "absent" to other states - but neither of the trees suggested such an optimization would occur. Thus, I would expect that this second customized matrix would give results identical to those of ARD -- do you know why such a result should to be expected?

    ReplyDelete

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.