Saturday, September 19, 2015

The difference between different methods for fitting the Mk model in R: It's all about the prior

There are multiple functions in R to fit the so-called Mk model for discrete character evolution. For instance, there is ace in the ape package, fitDiscrete in geiger, and now fitMk in my package, phytools - which has a lot of similarity to ace, but some important differences. This does not even consider the functions of phangorn and diversitree, which can also be used to fit this model.

A casual user may discovery, however, that although these methods all purport to be fitting the same model, parameter estimates and likelihoods often differ ever so slightly (but more than one would expect based on the limits of numerical precision) between methods.

What accounts for this, it turns out, is different implicit or explicit assumptions about the prior probability distribution at the global root of the tree. This is difficult to track down in documentation for any of these functions, although in fitMk the assumed prior (pi) is reported explicitly (and can be modified), and (as I show below) it is also possible to use fitDiscrete to fit the Mk model with any prior.

First, load packages

## load packages
library(phytools)
library(geiger)

Next, let's simulate some data under the Mk model:

## simulate some data
tree<-pbtree(n=100,scale=1)
Q<-matrix(c(-2,1,1,1,-2,1,1,1,-2),3,3)
rownames(Q)<-colnames(Q)<-letters[1:3]
Q
##    a  b  c
## a -2  1  1
## b  1 -2  1
## c  1  1 -2
x<-sim.history(tree,Q,anc="a")$states
## Done simulation(s).
x
##  t53  t54  t89  t90   t9  t10   t6  t11  t24  t25  t12  t44  t62  t67  t74 
##  "a"  "a"  "c"  "c"  "a"  "a"  "b"  "b"  "a"  "a"  "a"  "a"  "a"  "b"  "a" 
##  t75  t65  t66  t45  t81  t82  t69  t35  t36  t70  t71   t1   t5  t17  t23 
##  "a"  "c"  "c"  "a"  "a"  "c"  "a"  "a"  "a"  "b"  "c"  "c"  "a"  "a"  "a" 
##  t91  t92  t55  t56  t46  t47  t50  t51  t22  t78  t85  t86  t61  t63  t64 
##  "c"  "b"  "c"  "c"  "c"  "c"  "c"  "c"  "c"  "c"  "c"  "c"  "c"  "c"  "c" 
##  t18  t95  t96  t13  t42  t43  t93  t94  t99 t100  t15   t4  t40  t41   t2 
##  "c"  "b"  "b"  "c"  "c"  "c"  "c"  "c"  "a"  "a"  "b"  "c"  "b"  "b"  "b" 
##   t3  t76  t77  t39  t38  t30  t27  t31  t83  t84  t57  t58  t37  t14  t19 
##  "b"  "b"  "a"  "a"  "a"  "b"  "a"  "a"  "a"  "a"  "a"  "a"  "c"  "a"  "a" 
##  t20   t7  t79  t80  t21   t8  t68  t87  t88  t52  t72  t73  t33  t34  t16 
##  "a"  "c"  "b"  "b"  "b"  "a"  "a"  "a"  "a"  "a"  "b"  "b"  "b"  "b"  "b" 
##  t26  t28  t29  t48  t49  t97  t98  t32  t59  t60 
##  "b"  "c"  "a"  "a"  "a"  "b"  "b"  "b"  "a"  "b"

Now, we can fit each model under the generating model (model="ER"):

## fit models
fit.phytools<-fitMk(tree,x,model="ER",pi="equal") ## flat root
fit.phytools
## Object of class "fitMk".
## 
## Fitted (or set) value of Q:
##           a         b         c
## a -1.773072  0.886536  0.886536
## b  0.886536 -1.773072  0.886536
## c  0.886536  0.886536 -1.773072
## 
## Fitted (or set) value of pi:
##         a         b         c 
## 0.3333333 0.3333333 0.3333333 
## 
## Log-likelihood: -86.449618
fit.ape<-ace(x,tree,type="discrete",model="ER")
fit.ape
## 
##     Ancestral Character Estimation
## 
## Call: ace(x = x, phy = tree, type = "discrete", model = "ER")
## 
##     Log-likelihood: -85.35101 
## 
## Rate index matrix:
##   a b c
## a . 1 1
## b 1 . 1
## c 1 1 .
## 
## Parameter estimates:
##  rate index estimate std-err
##           1   0.8865  0.1524
## 
## Scaled likelihoods at the root (type '...$lik.anc' to get them for all nodes):
##         a         b         c 
## 0.4114971 0.3071304 0.2813725
fit.geiger<-fitDiscrete(tree,x,model="ER")
fit.geiger
## GEIGER-fitted comparative model of discrete data
##  fitted Q matrix:
##                a          b          c
##     a -1.7668479  0.8834239  0.8834239
##     b  0.8834239 -1.7668479  0.8834239
##     c  0.8834239  0.8834239 -1.7668479
## 
##  model summary:
##  log-likelihood = -86.421424
##  AIC = 174.842847
##  AICc = 174.883663
##  free parameters = 1
## 
## Convergence diagnostics:
##  optimization iterations = 100
##  failed iterations = 0
##  frequency of best fit = 1.00
## 
##  object summary:
##  'lik' -- likelihood function
##  'bnd' -- bounds for likelihood search
##  'res' -- optimization iteration summary
##  'opt' -- maximum likelihood parameter estimates

We can see that each is slightly different - either in the fitted model, or the likelihood, or both!

To show that this is due to different priors on the root node, we can change the prior using the object created by fitDiscrete. This is not super-straightforward, but kind of neat in the end. (There may be some way to do this within a call of fitDiscrete, but I was not able to do that.)

## the likelihood function
lik<-fit.geiger$lik
## optimize it using a flat/"equal" prior
fit.flat<-optimize(lik,c(1e-8,1000),root=ROOT.FLAT,maximum=TRUE)
fit.flat
## $maximum
## [1] 0.8865379
## 
## $objective
## [1] -86.44962

If we compare this rate parameter to fitMk or ace, or the likelihood to fitMk, we should see that it is equal. Cool.

That's all I have to say about this for now.

1 comment:

  1. Great post Liam,

    Another option out their is the way that diversitree handles this by default. Where the likelihoods at the root are weighted in proportion to how well they explain the observed tip data. Rich does a nice job of explaining the approach in the first appendix of Fitzjohn, Maddison and Otto 2009. Also in diversitree you can choose just to return the multiple probabilites calculated at the root rather than summing over them. This can be nice at least for exploration so you can see how or to what extent the choice of root prior is going to effect your result

    cheers

    ReplyDelete