Monday, April 10, 2023

Log-likelihoods computed by ape::ace and phytools::fitMk differ by a constant amount

A phytools user recently brought it to my attention that the likelihoods computed by ape::ace and phytools::fitMk differ by a constant amount.

Here’s what I mean.

## load packages
library(ape)
library(phytools)
## load data for example
data(primate.tree)
tree<-primate.tree
data(primate.data)
x<-setNames(
  as.factor(primate.data$Activity_pattern),
  rownames(primate.data))
plotTree(primate.tree,ftype="i",fsize=0.6,
  type="fan",part=0.5,offset=2,lwd=1)
pp<-get("last_plot.phylo",envir=.PlotPhyloEnv)
cols<-setNames(c("#21908CFF","#FDE725FF","#440154FF"),
  levels(x))
COLS<-cols[x[tree$tip.labe]]
points(pp$xx[1:Ntip(tree)],pp$yy[1:Ntip(tree)],
  pch=16,col=COLS)
legend(0.9*par()$usr[1],0.9*par()$usr[4],
  levels(x),pch=16,col=cols,bty="n",pt.cex=1.5,
  cex=0.8)

plot of chunk unnamed-chunk-2

First fit an ER model using ape::ace.

## fit ER Mk model with ace
er_ace<-ace(x,tree,model="ER",type="discrete")
er_ace
## 
##     Ancestral Character Estimation
## 
## Call: ace(x = x, phy = tree, type = "discrete", model = "ER")
## 
##     Log-likelihood: -33.14864 
## 
## Rate index matrix:
##            Cathemeral Diurnal Nocturnal
## Cathemeral          .       1         1
## Diurnal             1       .         1
## Nocturnal           1       1         .
## 
## Parameter estimates:
##  rate index estimate std-err
##           1   0.0032   9e-04
## 
## Scaled likelihoods at the root (type '...$lik.anc' to get them for all nodes):
##  Cathemeral     Diurnal   Nocturnal 
## 0.008107298 0.134008707 0.857883995

If I fit the same model using fitMk I get the same parameter estimate, but a different log-likelihood.

## fit ER Mk model with fitMk
er_fitMk<-fitMk(tree,x,model="ER")
er_fitMk
## Object of class "fitMk".
## 
## Fitted (or set) value of Q:
##            Cathemeral   Diurnal Nocturnal
## Cathemeral  -0.006490  0.003245  0.003245
## Diurnal      0.003245 -0.006490  0.003245
## Nocturnal    0.003245  0.003245 -0.006490
## 
## Fitted (or set) value of pi:
## Cathemeral    Diurnal  Nocturnal 
##   0.333333   0.333333   0.333333 
## due to treating the root prior as (a) flat.
## 
## Log-likelihood: -34.247253 
## 
## Optimization method used was "nlminb"

This isn’t proof in and of itself that the likelihood computed by fitMk is correct, but corHMM::corHMM computes the same value as fitMk.

## fit ER Mk model with corHMM
library(corHMM)
dat<-data.frame(Genus_sp=names(x),x=as.numeric(x))
er_corhmm<-corHMM(tree,dat,model="ER",rate.cat=1,root.p=rep(1/3,3))
## State distribution in data:
## States:	1	2	3	
## Counts:	8	54	28	
## Beginning thorough optimization search -- performing 0 random restarts 
## Finished. Inferring ancestral states using marginal reconstruction.
er_corhmm
## 
## Fit
##       -lnL      AIC     AICc Rate.cat ntax
##  -34.24725 70.49451 70.53996        1   90
## 
## Legend
##   1   2   3 
## "1" "2" "3" 
## 
## Rates
##             (1,R1)      (2,R1)      (3,R1)
## (1,R1)          NA 0.003245043 0.003245043
## (2,R1) 0.003245043          NA 0.003245043
## (3,R1) 0.003245043 0.003245043          NA
## 
## Arrived at a reliable solution

The same general pattern is true with the SYM model.

## fit SYM Mk model with ace
sym_ace<-ace(x,tree,model="SYM",type="discrete")
sym_ace
## 
##     Ancestral Character Estimation
## 
## Call: ace(x = x, phy = tree, type = "discrete", model = "SYM")
## 
##     Log-likelihood: -31.20284 
## 
## Rate index matrix:
##            Cathemeral Diurnal Nocturnal
## Cathemeral          .       1         2
## Diurnal             1       .         3
## Nocturnal           2       3         .
## 
## Parameter estimates:
##  rate index estimate std-err
##           1   0.0041  0.0029
##           2   0.0000  0.0051
##           3   0.0041  0.0012
## 
## Scaled likelihoods at the root (type '...$lik.anc' to get them for all nodes):
##   Cathemeral      Diurnal    Nocturnal 
## 0.0001116237 0.1201846876 0.8797036887
## fit Mk model with fitMk
sym_fitMk<-fitMk(tree,x,model="SYM")
sym_fitMk
## Object of class "fitMk".
## 
## Fitted (or set) value of Q:
##            Cathemeral   Diurnal Nocturnal
## Cathemeral  -0.004113  0.004113  0.000000
## Diurnal      0.004113 -0.008244  0.004131
## Nocturnal    0.000000  0.004131 -0.004131
## 
## Fitted (or set) value of pi:
## Cathemeral    Diurnal  Nocturnal 
##   0.333333   0.333333   0.333333 
## due to treating the root prior as (a) flat.
## 
## Log-likelihood: -32.301456 
## 
## Optimization method used was "nlminb"
## fit Mk model with corHMM
dat<-data.frame(Genus_sp=names(x),x=as.numeric(x))
sym_corhmm<-corHMM(tree,dat,model="SYM",rate.cat=1,root.p=rep(1/3,3))
## State distribution in data:
## States:	1	2	3	
## Counts:	8	54	28	
## Beginning thorough optimization search -- performing 0 random restarts 
## Finished. Inferring ancestral states using marginal reconstruction.
sym_corhmm
## 
## Fit
##       -lnL      AIC     AICc Rate.cat ntax
##  -32.30146 70.60291 70.88198        1   90
## 
## Legend
##   1   2   3 
## "1" "2" "3" 
## 
## Rates
##             (1,R1)      (2,R1)      (3,R1)
## (1,R1)          NA 0.004112635 0.000000001
## (2,R1) 0.004112635          NA 0.004130964
## (3,R1) 0.000000001 0.004130964          NA
## 
## Arrived at a reliable solution

In fact, both SYM and ER models differ between ace and fitMk by a constant.

## ace vs fitMk differ by a constant
logLik(er_ace) - logLik(er_fitMk)
## [1] 1.098612
## attr(,"df")
## [1] 1
logLik(sym_ace) - logLik(sym_fitMk)
## [1] 1.098612
## attr(,"df")
## [1] 3

What’s the meaning of this constant? Well, it happens to be equal to log(k), for k states!

log(3)
## [1] 1.098612

(This makes sense if instead of multiplying the marginal likelihoods at the root node by the prior and summing them, ace is just summing them.)

Lastly, as readers of this blog will know, phytools now does marginal ancestral state reconstruction!

## lastly, get the marginal ancestral states using phytools::ancr
marginal_states<-ancr(sym_fitMk)
marginal_states
## Marginal ancestral state estimates:
##    Cathemeral  Diurnal Nocturnal
## 91   0.000112 0.120184  0.879704
## 92   0.000072 0.146990  0.852938
## 93   0.000383 0.930662  0.068955
## 94   0.000122 0.998106  0.001772
## 95   0.000027 0.999913  0.000060
## 96   0.000012 0.999976  0.000012
## ...
## 
## Log-likelihood = -32.301456
plotTree(primate.tree,ftype="i",fsize=0.6,
  type="fan",part=0.5,offset=2,lwd=1)
pp<-get("last_plot.phylo",envir=.PlotPhyloEnv)
cols<-setNames(c("#21908CFF","#FDE725FF","#440154FF"),
  levels(x))
COLS<-cols[x[tree$tip.labe]]
points(pp$xx[1:Ntip(tree)],pp$yy[1:Ntip(tree)],
  pch=16,col=COLS)
legend(0.9*par()$usr[1],0.9*par()$usr[4],
  levels(x),pch=16,col=cols,bty="n",pt.cex=1.5,
  cex=0.8)
nodelabels(pie=marginal_states$ace,cex=0.3,piecol=cols)

plot of chunk unnamed-chunk-12

No comments:

Post a Comment

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.