Monday, April 10, 2023

Log-likelihoods computed by ape::ace and phytools::fitMk differ by a constant amount

A phytools user recently brought it to my attention that the likelihoods computed by ape::ace and phytools::fitMk differ by a constant amount.

Here’s what I mean.

## load packages
library(ape)
library(phytools)
## load data for example
data(primate.tree)
tree<-primate.tree
data(primate.data)
x<-setNames(
  as.factor(primate.data$Activity_pattern),
  rownames(primate.data))
plotTree(primate.tree,ftype="i",fsize=0.6,
  type="fan",part=0.5,offset=2,lwd=1)
pp<-get("last_plot.phylo",envir=.PlotPhyloEnv)
cols<-setNames(c("#21908CFF","#FDE725FF","#440154FF"),
  levels(x))
COLS<-cols[x[tree$tip.labe]]
points(pp$xx[1:Ntip(tree)],pp$yy[1:Ntip(tree)],
  pch=16,col=COLS)
legend(0.9*par()$usr[1],0.9*par()$usr[4],
  levels(x),pch=16,col=cols,bty="n",pt.cex=1.5,
  cex=0.8)

plot of chunk unnamed-chunk-2

First fit an ER model using ape::ace.

## fit ER Mk model with ace
er_ace<-ace(x,tree,model="ER",type="discrete")
er_ace
## 
##     Ancestral Character Estimation
## 
## Call: ace(x = x, phy = tree, type = "discrete", model = "ER")
## 
##     Log-likelihood: -33.14864 
## 
## Rate index matrix:
##            Cathemeral Diurnal Nocturnal
## Cathemeral          .       1         1
## Diurnal             1       .         1
## Nocturnal           1       1         .
## 
## Parameter estimates:
##  rate index estimate std-err
##           1   0.0032   9e-04
## 
## Scaled likelihoods at the root (type '...$lik.anc' to get them for all nodes):
##  Cathemeral     Diurnal   Nocturnal 
## 0.008107298 0.134008707 0.857883995

If I fit the same model using fitMk I get the same parameter estimate, but a different log-likelihood.

## fit ER Mk model with fitMk
er_fitMk<-fitMk(tree,x,model="ER")
er_fitMk
## Object of class "fitMk".
## 
## Fitted (or set) value of Q:
##            Cathemeral   Diurnal Nocturnal
## Cathemeral  -0.006490  0.003245  0.003245
## Diurnal      0.003245 -0.006490  0.003245
## Nocturnal    0.003245  0.003245 -0.006490
## 
## Fitted (or set) value of pi:
## Cathemeral    Diurnal  Nocturnal 
##   0.333333   0.333333   0.333333 
## due to treating the root prior as (a) flat.
## 
## Log-likelihood: -34.247253 
## 
## Optimization method used was "nlminb"

This isn’t proof in and of itself that the likelihood computed by fitMk is correct, but corHMM::corHMM computes the same value as fitMk.

## fit ER Mk model with corHMM
library(corHMM)
dat<-data.frame(Genus_sp=names(x),x=as.numeric(x))
er_corhmm<-corHMM(tree,dat,model="ER",rate.cat=1,root.p=rep(1/3,3))
## State distribution in data:
## States:	1	2	3	
## Counts:	8	54	28	
## Beginning thorough optimization search -- performing 0 random restarts 
## Finished. Inferring ancestral states using marginal reconstruction.
er_corhmm
## 
## Fit
##       -lnL      AIC     AICc Rate.cat ntax
##  -34.24725 70.49451 70.53996        1   90
## 
## Legend
##   1   2   3 
## "1" "2" "3" 
## 
## Rates
##             (1,R1)      (2,R1)      (3,R1)
## (1,R1)          NA 0.003245043 0.003245043
## (2,R1) 0.003245043          NA 0.003245043
## (3,R1) 0.003245043 0.003245043          NA
## 
## Arrived at a reliable solution

The same general pattern is true with the SYM model.

## fit SYM Mk model with ace
sym_ace<-ace(x,tree,model="SYM",type="discrete")
sym_ace
## 
##     Ancestral Character Estimation
## 
## Call: ace(x = x, phy = tree, type = "discrete", model = "SYM")
## 
##     Log-likelihood: -31.20284 
## 
## Rate index matrix:
##            Cathemeral Diurnal Nocturnal
## Cathemeral          .       1         2
## Diurnal             1       .         3
## Nocturnal           2       3         .
## 
## Parameter estimates:
##  rate index estimate std-err
##           1   0.0041  0.0029
##           2   0.0000  0.0051
##           3   0.0041  0.0012
## 
## Scaled likelihoods at the root (type '...$lik.anc' to get them for all nodes):
##   Cathemeral      Diurnal    Nocturnal 
## 0.0001116237 0.1201846876 0.8797036887
## fit Mk model with fitMk
sym_fitMk<-fitMk(tree,x,model="SYM")
sym_fitMk
## Object of class "fitMk".
## 
## Fitted (or set) value of Q:
##            Cathemeral   Diurnal Nocturnal
## Cathemeral  -0.004113  0.004113  0.000000
## Diurnal      0.004113 -0.008244  0.004131
## Nocturnal    0.000000  0.004131 -0.004131
## 
## Fitted (or set) value of pi:
## Cathemeral    Diurnal  Nocturnal 
##   0.333333   0.333333   0.333333 
## due to treating the root prior as (a) flat.
## 
## Log-likelihood: -32.301456 
## 
## Optimization method used was "nlminb"
## fit Mk model with corHMM
dat<-data.frame(Genus_sp=names(x),x=as.numeric(x))
sym_corhmm<-corHMM(tree,dat,model="SYM",rate.cat=1,root.p=rep(1/3,3))
## State distribution in data:
## States:	1	2	3	
## Counts:	8	54	28	
## Beginning thorough optimizatio