A phytools user recently brought it to my attention that the likelihoods computed by ape::ace
and phytools::fitMk
differ by a constant amount.
Here’s what I mean.
## load packages
library(ape)
library(phytools)
## load data for example
data(primate.tree)
tree<-primate.tree
data(primate.data)
x<-setNames(
as.factor(primate.data$Activity_pattern),
rownames(primate.data))
plotTree(primate.tree,ftype="i",fsize=0.6,
type="fan",part=0.5,offset=2,lwd=1)
pp<-get("last_plot.phylo",envir=.PlotPhyloEnv)
cols<-setNames(c("#21908CFF","#FDE725FF","#440154FF"),
levels(x))
COLS<-cols[x[tree$tip.labe]]
points(pp$xx[1:Ntip(tree)],pp$yy[1:Ntip(tree)],
pch=16,col=COLS)
legend(0.9*par()$usr[1],0.9*par()$usr[4],
levels(x),pch=16,col=cols,bty="n",pt.cex=1.5,
cex=0.8)
First fit an ER model using ape::ace
.
## fit ER Mk model with ace
er_ace<-ace(x,tree,model="ER",type="discrete")
er_ace
##
## Ancestral Character Estimation
##
## Call: ace(x = x, phy = tree, type = "discrete", model = "ER")
##
## Log-likelihood: -33.14864
##
## Rate index matrix:
## Cathemeral Diurnal Nocturnal
## Cathemeral . 1 1
## Diurnal 1 . 1
## Nocturnal 1 1 .
##
## Parameter estimates:
## rate index estimate std-err
## 1 0.0032 9e-04
##
## Scaled likelihoods at the root (type '...$lik.anc' to get them for all nodes):
## Cathemeral Diurnal Nocturnal
## 0.008107298 0.134008707 0.857883995
If I fit the same model using fitMk
I get the same parameter estimate, but a different log-likelihood.
## fit ER Mk model with fitMk
er_fitMk<-fitMk(tree,x,model="ER")
er_fitMk
## Object of class "fitMk".
##
## Fitted (or set) value of Q:
## Cathemeral Diurnal Nocturnal
## Cathemeral -0.006490 0.003245 0.003245
## Diurnal 0.003245 -0.006490 0.003245
## Nocturnal 0.003245 0.003245 -0.006490
##
## Fitted (or set) value of pi:
## Cathemeral Diurnal Nocturnal
## 0.333333 0.333333 0.333333
## due to treating the root prior as (a) flat.
##
## Log-likelihood: -34.247253
##
## Optimization method used was "nlminb"
This isn’t proof in and of itself that the likelihood computed by fitMk
is correct, but corHMM::corHMM
computes the same value as fitMk
.
## fit ER Mk model with corHMM
library(corHMM)
dat<-data.frame(Genus_sp=names(x),x=as.numeric(x))
er_corhmm<-corHMM(tree,dat,model="ER",rate.cat=1,root.p=rep(1/3,3))
## State distribution in data:
## States: 1 2 3
## Counts: 8 54 28
## Beginning thorough optimization search -- performing 0 random restarts
## Finished. Inferring ancestral states using marginal reconstruction.
er_corhmm
##
## Fit
## -lnL AIC AICc Rate.cat ntax
## -34.24725 70.49451 70.53996 1 90
##
## Legend
## 1 2 3
## "1" "2" "3"
##
## Rates
## (1,R1) (2,R1) (3,R1)
## (1,R1) NA 0.003245043 0.003245043
## (2,R1) 0.003245043 NA 0.003245043
## (3,R1) 0.003245043 0.003245043 NA
##
## Arrived at a reliable solution
The same general pattern is true with the SYM model.
## fit SYM Mk model with ace
sym_ace<-ace(x,tree,model="SYM",type="discrete")
sym_ace
##
## Ancestral Character Estimation
##
## Call: ace(x = x, phy = tree, type = "discrete", model = "SYM")
##
## Log-likelihood: -31.20284
##
## Rate index matrix:
## Cathemeral Diurnal Nocturnal
## Cathemeral . 1 2
## Diurnal 1 . 3
## Nocturnal 2 3 .
##
## Parameter estimates:
## rate index estimate std-err
## 1 0.0041 0.0029
## 2 0.0000 0.0051
## 3 0.0041 0.0012
##
## Scaled likelihoods at the root (type '...$lik.anc' to get them for all nodes):
## Cathemeral Diurnal Nocturnal
## 0.0001116237 0.1201846876 0.8797036887
## fit Mk model with fitMk
sym_fitMk<-fitMk(tree,x,model="SYM")
sym_fitMk
## Object of class "fitMk".
##
## Fitted (or set) value of Q:
## Cathemeral Diurnal Nocturnal
## Cathemeral -0.004113 0.004113 0.000000
## Diurnal 0.004113 -0.008244 0.004131
## Nocturnal 0.000000 0.004131 -0.004131
##
## Fitted (or set) value of pi:
## Cathemeral Diurnal Nocturnal
## 0.333333 0.333333 0.333333
## due to treating the root prior as (a) flat.
##
## Log-likelihood: -32.301456
##
## Optimization method used was "nlminb"
## fit Mk model with corHMM
dat<-data.frame(Genus_sp=names(x),x=as.numeric(x))
sym_corhmm<-corHMM(tree,dat,model="SYM",rate.cat=1,root.p=rep(1/3,3))
## State distribution in data:
## States: 1 2 3
## Counts: 8 54 28
## Beginning thorough optimizatio