A phytools user recently brought it to my attention that the likelihoods computed by ape::ace
and phytools::fitMk
differ by a constant amount.
Here’s what I mean.
## load packages
library(ape)
library(phytools)
## load data for example
data(primate.tree)
tree<-primate.tree
data(primate.data)
x<-setNames(
as.factor(primate.data$Activity_pattern),
rownames(primate.data))
plotTree(primate.tree,ftype="i",fsize=0.6,
type="fan",part=0.5,offset=2,lwd=1)
pp<-get("last_plot.phylo",envir=.PlotPhyloEnv)
cols<-setNames(c("#21908CFF","#FDE725FF","#440154FF"),
levels(x))
COLS<-cols[x[tree$tip.labe]]
points(pp$xx[1:Ntip(tree)],pp$yy[1:Ntip(tree)],
pch=16,col=COLS)
legend(0.9*par()$usr[1],0.9*par()$usr[4],
levels(x),pch=16,col=cols,bty="n",pt.cex=1.5,
cex=0.8)
First fit an ER model using ape::ace
.
## fit ER Mk model with ace
er_ace<-ace(x,tree,model="ER",type="discrete")
er_ace
##
## Ancestral Character Estimation
##
## Call: ace(x = x, phy = tree, type = "discrete", model = "ER")
##
## Log-likelihood: -33.14864
##
## Rate index matrix:
## Cathemeral Diurnal Nocturnal
## Cathemeral . 1 1
## Diurnal 1 . 1
## Nocturnal 1 1 .
##
## Parameter estimates:
## rate index estimate std-err
## 1 0.0032 9e-04
##
## Scaled likelihoods at the root (type '...$lik.anc' to get them for all nodes):
## Cathemeral Diurnal Nocturnal
## 0.008107298 0.134008707 0.857883995
If I fit the same model using fitMk
I get the same parameter estimate, but a different log-likelihood.
## fit ER Mk model with fitMk
er_fitMk<-fitMk(tree,x,model="ER")
er_fitMk
## Object of class "fitMk".
##
## Fitted (or set) value of Q:
## Cathemeral Diurnal Nocturnal
## Cathemeral -0.006490 0.003245 0.003245
## Diurnal 0.003245 -0.006490 0.003245
## Nocturnal 0.003245 0.003245 -0.006490
##
## Fitted (or set) value of pi:
## Cathemeral Diurnal Nocturnal
## 0.333333 0.333333 0.333333
## due to treating the root prior as (a) flat.
##
## Log-likelihood: -34.247253
##
## Optimization method used was "nlminb"
This isn’t proof in and of itself that the likelihood computed by fitMk
is correct, but corHMM::corHMM
computes the same value as fitMk
.
## fit ER Mk model with corHMM
library(corHMM)
dat<-data.frame(Genus_sp=names(x),x=as.numeric(x))
er_corhmm<-corHMM(tree,dat,model="ER",rate.cat=1,root.p=rep(1/3,3))
## State distribution in data:
## States: 1 2 3
## Counts: 8 54 28
## Beginning thorough optimization search -- performing 0 random restarts
## Finished. Inferring ancestral states using marginal reconstruction.
er_corhmm
##
## Fit
## -lnL AIC AICc Rate.cat ntax
## -34.24725 70.49451 70.53996 1 90
##
## Legend
## 1 2 3
## "1" "2" "3"
##
## Rates
## (1,R1) (2,R1) (3,R1)
## (1,R1) NA 0.003245043 0.003245043
## (2,R1) 0.003245043 NA 0.003245043
## (3,R1) 0.003245043 0.003245043 NA
##
## Arrived at a reliable solution
The same general pattern is true with the SYM model.
## fit SYM Mk model with ace
sym_ace<-ace(x,tree,model="SYM",type="discrete")
sym_ace
##
## Ancestral Character Estimation
##
## Call: ace(x = x, phy = tree, type = "discrete", model = "SYM")
##
## Log-likelihood: -31.20284
##
## Rate index matrix:
## Cathemeral Diurnal Nocturnal
## Cathemeral . 1 2
## Diurnal 1 . 3
## Nocturnal 2 3 .
##
## Parameter estimates:
## rate index estimate std-err
## 1 0.0041 0.0029
## 2 0.0000 0.0051
## 3 0.0041 0.0012
##
## Scaled likelihoods at the root (type '...$lik.anc' to get them for all nodes):
## Cathemeral Diurnal Nocturnal
## 0.0001116237 0.1201846876 0.8797036887
## fit Mk model with fitMk
sym_fitMk<-fitMk(tree,x,model="SYM")
sym_fitMk
## Object of class "fitMk".
##
## Fitted (or set) value of Q:
## Cathemeral Diurnal Nocturnal
## Cathemeral -0.004113 0.004113 0.000000
## Diurnal 0.004113 -0.008244 0.004131
## Nocturnal 0.000000 0.004131 -0.004131
##
## Fitted (or set) value of pi:
## Cathemeral Diurnal Nocturnal
## 0.333333 0.333333 0.333333
## due to treating the root prior as (a) flat.
##
## Log-likelihood: -32.301456
##
## Optimization method used was "nlminb"
## fit Mk model with corHMM
dat<-data.frame(Genus_sp=names(x),x=as.numeric(x))
sym_corhmm<-corHMM(tree,dat,model="SYM",rate.cat=1,root.p=rep(1/3,3))
## State distribution in data:
## States: 1 2 3
## Counts: 8 54 28
## Beginning thorough optimization search -- performing 0 random restarts
## Finished. Inferring ancestral states using marginal reconstruction.
sym_corhmm
##
## Fit
## -lnL AIC AICc Rate.cat ntax
## -32.30146 70.60291 70.88198 1 90
##
## Legend
## 1 2 3
## "1" "2" "3"
##
## Rates
## (1,R1) (2,R1) (3,R1)
## (1,R1) NA 0.004112635 0.000000001
## (2,R1) 0.004112635 NA 0.004130964
## (3,R1) 0.000000001 0.004130964 NA
##
## Arrived at a reliable solution
In fact, both SYM and ER models differ between ace
and fitMk
by a constant.
## ace vs fitMk differ by a constant
logLik(er_ace) - logLik(er_fitMk)
## [1] 1.098612
## attr(,"df")
## [1] 1
logLik(sym_ace) - logLik(sym_fitMk)
## [1] 1.098612
## attr(,"df")
## [1] 3
What’s the meaning of this constant? Well, it happens to be equal to log(k), for k states!
log(3)
## [1] 1.098612
(This makes sense if instead of multiplying the marginal likelihoods at the root node by the prior and summing them, ace
is just summing them.)
Lastly, as readers of this blog will know, phytools now does marginal ancestral state reconstruction!
## lastly, get the marginal ancestral states using phytools::ancr
marginal_states<-ancr(sym_fitMk)
marginal_states
## Marginal ancestral state estimates:
## Cathemeral Diurnal Nocturnal
## 91 0.000112 0.120184 0.879704
## 92 0.000072 0.146990 0.852938
## 93 0.000383 0.930662 0.068955
## 94 0.000122 0.998106 0.001772
## 95 0.000027 0.999913 0.000060
## 96 0.000012 0.999976 0.000012
## ...
##
## Log-likelihood = -32.301456
plotTree(primate.tree,ftype="i",fsize=0.6,
type="fan",part=0.5,offset=2,lwd=1)
pp<-get("last_plot.phylo",envir=.PlotPhyloEnv)
cols<-setNames(c("#21908CFF","#FDE725FF","#440154FF"),
levels(x))
COLS<-cols[x[tree$tip.labe]]
points(pp$xx[1:Ntip(tree)],pp$yy[1:Ntip(tree)],
pch=16,col=COLS)
legend(0.9*par()$usr[1],0.9*par()$usr[4],
levels(x),pch=16,col=cols,bty="n",pt.cex=1.5,
cex=0.8)
nodelabels(pie=marginal_states$ace,cex=0.3,piecol=cols)
No comments:
Post a Comment
Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.