Thursday, March 30, 2023

Slightly faster calculation of the likelihood in fitMk for fitting the extended Mk model of discrete character evolution in phytools

I just added a new option in fitMk (for fitting the extended Mk model of discrete character evolution on a phylogeny) to use a different internal function to compute the likelihood.

Even though it seems (pretty clearly) to be faster, I haven’t updated the default yet just in case there are any bugs that I haven’t identified or anticipated.

This isn’t a new method or new optimization routine – just a different implementation of exactly the same calculation that fitMk was doing before.

We can adjust which likelihood function to use via the hidden function argument lik.func. If lik.func is set to lik.func="lik" (the default), then fitMk will use the previous method. If lik.func="pruning", then fitMk will use the new likelihood calculation implementation. (Both likelihood calculations use the pruning algorithm of Felsenstein! This is just an indication of the two names of the different internally used functions.)

Let’s see a quick demo that shows us the difference.

I’ll analyze primate diel activity pattern evolution on a phylogeny using data from Kirk & Kay (2004).

Start by loading phytools plus our data & tree.

library(phytools)
data(primate.tree)
data(primate.data)
diel_pattern<-setNames(
  as.factor(primate.data$Activity_pattern),
  rownames(primate.data))

We’ll use system.time to measure the time that each option takes to run. Remember – this is not a change in the optimization routine, and we’ll use the same starting values for optimization, so any difference in time should largely be due to a difference in how long it takes to compute the likelihood.

First with the default function.

system.time(
  ard_diel.1<-fitMk(primate.tree,diel_pattern,
    model="ARD",pi="fitzjohn")
)
##    user  system elapsed 
##    4.42    0.08   13.88
print(ard_diel.1)
## Object of class "fitMk".
## 
## Fitted (or set) value of Q:
##            Cathemeral   Diurnal Nocturnal
## Cathemeral    0.00000  0.000000  0.000000
## Diurnal       0.00416 -0.006031  0.001871
## Nocturnal     0.00000  0.006281 -0.006281
## 
## Fitted (or set) value of pi:
## Cathemeral    Diurnal  Nocturnal 
##   0.000000   0.013462   0.986538 
## due to treating the root prior as (a) nuisance.
## 
## Log-likelihood: -30.691752 
## 
## Optimization method used was "nlminb"

Next, with lik.func="pruning".

system.time(
  ard_diel.2<-fitMk(primate.tree,diel_pattern,
    model="ARD",pi="fitzjohn",lik.func="pruning")
)
##    user  system elapsed 
##    2.56    0.03    8.36
print(ard_diel.2)
## Object of class "fitMk".
## 
## Fitted (or set) value of Q:
##            Cathemeral   Diurnal Nocturnal
## Cathemeral    0.00000  0.000000  0.000000
## Diurnal       0.00416 -0.006031  0.001871
## Nocturnal     0.00000  0.006281 -0.006281
## 
## Fitted (or set) value of pi:
## Cathemeral    Diurnal  Nocturnal 
##   0.000000   0.013463   0.986537 
## due to treating the root prior as (a) nuisance.
## 
## Log-likelihood: -30.691752 
## 
## Optimization method used was "nlminb"

Note that I make no assertions about this being fast in any kind of objective way – and (in fact), I’m certain that other R packages are better optimized to compute the likelihood under an Mk model than phytools! I just thought it was interesting that I managed to see such a performance difference out of something that I programmed initially as a demo.

Let’s plot ancestral states and our fitted model.

plotTree(primate.tree,type="fan",lwd=1,ftype="i",
  fsize=0.7,part=0.7,offset=3)
cols<-setNames(c("#DF536B","#DAA520","black"),
  levels(diel_pattern))
nodelabels(pie=ancr(ard_diel.2)$ace,piecol=cols,cex=0.4)
tiplabels(pie=ard_diel.1$data[primate.tree$tip.label,],
  piecol=cols,cex=0.3)
legend("topleft",names(cols),pch=21,pt.bg=cols,pt.cex=1.5)
par(cex=0.8)
plot(ard_diel.2,width=TRUE,add=TRUE,show.zeros=FALSE,
  text=FALSE,color=TRUE,xlim=c(-4,0.9),ylim=c(-1,3.5),
  signif=1,cex.traits=1.2)

plot of chunk unnamed-chunk-9

That’s it!