Wednesday, September 27, 2017

New bug for ordered, directional character evolution model in fitMk

I just discovered what seems to be a new bug in the phytools function fitMk for fitting the Mk model of discrete trait evolution. fitMk uses code modified from ape::ace, but has been adapted to do a variety of different things - such as accept ambiguous tip states (coded as prior probabilities in a matrix), and a flexible prior density at the global root of the tree. I believe I've tracked the bug down to the internally used function matexpo for matrix exponentiation from the ape package. When I substitute phytools::expm (which actually just wraps msm::MatrixExp the function works again.

I've only gotten the bug to appear for ordered, directional models. That is, for instance, a model in which we permit A→B→C, but not the reverse.

Here's an example using a dataset of squamates & digit number from a study by Matt Brandley et al. (2008). (Actually, the data are simplified because in the original data some species are variable in digit number for a particular limb, which is a nuance I've ignored.)

It might make sense to fit a model in which digit loss occurred 5→4→3→2→1→0, without jumps between non-adjacent digits numbers & without digit readquisition.* (I'm not proposing to imagine that neither is possible, let's just suppose we want to fit this seemingly reasonable model.)

We can do that as follows:

library(phytools)
library(geiger)
sqTree
## 
## Phylogenetic tree with 258 tips and 257 internal nodes.
## 
## Tip labels:
##  Abronia_graminea, Mesaspis_moreleti, Gerrhonotus_liocephalus, Barisia_imbricata, Elgaria_coerulea, Elgaria_kingii, ...
## 
## Rooted; includes branch lengths.
head(toes,n=30)
##     Agamodon_anguliceps        Amphisbaena_alba           Bipes_biporus 
##                       0                       0                       0 
##       Bipes_caniculatus       Bipes_tridactylus         Blanus_cinereus 
##                       0                       0                       0 
##   Chirindia_swimmertoni   Diplometopon_zarudnyi       Geocalamus_acutus 
##                       0                       0                       0 
##     Monopeltis_capensis      Rhineura_floridana   Trogonophis_wiegmanni 
##                       0                       0                       0 
##        Abronia_graminea         Anguis_fragilis   Anniella_geronimensis 
##                       5                       0                       0 
##        Anniella_pulchra       Barisia_imbricata   Celestus_enneagrammus 
##                       0                       5                       5 
##  Diploglossus_bilobatus      Diploglossus_pleei        Elgaria_coerulea 
##                       5                       5                       5 
##          Elgaria_kingii   Elgaria_multicarinata     Elgaria_panamintina 
##                       5                       5                       5 
##   Elgaria_paucicarinata Gerrhonotus_liocephalus       Mesaspis_moreleti 
##                       5                       5                       5 
##       Ophiodes_striatus       Ophisaurus_apodus   Ophisaurus_attenuatus 
##                       1                       1                       0 
## Levels: 0 1 2 3 4 5

I realize that this isn't the best way to visualize our data - but nevertheless:

plotTree.barplot(sqTree,
    setNames(as.numeric(toes)-1,names(toes)),
    args.plotTree=list(fsize=0.3),args.barplot=list(space=0,
    xlab="number of digits in the hindlimb",border="grey"))

plot of chunk unnamed-chunk-2

Now we can set up & fit our model:

directional<-matrix(c(0,0,0,0,0,0,
    1,0,0,0,0,0,
    0,2,0,0,0,0,
    0,0,3,0,0,0,
    0,0,0,4,0,0,
    0,0,0,0,5,0),6,6,byrow=TRUE,
    dimnames=list(levels(toes),levels(toes)))
directional
##   0 1 2 3 4 5
## 0 0 0 0 0 0 0
## 1 1 0 0 0 0 0
## 2 0 2 0 0 0 0
## 3 0 0 3 0 0 0
## 4 0 0 0 4 0 0
## 5 0 0 0 0 5 0
fitDirectional<-fitDiscrete(sqTree,toes,model=directional)
## Warning in treedata(phy, dat): The following tips were not found in 'phy' and were dropped from 'data':
##  Gonatodes_albogularis
##  Lepidophyma_flavimaculatum
##  Trachyboa_boulengeri
## Warning in fitDiscrete(sqTree, toes, model = directional): Parameter estimates appear at bounds:
##  q12
##  q13
##  q14
##  q15
##  q16
##  q23
##  q24
##  q25
##  q26
##  q31
##  q34
##  q35
##  q36
##  q41
##  q42
##  q45
##  q46
##  q51
##  q52
##  q53
##  q56
##  q61
##  q62
##  q63
##  q64
fitDirectional
## GEIGER-fitted comparative model of discrete data
##  fitted Q matrix:
##                0           1          2          3            4
##     0 0.00000000  0.00000000  0.0000000  0.0000000  0.000000000
##     1 0.02314697 -0.02314697  0.0000000  0.0000000  0.000000000
##     2 0.00000000  0.17375162 -0.1737516  0.0000000  0.000000000
##     3 0.00000000  0.00000000  0.2123601 -0.2123601  0.000000000
##     4 0.00000000  0.00000000  0.0000000  0.1118374 -0.111837366
##     5 0.00000000  0.00000000  0.0000000  0.0000000  0.004129334
##                  5
##     0  0.000000000
##     1  0.000000000
##     2  0.000000000
##     3  0.000000000
##     4  0.000000000
##     5 -0.004129334
## 
##  model summary:
##  log-likelihood = -211.796485
##  AIC = 433.592970
##  AICc = 433.831065
##  free parameters = 5
## 
## Convergence diagnostics:
##  optimization iterations = 100
##  failed iterations = 0
##  frequency of best fit = 0.06
## 
##  object summary:
##  'lik' -- likelihood function
##  'bnd' -- bounds for likelihood search
##  'res' -- optimization iteration summary
##  'opt' -- maximum likelihood parameter estimates

We have also been able to fit this model using fitMk, but this appears to be broken:

fit.phytools<-fitMk(sqTree,toes,model=directional)
## Warning in log(comp[1:M + N]): NaNs produced

## Warning in log(comp[1:M + N]): NaNs produced

## Warning in log(comp[1:M + N]): NaNs produced

## Warning in log(comp[1:M + N]): NaNs produced

Let's load the fix and test it as follows:

source("https://raw.githubusercontent.com/liamrevell/phytools/master/R/fitMk.R")
fit.phytools<-fitMk(sqTree,toes,model=directional)
fit.phytools
## Object of class "fitMk".
## 
## Fitted (or set) value of Q:
##          0         1         2         3         4         5
## 0 0.000000  0.000000  0.000000  0.000000  0.000000  0.000000
## 1 0.023147 -0.023147  0.000000  0.000000  0.000000  0.000000
## 2 0.000000  0.173751 -0.173751  0.000000  0.000000  0.000000
## 3 0.000000  0.000000  0.212357 -0.212357  0.000000  0.000000
## 4 0.000000  0.000000  0.000000  0.111838 -0.111838  0.000000
## 5 0.000000  0.000000  0.000000  0.000000  0.004129 -0.004129
## 
## Fitted (or set) value of pi:
##         0         1         2         3         4         5 
## 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667 
## 
## Log-likelihood: -213.588245 
## 
## Optimization method used was "nlminb"

Note that though very similar, fitMk and fitDiscrete are slightly different because they use different assumptions about the prior at the root of the tree. Nonetheless, it seems that fitMk is fixed and working again.

The bug will also affect other functions that use fitMk internally, such as make.simmap, but it can be fixed by updating phytools from GitHub.

3 comments:

  1. Dear Liam,
    I am recieving the same error message when using fitMk to fit an ordered directional model, but only when using pi="estimated" as my root prior. Do you have any idea why this might be?
    Thank you!
    Claire

    ReplyDelete
    Replies
    1. Hi Claire. If you are just seeing the error message "Warning in log(comp[1:M + N]): NaNs produced" but your optimization otherwise works, this is probably not a problem. If the optimization fails, that may be a different story. Can I ask why you're using pi="estimated"? This option has fitMk find the stationary distribution & uses that as the root prior; however, this is not likely to be an appropriate prior unless your model is symmetric. -- Liam

      Delete
    2. Hi Liam, thank you for your reply. I was using pi="estimated" purely out of ignorance - I wasn't sure which posterior was best so was trying them all out to see how they affected my parameter estimations.😅 I appreciate the clarification!

      Delete

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.