I just discovered what seems to be a new bug in the phytools function fitMk
for fitting the Mk model of discrete trait evolution. fitMk
uses code
modified from ape::ace
, but has been adapted to do a variety of different
things - such as accept ambiguous tip states (coded as prior probabilities in a matrix),
and a flexible prior density at the global root of the tree. I believe I've tracked the
bug down to the internally used function matexpo
for matrix exponentiation
from the ape package. When I substitute phytools::expm
(which actually
just wraps msm::MatrixExp
the function works again.
I've only gotten the bug to appear for ordered, directional models. That is, for instance, a model in which we permit A→B→C, but not the reverse.
Here's an example using a dataset of squamates & digit number from a study by Matt Brandley et al. (2008). (Actually, the data are simplified because in the original data some species are variable in digit number for a particular limb, which is a nuance I've ignored.)
It might make sense to fit a model in which digit loss occurred 5→4→3→2→1→0, without jumps between non-adjacent digits numbers & without digit readquisition.* (I'm not proposing to imagine that neither is possible, let's just suppose we want to fit this seemingly reasonable model.)
We can do that as follows:
library(phytools)
library(geiger)
sqTree
##
## Phylogenetic tree with 258 tips and 257 internal nodes.
##
## Tip labels:
## Abronia_graminea, Mesaspis_moreleti, Gerrhonotus_liocephalus, Barisia_imbricata, Elgaria_coerulea, Elgaria_kingii, ...
##
## Rooted; includes branch lengths.
head(toes,n=30)
## Agamodon_anguliceps Amphisbaena_alba Bipes_biporus
## 0 0 0
## Bipes_caniculatus Bipes_tridactylus Blanus_cinereus
## 0 0 0
## Chirindia_swimmertoni Diplometopon_zarudnyi Geocalamus_acutus
## 0 0 0
## Monopeltis_capensis Rhineura_floridana Trogonophis_wiegmanni
## 0 0 0
## Abronia_graminea Anguis_fragilis Anniella_geronimensis
## 5 0 0
## Anniella_pulchra Barisia_imbricata Celestus_enneagrammus
## 0 5 5
## Diploglossus_bilobatus Diploglossus_pleei Elgaria_coerulea
## 5 5 5
## Elgaria_kingii Elgaria_multicarinata Elgaria_panamintina
## 5 5 5
## Elgaria_paucicarinata Gerrhonotus_liocephalus Mesaspis_moreleti
## 5 5 5
## Ophiodes_striatus Ophisaurus_apodus Ophisaurus_attenuatus
## 1 1 0
## Levels: 0 1 2 3 4 5
I realize that this isn't the best way to visualize our data - but nevertheless:
plotTree.barplot(sqTree,
setNames(as.numeric(toes)-1,names(toes)),
args.plotTree=list(fsize=0.3),args.barplot=list(space=0,
xlab="number of digits in the hindlimb",border="grey"))
Now we can set up & fit our model:
directional<-matrix(c(0,0,0,0,0,0,
1,0,0,0,0,0,
0,2,0,0,0,0,
0,0,3,0,0,0,
0,0,0,4,0,0,
0,0,0,0,5,0),6,6,byrow=TRUE,
dimnames=list(levels(toes),levels(toes)))
directional
## 0 1 2 3 4 5
## 0 0 0 0 0 0 0
## 1 1 0 0 0 0 0
## 2 0 2 0 0 0 0
## 3 0 0 3 0 0 0
## 4 0 0 0 4 0 0
## 5 0 0 0 0 5 0
fitDirectional<-fitDiscrete(sqTree,toes,model=directional)
## Warning in treedata(phy, dat): The following tips were not found in 'phy' and were dropped from 'data':
## Gonatodes_albogularis
## Lepidophyma_flavimaculatum
## Trachyboa_boulengeri
## Warning in fitDiscrete(sqTree, toes, model = directional): Parameter estimates appear at bounds:
## q12
## q13
## q14
## q15
## q16
## q23
## q24
## q25
## q26
## q31
## q34
## q35
## q36
## q41
## q42
## q45
## q46
## q51
## q52
## q53
## q56
## q61
## q62
## q63
## q64
fitDirectional
## GEIGER-fitted comparative model of discrete data
## fitted Q matrix:
## 0 1 2 3 4
## 0 0.00000000 0.00000000 0.0000000 0.0000000 0.000000000
## 1 0.02314697 -0.02314697 0.0000000 0.0000000 0.000000000
## 2 0.00000000 0.17375162 -0.1737516 0.0000000 0.000000000
## 3 0.00000000 0.00000000 0.2123601 -0.2123601 0.000000000
## 4 0.00000000 0.00000000 0.0000000 0.1118374 -0.111837366
## 5 0.00000000 0.00000000 0.0000000 0.0000000 0.004129334
## 5
## 0 0.000000000
## 1 0.000000000
## 2 0.000000000
## 3 0.000000000
## 4 0.000000000
## 5 -0.004129334
##
## model summary:
## log-likelihood = -211.796485
## AIC = 433.592970
## AICc = 433.831065
## free parameters = 5
##
## Convergence diagnostics:
## optimization iterations = 100
## failed iterations = 0
## frequency of best fit = 0.06
##
## object summary:
## 'lik' -- likelihood function
## 'bnd' -- bounds for likelihood search
## 'res' -- optimization iteration summary
## 'opt' -- maximum likelihood parameter estimates
We have also been able to fit this model using fitMk
, but this
appears to be broken:
fit.phytools<-fitMk(sqTree,toes,model=directional)
## Warning in log(comp[1:M + N]): NaNs produced
## Warning in log(comp[1:M + N]): NaNs produced
## Warning in log(comp[1:M + N]): NaNs produced
## Warning in log(comp[1:M + N]): NaNs produced
Let's load the fix and test it as follows:
source("https://raw.githubusercontent.com/liamrevell/phytools/master/R/fitMk.R")
fit.phytools<-fitMk(sqTree,toes,model=directional)
fit.phytools
## Object of class "fitMk".
##
## Fitted (or set) value of Q:
## 0 1 2 3 4 5
## 0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
## 1 0.023147 -0.023147 0.000000 0.000000 0.000000 0.000000
## 2 0.000000 0.173751 -0.173751 0.000000 0.000000 0.000000
## 3 0.000000 0.000000 0.212357 -0.212357 0.000000 0.000000
## 4 0.000000 0.000000 0.000000 0.111838 -0.111838 0.000000
## 5 0.000000 0.000000 0.000000 0.000000 0.004129 -0.004129
##
## Fitted (or set) value of pi:
## 0 1 2 3 4 5
## 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667
##
## Log-likelihood: -213.588245
##
## Optimization method used was "nlminb"
Note that though very similar, fitMk
and fitDiscrete
are
slightly different because they use different assumptions about the prior at the root
of the tree. Nonetheless, it seems that fitMk
is fixed and working again.
The bug will also affect other functions that use fitMk
internally, such
as make.simmap
, but it can be fixed by updating phytools from GitHub.
Dear Liam,
ReplyDeleteI am recieving the same error message when using fitMk to fit an ordered directional model, but only when using pi="estimated" as my root prior. Do you have any idea why this might be?
Thank you!
Claire
Hi Claire. If you are just seeing the error message "Warning in log(comp[1:M + N]): NaNs produced" but your optimization otherwise works, this is probably not a problem. If the optimization fails, that may be a different story. Can I ask why you're using pi="estimated"? This option has fitMk find the stationary distribution & uses that as the root prior; however, this is not likely to be an appropriate prior unless your model is symmetric. -- Liam
DeleteHi Liam, thank you for your reply. I was using pi="estimated" purely out of ignorance - I wasn't sure which posterior was best so was trying them all out to see how they affected my parameter estimations.😅 I appreciate the clarification!
Delete