As of last weekend, phytools is updated on CRAN.
Since I've had a hard time keeping up with this blog, I'm now trying to (retroactively) document some of the new features that this update contains.
One update is the object class "Qmatrix"
and associated S3 methods, including an as.Qmatrix
method.
The idea of this class & method is that different functions that fit discrete character models to data (and
thus estimate a Q transition matrix between states) return the rates of this estimated matrix in a whole
bunch of different ways. The as.Qmatrix
method (and "Qmatrix"
object class), allows us the to pull out
this matrix from our fitted model object in a standardized fashion.
To see how this works.
I'll use a phylogeny of lizards in the family Liolaemidae from South America, along with data for parity mode. The tree and data in this example come from Esquerré et al. (2018).
library(geiger)
library(phytools)
packageVersion("phytools")
## [1] '0.7.70'
Liol.tree<-read.nexus("Liolaemidae.MCC.nex")
Liol.tree
##
## Phylogenetic tree with 258 tips and 257 internal nodes.
##
## Tip labels:
## Ctenoblepharys_adspersa, Liolaemus_abaucan, Liolaemus_koslowskyi, Liolaemus_albiceps, Liolaemus_irregularis, Liolaemus_ornatus, ...
##
## Rooted; includes branch lengths.
Liol.data<-read.csv("Liolaemidae.data.csv",row.names=1,
stringsAsFactors=TRUE)
head(Liol.data)
## parity_mode max_altitude temperature
## Ctenoblepharys_adspersa O 750 23.05
## Liolaemus_abaucan O 2600 20.20
## Liolaemus_albiceps V 4020 12.38
## Liolaemus_andinus V 4900 11.40
## Liolaemus_annectens V 4688 5.10
## Liolaemus_anomalus O 1400 23.78
name.check(Liol.tree,Liol.data)
## [1] "OK"
Let's plot our tree and data. Here, instead of using a 'canned' plotting routine, I'm going to just plot my
tree as normal, and then I'll add different colored horizontal bars with the base R function polygon
at the
vertical position of each tip. This will help us to see the distribution of our trait across the tips of the
tree more easily.
h<-max(nodeHeights(Liol.tree))
plotTree(Liol.tree,ftype="off",lwd=1,
xlim=c(0,1.05*h))
parity_mode<-setNames(Liol.data$parity_mode,
rownames(Liol.data))
levels(parity_mode)<-c("oviparous","viviparous")
pp<-get("last_plot.phylo",envir=.PlotPhyloEnv)
cols<-setNames(c("#f0ead6",palette()[2]),
levels(parity_mode))[parity_mode[Liol.tree$tip.label]]
for(i in 1:Ntip(Liol.tree))
polygon(c(h,1.05*h,1.05*h,h),
rep(pp$yy[i],4)+c(-0.5,-0.5,0.5,0.5),
border=FALSE,col=cols[i])
legend("topleft",levels(parity_mode),pch=15,
col=c("#f0ead6",palette()[2]),pt.cex=1.5,
cex=0.8,bty="n")
Neat. Now let's start by fitting a simple Mk model to our data - first using fitMk
in phytools and
then using fitContinuous
in geiger.
Mk.fit<-fitMk(Liol.tree,parity_mode,model="ARD",
pi="fitzjohn")
Mk.fit
## Object of class "fitMk".
##
## Fitted (or set) value of Q:
## oviparous viviparous
## oviparous -0.033819 0.033819
## viviparous 0.000000 0.000000
##
## Fitted (or set) value of pi:
## oviparous viviparous
## 1 0
## due to treating the root prior as (a) nuisance.
##
## Log-likelihood: -65.946269
##
## Optimization method used was "nlminb"
fD.fit<-fitDiscrete(Liol.tree,parity_mode,model="ARD")
fD.fit
## GEIGER-fitted comparative model of discrete data
## fitted Q matrix:
## oviparous viviparous
## oviparous -3.381880e-02 3.381880e-02
## viviparous 3.023729e-17 -3.023729e-17
##
## model summary:
## log-likelihood = -65.946271
## AIC = 135.892541
## AICc = 135.939600
## free parameters = 2
##
## Convergence diagnostics:
## optimization iterations = 100
## failed iterations = 0
## number of iterations with same best fit = 21
## frequency of best fit = 0.21
##
## object summary:
## 'lik' -- likelihood function
## 'bnd' -- bounds for likelihood search
## 'res' -- optimization iteration summary
## 'opt' -- maximum likelihood parameter estimates
Next, let's pull our Q matrices from each fitted model object to compare them:
as.Qmatrix(Mk.fit)
## Estimated Q matrix:
## oviparous viviparous
## oviparous -0.0338188 0.0338188
## viviparous 0.0000000 0.0000000
as.Qmatrix(fD.fit)
## Estimated Q matrix:
## oviparous viviparous
## oviparous -3.381880e-02 3.381880e-02
## viviparous 3.023729e-17 -3.023729e-17
These are basically the same. (fitDiscrete
uses a bounded optimization that is effectively bounded above
zero. That's why it ends up with paramter estimates that are very slightly above zero.) Neat.
A useful function of this is that we can also then easily pass the fitted value of Q from one method to the likelihood function of the other.
Let's see.
Qhat<-as.Qmatrix(fD.fit)
Mk.fit$lik ## this is our likelihood function
## function (q)
## -lik(q, output.liks = FALSE, pi = if (root.prior == "nuisance") "fitzjohn" else pi)
## <bytecode: 0x000000001b7da088>
## <environment: 0x0000000018945390>
Mk.fit$lik(Qhat)
## [1] -65.94627
Now let's try our hidden-rates model.
Remember, this is a model in which some or all of our observed trait character states have one or more unobserved states. Rates of transition between these different uobserved states are allowed to assume different values. As such, this is a model that permits heterogeneity in the evolution rate for a discrete character in different parts of the tree. This makes sense for a character like parity mode - so why don't we try to fit it to our data.
hrm.fit<-fitHRM(Liol.tree,parity_mode,pi="fitzjohn",
niter=20)
##
## This is the design matrix of the fitted model.
## Does it make sense?
##
## oviparous oviparous* viviparous viviparous*
## oviparous 0 1 2 0
## oviparous* 3 0 0 4
## viviparous 5 0 0 6
## viviparous* 0 7 8 0
##
## log-likelihood from current iteration: -61.0197711223499
## --- Best log-likelihood so far: -61.0197711223499 ---
## log-likelihood from current iteration: -60.6924346307293
## --- Best log-likelihood so far: -60.6924346307293 ---
## log-likelihood from current iteration: -61.0733471806403
## --- Best log-likelihood so far: -60.6924346307293 ---
## log-likelihood from current iteration: -62.4486138909471
## --- Best log-likelihood so far: -60.6924346307293 ---
## log-likelihood from current iteration: -65.9463331744238
## --- Best log-likelihood so far: -60.6924346307293 ---
## log-likelihood from current iteration: -62.447949469765
## --- Best log-likelihood so far: -60.6924346307293 ---
## log-likelihood from current iteration: -63.9024247439437
## --- Best log-likelihood so far: -60.6924346307293 ---
## log-likelihood from current iteration: -60.6924346307343
## --- Best log-likelihood so far: -60.6924346307293 ---
## log-likelihood from current iteration: -62.4482458361831
## --- Best log-likelihood so far: -60.6924346307293 ---
## log-likelihood from current iteration: -65.9439241118116
## --- Best log-likelihood so far: -60.6924346307293 ---
## log-likelihood from current iteration: -65.9462741239522
## --- Best log-likelihood so far: -60.6924346307293 ---
## log-likelihood from current iteration: -85.7026994881037
## --- Best log-likelihood so far: -60.6924346307293 ---
## log-likelihood from current iteration: -65.946269375495
## --- Best log-likelihood so far: -60.6924346307293 ---
## log-likelihood from current iteration: -60.714873606756
## --- Best log-likelihood so far: -60.6924346307293 ---
## log-likelihood from current iteration: -61.019765219294
## --- Best log-likelihood so far: -60.6924346307293 ---
## log-likelihood from current iteration: -62.4479322276706
## --- Best log-likelihood so far: -60.6924346307293 ---
## log-likelihood from current iteration: -63.9785288298267
## --- Best log-likelihood so far: -60.6924346307293 ---
## log-likelihood from current iteration: -62.4479768466855
## --- Best log-likelihood so far: -60.6924346307293 ---
## log-likelihood from current iteration: -61.0197570736084
## --- Best log-likelihood so far: -60.6924346307293 ---
## log-likelihood from current iteration: -62.4479569470663
## --- Best log-likelihood so far: -60.6924346307293 ---
hrm.fit
## Object of class "fitHRM".
##
## Observed states: [ oviparous, viviparous ]
## Number of rate categories per state: [ 2, 2 ]
##
## Fitted (or set) value of Q:
## oviparous oviparous* viviparous viviparous*
## oviparous -1.971161 0.039879 1.931282 0.000000
## oviparous* 0.037896 -0.037896 0.000000 0.000000
## viviparous 2.697096 0.000000 -3.111551 0.414455
## viviparous* 0.000000 0.000000 0.000000 0.000000
##
## Fitted (or set) value of pi:
## oviparous oviparous* viviparous viviparous*
## 0.034061 0.939411 0.026527 0.000000
## due to treating the root prior as (a) nuisance.
##
## Log-likelihood: -60.692435
##
## Optimization method used was "nlminb"
as.Qmatrix(hrm.fit)
## Estimated Q matrix:
## oviparous oviparous* viviparous viviparous*
## oviparous -1.97116076 0.03987888 1.931282 0.0000000
## oviparous* 0.03789636 -0.03789636 0.000000 0.0000000
## viviparous 2.69709588 0.00000000 -3.111551 0.4144553
## viviparous* 0.00000000 0.00000000 0.000000 0.0000000
We can likewise fit the same model using corHMM
in the
corHMM package by Beaulieu et al. (2020). We'll do that as
follows:
library(corHMM)
corhmm.data<-data.frame(Genus_species=names(parity_mode),
parity_mode=as.numeric(parity_mode))
head(corhmm.data)
## Genus_species parity_mode
## 1 Ctenoblepharys_adspersa 1
## 2 Liolaemus_abaucan 1
## 3 Liolaemus_albiceps 2
## 4 Liolaemus_andinus 2
## 5 Liolaemus_annectens 2
## 6 Liolaemus_anomalus 1
corhmm.fit<-corHMM(Liol.tree,corhmm.data,rate.cat=2,nstarts=20,
root.p="maddfitz")
## State distribution in data:
## States: 1 2
## Counts: 79 179
## Beginning thorough optimization search -- performing 20 random restarts
## Finished. Inferring ancestral states using marginal reconstruction.
corhmm.fit
##
## Fit
## -lnL AIC AICc Rate.cat ntax
## -61.12349 134.247 134.5816 2 258
##
## Rates
## (1,R1) (2,R1) (1,R2) (2,R2)
## (1,R1) NA 0.270645361 0.032034295 NA
## (2,R1) 5.4693e+00 NA NA 0.0320343
## (1,R2) 1.0000e-09 NA NA 0.2677090
## (2,R2) NA 0.000000001 0.000000001 NA
##
## Arrived at a reliable solution
We get a slightly different solution; however, I also wrote an as.Qmatrix
for the corhmm
object class,
so let's use it to check & see if our fitted model gives us the same likelihood in fitHRM
.
We have to reorder the rows & columns of our fitted transition matrix, Q, because corHMM
and
fitHRM
set the order of the hidden-states slightly differently.
Qhat<-as.Qmatrix(corhmm.fit)
Qhat
## Estimated Q matrix:
## (1,R1) (2,R1) (1,R2) (2,R2)
## (1,R1) -0.302679656 0.270645361 0.032034295 0.000000000
## (2,R1) 5.469299602 -5.501333897 0.000000000 0.032034295
## (1,R2) 0.000000001 0.000000000 -0.267708984 0.267708983
## (2,R2) 0.000000000 0.000000001 0.000000001 -0.000000002
Qhat[]<-Qhat[c(1,3,2,4),c(1,3,2,4)]
Now compute the likelihood:
hrm.fit$lik(Qhat)
## [1] -61.12349
We see that this matches exactly the likelihood that was computed by corHMM
, strongly indicating that
both functions are fitting the same model (which is good). In this case, fitHRM
does a little bit better,
but if we repeated our optimization, we might find that corHMM
did better, or that they produced the
same result.
Finally, let's fit a different hidden-rates model, but in this case one in which transitions between the hidden states are not allowed.
In fitHRM
this is called the umbral=TRUE
model because of its similarity to the threshold model. One way
to think about the umbral model as a model in which each observed character level can be in a 'warm' or
'cold' condition. When a lineage is in the cold condition for the trait, changes are turned off completely.
The only way to change is by first transitioning back to the warm condition. This model is just a
generalization of the covarion model of Galtier
(2001), but in which each not each trait level needs to have a hidden rate category, and in which the
number of hidden rate categories can be more than two per observed state (although I've no idea if this
model is actually identifiable).
hrm.umbral<-fitHRM(Liol.tree,parity_mode,umbral=TRUE,
pi="fitzjohn")
##
## This is the design matrix of the fitted model.
## Does it make sense?
##
## oviparous oviparous* viviparous viviparous*
## oviparous 0 1 2 0
## oviparous* 3 0 0 0
## viviparous 4 0 0 5
## viviparous* 0 0 6 0
##
## log-likelihood from current iteration: -65.9469992403555
## --- Best log-likelihood so far: -65.9469992403555 ---
## log-likelihood from current iteration: -65.9470202798638
## --- Best log-likelihood so far: -65.9469992403555 ---
## log-likelihood from current iteration: -60.6924346307326
## --- Best log-likelihood so far: -60.6924346307326 ---
## log-likelihood from current iteration: -63.9197013881581
## --- Best log-likelihood so far: -60.6924346307326 ---
## log-likelihood from current iteration: -65.9470744150344
## --- Best log-likelihood so far: -60.6924346307326 ---
## log-likelihood from current iteration: -65.94289825734
## --- Best log-likelihood so far: -60.6924346307326 ---
## log-likelihood from current iteration: -60.6924346307313
## --- Best log-likelihood so far: -60.6924346307313 ---
## log-likelihood from current iteration: -65.9462856362377
## --- Best log-likelihood so far: -60.6924346307313 ---
## log-likelihood from current iteration: -63.9121710758381
## --- Best log-likelihood so far: -60.6924346307313 ---
## log-likelihood from current iteration: -66.1082211100812
## --- Best log-likelihood so far: -60.6924346307313 ---
hrm.umbral
## Object of class "fitHRM".
##
## Observed states: [ oviparous, viviparous ]
## Number of rate categories per state: [ 2, 2 ]
##
## Fitted (or set) value of Q:
## oviparous oviparous* viviparous viviparous*
## oviparous -1.971170 0.039879 1.931291 0.000000
## oviparous* 0.037896 -0.037896 0.000000 0.000000
## viviparous 2.697106 0.000000 -3.111561 0.414455
## viviparous* 0.000000 0.000000 0.000000 0.000000
##
## Fitted (or set) value of pi:
## oviparous oviparous* viviparous viviparous*
## 0.034062 0.939411 0.026528 0.000000
## due to treating the root prior as (a) nuisance.
##
## Log-likelihood: -60.692435
##
## Optimization method used was "nlminb"
We can graph each of these three models as follows.
layout(matrix(c(1,1,2,2,4,3,3,4),2,4,byrow=TRUE))
plot(Mk.fit)
mtext("a) ARD model",adj=0.1,line=0,cex=0.8)
plot(hrm.umbral,spacer=0.3)
mtext("b) \"umbral\" hidden-rates model",adj=0.1,
line=0,cex=0.8)
plot(hrm.fit,spacer=0.15)
mtext("c) hidden-rates model",adj=0.1,
line=0,cex=0.8)
Finally, let's compare our three models using AIC.
AIC(Mk.fit,hrm.umbral,hrm.fit)
## df AIC
## Mk.fit 2 135.8925
## hrm.umbral 6 133.3849
## hrm.fit 8 137.3849
This tells us that our “umbral” HRM is the best fit to our data - although not by much.
Hi there - I was wondering if you could share your version of 'Liolaemidae.data.csv'- the version from the paper is in word format and converting it to reproduce your work here has proved awkward
ReplyDeleteHi Unknown. I had the same problem. I believe I was able to convert the PDF to a Word document, and then I copied & pasted the table into Excel before saving as CSV. If you email me I will send you the data file. - Liam
Deletebrill will do!
Delete