As of last weekend, *phytools* is updated on CRAN.

Since I've had a hard time keeping up with this blog, I'm now trying to (retroactively) document some of the new features that this update contains.

One update is the object class `"Qmatrix"`

and associated S3 methods, including an `as.Qmatrix`

method.

The idea of this class & method is that different functions that fit discrete character models to data (and
thus estimate a **Q** transition matrix between states) return the rates of this estimated matrix in a whole
bunch of different ways. The `as.Qmatrix`

method (and `"Qmatrix"`

object class), allows us the to pull out
this matrix from our fitted model object in a standardized fashion.

To see how this works.

I'll use a phylogeny of lizards in the family Liolaemidae from South America, along with data for parity mode. The tree and data in this example come from Esquerré et al. (2018).

```
library(geiger)
library(phytools)
packageVersion("phytools")
```

```
## [1] '0.7.70'
```

```
Liol.tree<-read.nexus("Liolaemidae.MCC.nex")
Liol.tree
```

```
##
## Phylogenetic tree with 258 tips and 257 internal nodes.
##
## Tip labels:
## Ctenoblepharys_adspersa, Liolaemus_abaucan, Liolaemus_koslowskyi, Liolaemus_albiceps, Liolaemus_irregularis, Liolaemus_ornatus, ...
##
## Rooted; includes branch lengths.
```

```
Liol.data<-read.csv("Liolaemidae.data.csv",row.names=1,
stringsAsFactors=TRUE)
head(Liol.data)
```

```
## parity_mode max_altitude temperature
## Ctenoblepharys_adspersa O 750 23.05
## Liolaemus_abaucan O 2600 20.20
## Liolaemus_albiceps V 4020 12.38
## Liolaemus_andinus V 4900 11.40
## Liolaemus_annectens V 4688 5.10
## Liolaemus_anomalus O 1400 23.78
```

```
name.check(Liol.tree,Liol.data)
```

```
## [1] "OK"
```

Let's plot our tree and data. Here, instead of using a 'canned' plotting routine, I'm going to just plot my
tree as normal, and then I'll add different colored horizontal bars with the base R function `polygon`

at the
vertical position of each tip. This will help us to see the distribution of our trait across the tips of the
tree more easily.

```
h<-max(nodeHeights(Liol.tree))
plotTree(Liol.tree,ftype="off",lwd=1,
xlim=c(0,1.05*h))
parity_mode<-setNames(Liol.data$parity_mode,
rownames(Liol.data))
levels(parity_mode)<-c("oviparous","viviparous")
pp<-get("last_plot.phylo",envir=.PlotPhyloEnv)
cols<-setNames(c("#f0ead6",palette()[2]),
levels(parity_mode))[parity_mode[Liol.tree$tip.label]]
for(i in 1:Ntip(Liol.tree))
polygon(c(h,1.05*h,1.05*h,h),
rep(pp$yy[i],4)+c(-0.5,-0.5,0.5,0.5),
border=FALSE,col=cols[i])
legend("topleft",levels(parity_mode),pch=15,
col=c("#f0ead6",palette()[2]),pt.cex=1.5,
cex=0.8,bty="n")
```

Neat. Now let's start by fitting a simple M*k* model to our data - first using `fitMk`

in *phytools* and
then using `fitContinuous`

in *geiger*.

```
Mk.fit<-fitMk(Liol.tree,parity_mode,model="ARD",
pi="fitzjohn")
Mk.fit
```

```
## Object of class "fitMk".
##
## Fitted (or set) value of Q:
## oviparous viviparous
## oviparous -0.033819 0.033819
## viviparous 0.000000 0.000000
##
## Fitted (or set) value of pi:
## oviparous viviparous
## 1 0
## due to treating the root prior as (a) nuisance.
##
## Log-likelihood: -65.946269
##
## Optimization method used was "nlminb"
```

```
fD.fit<-fitDiscrete(Liol.tree,parity_mode,model="ARD")
fD.fit
```

```
## GEIGER-fitted comparative model of discrete data
## fitted Q matrix:
## oviparous viviparous
## oviparous -3.381880e-02 3.381880e-02
## viviparous 3.023729e-17 -3.023729e-17
##
## model summary:
## log-likelihood = -65.946271
## AIC = 135.892541
## AICc = 135.939600
## free parameters = 2
##
## Convergence diagnostics:
## optimization iterations = 100
## failed iterations = 0
## number of iterations with same best fit = 21
## frequency of best fit = 0.21
##
## object summary:
## 'lik' -- likelihood function
## 'bnd' -- bounds for likelihood search
## 'res' -- optimization iteration summary
## 'opt' -- maximum likelihood parameter estimates
```

Next, let's pull our **Q** matrices from each fitted model object to compare them:

```
as.Qmatrix(Mk.fit)
```

```
## Estimated Q matrix:
## oviparous viviparous
## oviparous -0.0338188 0.0338188
## viviparous 0.0000000 0.0000000
```

```
as.Qmatrix(fD.fit)
```

```
## Estimated Q matrix:
## oviparous viviparous
## oviparous -3.381880e-02 3.381880e-02
## viviparous 3.023729e-17 -3.023729e-17
```

These are basically the same. (`fitDiscrete`

uses a bounded optimization that is effectively bounded above
zero. That's why it ends up with paramter estimates that are very slightly above zero.) Neat.

A useful function of this is that we can also then easily pass the fitted value of **Q** from one method
to the likelihood function of the other.

Let's see.

```
Qhat<-as.Qmatrix(fD.fit)
Mk.fit$lik ## this is our likelihood function
```

```
## function (q)
## -lik(q, output.liks = FALSE, pi = if (root.prior == "nuisance") "fitzjohn" else pi)
## <bytecode: 0x000000001b7da088>
## <environment: 0x0000000018945390>
```

```
Mk.fit$lik(Qhat)
```

```
## [1] -65.94627
```

Now let's try our hidden-rates model.

Remember, this is a model in which some or all of our observed trait character states have one or more
unobserved states. Rates of transition between these different uobserved states are allowed to assume different
values. As such, this is a model that permits *heterogeneity* in the evolution rate for a discrete character
in different parts of the tree. This makes sense for a character like parity mode - so why don't we try to
fit it to our data.

```
hrm.fit<-fitHRM(Liol.tree,parity_mode,pi="fitzjohn",
niter=20)
```

```
##
## This is the design matrix of the fitted model.
## Does it make sense?
##
## oviparous oviparous* viviparous viviparous*
## oviparous 0 1 2 0
## oviparous* 3 0 0 4
## viviparous 5 0 0 6
## viviparous* 0 7 8 0
##
## log-likelihood from current iteration: -61.0197711223499
## --- Best log-likelihood so far: -61.0197711223499 ---
## log-likelihood from current iteration: -60.6924346307293
## --- Best log-likelihood so far: -60.6924346307293 ---
## log-likelihood from current iteration: -61.0733471806403
## --- Best log-likelihood so far: -60.6924346307293 ---
## log-likelihood from current iteration: -62.4486138909471
## --- Best log-likelihood so far: -60.6924346307293 ---
## log-likelihood from current iteration: -65.9463331744238
## --- Best log-likelihood so far: -60.6924346307293 ---
## log-likelihood from current iteration: -62.447949469765
## --- Best log-likelihood so far: -60.6924346307293 ---
## log-likelihood from current iteration: -63.9024247439437
## --- Best log-likelihood so far: -60.6924346307293 ---
## log-likelihood from current iteration: -60.6924346307343
## --- Best log-likelihood so far: -60.6924346307293 ---
## log-likelihood from current iteration: -62.4482458361831
## --- Best log-likelihood so far: -60.6924346307293 ---
## log-likelihood from current iteration: -65.9439241118116
## --- Best log-likelihood so far: -60.6924346307293 ---
## log-likelihood from current iteration: -65.9462741239522
## --- Best log-likelihood so far: -60.6924346307293 ---
## log-likelihood from current iteration: -85.7026994881037
## --- Best log-likelihood so far: -60.6924346307293 ---
## log-likelihood from current iteration: -65.946269375495
## --- Best log-likelihood so far: -60.6924346307293 ---
## log-likelihood from current iteration: -60.714873606756
## --- Best log-likelihood so far: -60.6924346307293 ---
## log-likelihood from current iteration: -61.019765219294
## --- Best log-likelihood so far: -60.6924346307293 ---
## log-likelihood from current iteration: -62.4479322276706
## --- Best log-likelihood so far: -60.6924346307293 ---
## log-likelihood from current iteration: -63.9785288298267
## --- Best log-likelihood so far: -60.6924346307293 ---
## log-likelihood from current iteration: -62.4479768466855
## --- Best log-likelihood so far: -60.6924346307293 ---
## log-likelihood from current iteration: -61.0197570736084
## --- Best log-likelihood so far: -60.6924346307293 ---
## log-likelihood from current iteration: -62.4479569470663
## --- Best log-likelihood so far: -60.6924346307293 ---
```

```
hrm.fit
```

```
## Object of class "fitHRM".
##
## Observed states: [ oviparous, viviparous ]
## Number of rate categories per state: [ 2, 2 ]
##
## Fitted (or set) value of Q:
## oviparous oviparous* viviparous viviparous*
## oviparous -1.971161 0.039879 1.931282 0.000000
## oviparous* 0.037896 -0.037896 0.000000 0.000000
## viviparous 2.697096 0.000000 -3.111551 0.414455
## viviparous* 0.000000 0.000000 0.000000 0.000000
##
## Fitted (or set) value of pi:
## oviparous oviparous* viviparous viviparous*
## 0.034061 0.939411 0.026527 0.000000
## due to treating the root prior as (a) nuisance.
##
## Log-likelihood: -60.692435
##
## Optimization method used was "nlminb"
```

```
as.Qmatrix(hrm.fit)
```

```
## Estimated Q matrix:
## oviparous oviparous* viviparous viviparous*
## oviparous -1.97116076 0.03987888 1.931282 0.0000000
## oviparous* 0.03789636 -0.03789636 0.000000 0.0000000
## viviparous 2.69709588 0.00000000 -3.111551 0.4144553
## viviparous* 0.00000000 0.00000000 0.000000 0.0000000
```

We can likewise fit the same model using `corHMM`

in the
*corHMM* package by Beaulieu et al. (2020). We'll do that as
follows:

```
library(corHMM)
corhmm.data<-data.frame(Genus_species=names(parity_mode),
parity_mode=as.numeric(parity_mode))
head(corhmm.data)
```

```
## Genus_species parity_mode
## 1 Ctenoblepharys_adspersa 1
## 2 Liolaemus_abaucan 1
## 3 Liolaemus_albiceps 2
## 4 Liolaemus_andinus 2
## 5 Liolaemus_annectens 2
## 6 Liolaemus_anomalus 1
```

```
corhmm.fit<-corHMM(Liol.tree,corhmm.data,rate.cat=2,nstarts=20,
root.p="maddfitz")
```

```
## State distribution in data:
## States: 1 2
## Counts: 79 179
## Beginning thorough optimization search -- performing 20 random restarts
## Finished. Inferring ancestral states using marginal reconstruction.
```

```
corhmm.fit
```

```
##
## Fit
## -lnL AIC AICc Rate.cat ntax
## -61.12349 134.247 134.5816 2 258
##
## Rates
## (1,R1) (2,R1) (1,R2) (2,R2)
## (1,R1) NA 0.270645361 0.032034295 NA
## (2,R1) 5.4693e+00 NA NA 0.0320343
## (1,R2) 1.0000e-09 NA NA 0.2677090
## (2,R2) NA 0.000000001 0.000000001 NA
##
## Arrived at a reliable solution
```

We get a slightly different solution; however, I also wrote an `as.Qmatrix`

for the `corhmm`

object class,
so let's use it to check & see if our fitted model gives us the same likelihood in `fitHRM`

.

We have to reorder the rows & columns of our fitted transition matrix, **Q**, because `corHMM`

and
`fitHRM`

set the order of the hidden-states slightly differently.

```
Qhat<-as.Qmatrix(corhmm.fit)
Qhat
```

```
## Estimated Q matrix:
## (1,R1) (2,R1) (1,R2) (2,R2)
## (1,R1) -0.302679656 0.270645361 0.032034295 0.000000000
## (2,R1) 5.469299602 -5.501333897 0.000000000 0.032034295
## (1,R2) 0.000000001 0.000000000 -0.267708984 0.267708983
## (2,R2) 0.000000000 0.000000001 0.000000001 -0.000000002
```

```
Qhat[]<-Qhat[c(1,3,2,4),c(1,3,2,4)]
```

Now compute the likelihood:

```
hrm.fit$lik(Qhat)
```

```
## [1] -61.12349
```

We see that this matches exactly the likelihood that was computed by `corHMM`

, strongly indicating that
both functions are fitting the same model (which is good). In this case, `fitHRM`

does a little bit better,
but if we repeated our optimization, we might find that `corHMM`

did better, or that they produced the
same result.

Finally, let's fit a different hidden-rates model, but in this case one in which transitions between the hidden states are not allowed.

In `fitHRM`

this is called the `umbral=TRUE`

model because of its similarity to the threshold model. One way
to think about the umbral model as a model in which each observed character level can be in a 'warm' or
'cold' condition. When a lineage is in the cold condition for the trait, changes are turned off completely.
The only way to change is by first transitioning back to the warm condition. This model is just a
generalization of the covarion model of Galtier
(2001), but in which each not each trait level needs to have a hidden rate category, and in which the
number of hidden rate categories can be more than two per observed state (although I've no idea if this
model is actually identifiable).

```
hrm.umbral<-fitHRM(Liol.tree,parity_mode,umbral=TRUE,
pi="fitzjohn")
```

```
##
## This is the design matrix of the fitted model.
## Does it make sense?
##
## oviparous oviparous* viviparous viviparous*
## oviparous 0 1 2 0
## oviparous* 3 0 0 0
## viviparous 4 0 0 5
## viviparous* 0 0 6 0
##
## log-likelihood from current iteration: -65.9469992403555
## --- Best log-likelihood so far: -65.9469992403555 ---
## log-likelihood from current iteration: -65.9470202798638
## --- Best log-likelihood so far: -65.9469992403555 ---
## log-likelihood from current iteration: -60.6924346307326
## --- Best log-likelihood so far: -60.6924346307326 ---
## log-likelihood from current iteration: -63.9197013881581
## --- Best log-likelihood so far: -60.6924346307326 ---
## log-likelihood from current iteration: -65.9470744150344
## --- Best log-likelihood so far: -60.6924346307326 ---
## log-likelihood from current iteration: -65.94289825734
## --- Best log-likelihood so far: -60.6924346307326 ---
## log-likelihood from current iteration: -60.6924346307313
## --- Best log-likelihood so far: -60.6924346307313 ---
## log-likelihood from current iteration: -65.9462856362377
## --- Best log-likelihood so far: -60.6924346307313 ---
## log-likelihood from current iteration: -63.9121710758381
## --- Best log-likelihood so far: -60.6924346307313 ---
## log-likelihood from current iteration: -66.1082211100812
## --- Best log-likelihood so far: -60.6924346307313 ---
```

```
hrm.umbral
```

```
## Object of class "fitHRM".
##
## Observed states: [ oviparous, viviparous ]
## Number of rate categories per state: [ 2, 2 ]
##
## Fitted (or set) value of Q:
## oviparous oviparous* viviparous viviparous*
## oviparous -1.971170 0.039879 1.931291 0.000000
## oviparous* 0.037896 -0.037896 0.000000 0.000000
## viviparous 2.697106 0.000000 -3.111561 0.414455
## viviparous* 0.000000 0.000000 0.000000 0.000000
##
## Fitted (or set) value of pi:
## oviparous oviparous* viviparous viviparous*
## 0.034062 0.939411 0.026528 0.000000
## due to treating the root prior as (a) nuisance.
##
## Log-likelihood: -60.692435
##
## Optimization method used was "nlminb"
```

We can graph each of these three models as follows.

```
layout(matrix(c(1,1,2,2,4,3,3,4),2,4,byrow=TRUE))
plot(Mk.fit)
mtext("a) ARD model",adj=0.1,line=0,cex=0.8)
plot(hrm.umbral,spacer=0.3)
mtext("b) \"umbral\" hidden-rates model",adj=0.1,
line=0,cex=0.8)
plot(hrm.fit,spacer=0.15)
mtext("c) hidden-rates model",adj=0.1,
line=0,cex=0.8)
```

Finally, let's *compare* our three models using AIC.

```
AIC(Mk.fit,hrm.umbral,hrm.fit)
```

```
## df AIC
## Mk.fit 2 135.8925
## hrm.umbral 6 133.3849
## hrm.fit 8 137.3849
```

This tells us that our “umbral” HRM is the best fit to our data - although not by much.

Hi there - I was wondering if you could share your version of 'Liolaemidae.data.csv'- the version from the paper is in word format and converting it to reproduce your work here has proved awkward

ReplyDeleteHi Unknown. I had the same problem. I believe I was able to convert the PDF to a Word document, and then I copied & pasted the table into Excel before saving as CSV. If you email me I will send you the data file. - Liam

Deletebrill will do!

Delete