Monday, October 30, 2023

Computing an HPD interval on the evolutionary correlation from threshBayes in phytools

Today a phytools user & R-sig-phylo list participant asked:

“I am performing analyses using threshBayes from the phytools package. I was wondering if someone has ever extracted the highest posterior density intervals (HPD) values using this function. How could I make this?”

As a quick reminder, the phytools function threshBayes implements a Bayesian MCMC version of Felsenstein’s (2005; 2012) threshold model for estimating the evolutionary correlation between binary character (or between a discrete & binary trait) under a phylogenetic comparative interpretation of Wright’s threshold model from quantitative genetics.

This question is a bit of a softball, since we provided an exact workflow to do precisely this in Chapter 7 of our recent Princeton University Press book; however, not everyone has the book (hard to believe, I know), so I thought it was worth reiterating here!

To illustrate this workflow I’m going to use an excellent dataset of parity mode (i.e., oviparity vs. viviparity) & environmental temperature in South American liolaemid lizards from Esquerré et al. (2019).

We can start by loading phytools.

library(phytools)
packageVersion("phytools")
## [1] '1.9.25'

Now let’s add our data & tree as follows.

data("liolaemid.data")
head(liolaemid.data)
##                         parity_mode max_altitude temperature
## Ctenoblepharys_adspersa           O          750       23.05
## Liolaemus_abaucan                 O         2600       20.20
## Liolaemus_albiceps                V         4020       12.38
## Liolaemus_andinus                 V         4900       11.40
## Liolaemus_annectens               V         4688        5.10
## Liolaemus_anomalus                O         1400       23.78
data("liolaemid.tree")
liolaemid.tree
## 
## Phylogenetic tree with 257 tips and 256 internal nodes.
## 
## Tip labels:
##   Liolaemus_abaucan, Liolaemus_koslowskyi, Liolaemus_albiceps, Liolaemus_irregularis, Liolaemus_ornatus, Liolaemus_calchaqui, ...
## 
## Rooted; includes branch lengths.
plotTree(liolaemid.tree,ftype="i",fsize=0.4,type="arc",
  lwd=1,arc_height=0.5)

plot of chunk unnamed-chunk-27

There is one taxon in the dataset that’s missing from our tree.

chk<-geiger::name.check(liolaemid.tree,liolaemid.data)
chk
## $tree_not_data
## character(0)
## 
## $data_not_tree
## [1] "Ctenoblepharys_adspersa"
liolaemid.data<-liolaemid.data[liolaemid.tree$tip.label,]

We’re already ready to run our MCMC!

mcmc_liol<-threshBayes(liolaemid.tree,liolaemid.data[,c(1,3)],
  type=c("disc","cont"),ngen=500000,plot=FALSE,
  control=list(print.interval=50000))
## Starting MCMC....
## generation: 50000; mean acceptance rate: 0.19
## generation: 100000; mean acceptance rate: 0.21
## generation: 150000; mean acceptance rate: 0.2
## generation: 200000; mean acceptance rate: 0.19
## generation: 250000; mean acceptance rate: 0.32
## generation: 300000; mean acceptance rate: 0.41
## generation: 350000; mean acceptance rate: 0.25
## generation: 400000; mean acceptance rate: 0.3
## generation: 450000; mean acceptance rate: 0.23
## generation: 500000; mean acceptance rate: 0.28
## Done MCMC.

Here we’ve run our MCMC for 500,000 generations. In practice, I’d recommend 1 million or more.

plot(mcmc_liol)

plot of chunk unnamed-chunk-31

To be cautious, let’s pull out the just the latter half of this MCMC.

r.mcmc<-tail(mcmc_liol$par$r,0.5*nrow(mcmc_liol$par))

We’ll assign this vector the class attribute "mcmc" and then is we use coda::HPDinterval we should obtain a 95% high probability density interval on the correlation coefficient, our parameter of interest here.

class(r.mcmc)<-"mcmc"
hpd.liol_r<-coda::HPDinterval(r.mcmc)
hpd.liol_r
##           lower      upper
## var1 -0.7523365 -0.4540421
## attr(,"Probability")
## [1] 0.95002

Due to the ordination of our character 0 <-> V, a negative evolutionary correlation implies that species with cooler environmental temperatures should be more likely to evolve towards viviparity (and the converse).

Lastly, let’s graph our posterior density & HPD interval for r.

plot(density(mcmc_liol,burnin=250000),bty="n",cex.lab=0.9,cex.axis=0.7)
h<-0-par()$usr[3]
lines(x=hpd.liol_r,y=rep(-h/2,2))
lines(x=rep(hpd.liol_r[1],2),y=c(-0.3,-0.7)*h)
lines(x=rep(hpd.liol_r[2],2),y=c(-0.3,-0.7)*h)

plot of chunk unnamed-chunk-34

That pretty much comports with what we showed in our book!