As of just last week there is a new version of *phytools* (2.3-0) on CRAN with lots of new additions, features, & functions. Please check it out!

Dirk Eddelbuettel also provides a compact summary of the *phytools* updates since the last CRAN version (*phytools* 2.1-1) on his CRANberries aggregator (also screen-shotted below).

As I’ve been writing about recently (1, 2, 3, 4, 5), one of the new functions in *phytools* 2.3-0 is `fitmultiBM`

which fits a discrete character state dependent multi-rate Brownian motion model using the discrete approximation of Boucher & Démery (2016). (Actually, to follow along in this post, I’d recommend updating *phytools* from GitHub as I’ve *already* posted a few small improvements to `fitmultiBM`

there.)

What I’d like to do today is examine parameter estimation under the model – and perhaps compare it to parameter estimation from existing workflows to model multi-rate discrete character dependent continuous trait evolution.

Let’s begin by loading *phytools*.

```
library(phytools)
packageVersion("phytools")
```

```
## [1] '2.3.1'
```

Now I’m going to undertake a *very limited* simulation study here – mostly because computation is still quite slow for this function.

First, I’ll simulate forty 200-taxon pure-birth trees using `pbtree`

with a total tree length of 1.0.

Next, I’ll generate a discrete character history on each of them using `sim.history`

. I’ll simulate equal back-and-forth rates between two character levels (`"a"`

and `"b"`

) with a transition rate, \(q\), drawn from a uniform distribution on \([1,10]\). Note that these are quite high transition rates for a tree of length 1.0. This was done on purpose, for reasons I’ll discuss below.

Finally, I’ll simulate a continuous trait evolving with a Brownian rate that depends on the discrete character: with \(\sigma_{a}^2\) drawn randomly from a uniform distribution on \([0.5,2]\) and \(\sigma_{b}^2\) drawn randomly on \([5,20]\).

```
## set simulation conditions
nsim<-40
N<-200
Q<-matrix(c(-1,1,1,-1),2,2,
dimnames=list(c("a","b"),c("a","b")))
q<-runif(n=nsim,1,10)
sig2<-replicate(nsim,
setNames(c(runif(n=1,0.5,2),runif(n=1,5,20)),
c("a","b")),simplify=FALSE)
## simulate trees and data
trees<-pbtree(n=N,nsim=nsim,scale=1)
tt<-mapply(sim.history,tree=trees,
Q=lapply(q,function(q,Q) q*Q,Q=Q),
MoreArgs=list(message=FALSE),SIMPLIFY=FALSE)
y<-lapply(tt,getStates,type="tips")
x<-mapply(sim.rates,tree=tt,sig2=sig2,
SIMPLIFY=FALSE)
```

We can now fit our models.

In the code below, I’m using `phytools::fitmultiBM`

to fit our discrete character state dependent quantitative trait evolution model. I set `parallel=TRUE`

so that internally-used functions parallelize matrix exponentiation, which helps alot with the computation. If you re-run my code, you’ll find that it’s printing out the log-likelihood from each analysis. I’m also using `try`

to make sure that one optimization failure doesn’t kill the whole study!

I’m pretty confident I got convergence to the ML solution for each dataset; however, in practical empirical cases I’d recommend running multiple optimization iterations & would be happy to illustrate that in another post.

```
## create list to populate
fits_mbm<-list()
## iterate over simulated data
for(i in 1:nsim){
cat(paste("dataset #",i,": \n",sep=""))
## repeat if non-convergent or fails
fits_mbm[[i]]<-list(opt_results=list(convergence=1))
while(fits_mbm[[i]]$opt_results$convergence!=0){
class(fits_mbm[[i]])<-"try-error"
while(inherits(fits_mbm[[i]],"try-error")){
fits_mbm[[i]]<-try(fitmultiBM(tree=trees[[i]],
x=x[[i]],y=y[[i]],parallel=TRUE))
}
cat(paste("\tlog(L) = ",
paste(round(logLik(fits_mbm[[i]]),3),
collapse=" "),"\n",sep=""))
}
}
```

OK, so let’s take a look at our results.

Firstly, in a prior post on this blog I *predicted* that when the rate of our continuous trait did indeed vary as a function of our discrete character, then fitting a model of their joint evolution should result in more accurate estimation of the transition rates between character levels for our discrete trait. This is because the continuous character (in that case) should contain information about the evolutionary process for our discrete trait. By taking this information into account when we jointly estimate \(q\), our estimate should become more accurate!

Let’s see if that’s true by graphing our estimates of \(q\) from `fitmultiBM`

as well as values obtained conventionally using `phytools::fitMk`

.

```
par(mfrow=c(1,2),mar=c(5.1,5.1,2.1,2.1))
est_q<-sapply(fits_mbm,function(x) x$rates)
plot(q,est_q,log="xy",xlim=c(1,20),ylim=c(1,20),
xlab="true value of q",
ylab="fitmultiBM estimate of q",pch=21,bg="grey",
cex=1.5,las=1,cex.axis=0.8)
abline(a=0,b=1,lty="dotted")
mk_q<-mapply(function(t,x) fitMk(t,x)$rates,
t=trees,x=y)
plot(q,mk_q,log="xy",xlim=c(1,20),ylim=c(1,20),
xlab="true value of q",
ylab="fitMk estimate of q",pch=21,bg="grey",
cex=1.5,las=1,cex.axis=0.8)
abline(a=0,b=1,lty="dotted")
```

These two panels look quite similar, though I see slightly more scatter in the right compared to the left panel. Let’s see if I’m correct.

```
## estimates from fitmultiBM
cor(q,est_q)
```

```
## [1] 0.8032182
```

```
## estimates from fitMk
cor(q,mk_q)
```

```
## [1] 0.7641394
```

Indeed, this shows that we (slightly) more accurately estimated the transition rate between discrete character levels when we jointly estimated it with our multi-rate Brownian evolution model. This needs more investigation, but is pretty convincing at first glance.

Next, moving to the more important question, let’s see how we did in estimating our \(\sigma^2\) values themselves.

Here, I’ll just plot the estimates of \(\sigma_a^2\) and \(\sigma_b^2\) in separate panels.

```
par(mfrow=c(1,2),mar=c(5.1,5.1,2.1,2.1))
sig2_est<-t(sapply(fits_mbm,function(x) x$sigsq))
sig2_gen<-t(sapply(sig2,function(x) x))
plot(sig2_gen[,1],sig2_est[,1],log="xy",
xlim=range(c(sig2_gen[,1],sig2_est[,1])),
ylim=range(c(sig2_gen[,1],sig2_est[,1])),
xlab=expression(paste("true value of ",
sigma[a]^2)),
ylab=expression(paste("fitmultiBM estimate of ",
sigma[a]^2)),pch=21,bg="grey",cex=1.5,las=1,
cex.axis=0.8)
abline(a=0,b=1,lty="dotted")
plot(sig2_gen[,2],sig2_est[,2],log="xy",
xlim=range(c(sig2_gen[,2],sig2_est[,2])),
ylim=range(c(sig2_gen[,2],sig2_est[,2])),
xlab=expression(paste("true value of ",
sigma[b]^2)),
ylab=expression(paste("fitmultiBM estimate of ",
sigma[b]^2)),pch=21,bg="grey",cex=1.5,las=1,
cex.axis=0.8)
abline(a=0,b=1,lty="dotted")
```

This is hard to assess, but it would seem *relatively* unbiased. We would need more than 40 simulations to check for sure!

What’s going to be much more interesting, however, is to *compare* this result to a *standard workflow* for analyzing multi-rate Brownian evolution in a continuous character that we hypothesize varies as a function of a continuous trait.

In a recent post I characterized this analytical workflow as follows: *“The standard way to investigate this type of hypothesis in phylogenetic comparative biology has been to first map the discrete character history on the tree (using, for example, stochastic character mapping), and then fit a regime-dependent model following O’Meara et al. (2006).”* We would then average across our set of maps to obtain parameter estimates of \(\sigma_a^2\) and \(\sigma_b^2\).

I described how this can result in a biased *underestimate* of the difference in rate between regimes in a 2013 paper published in *Systematic Biology*.

In the code chunk below, I apply this workflow using `fitMk`

, `simmap`

, and `brownie.lite`

in *phytools*. (For anyone following along, you might considering parallelizing your `brownie.lite`

optimizations using the *foreach* package. I wish I had!)

```
mk_fits<-mapply(fitMk,tree=trees,x=y,SIMPLIFY=FALSE)
smaps<-lapply(mk_fits,simmap)
bm_sig2<-matrix(NA,nrow=nsim,ncol=2,
dimnames=list(1:nsim,c("a","b")))
for(i in 1:nsim){
ss_fits<-lapply(smaps[[i]],brownie.lite,x=x[[i]])
bm_sig2[i,]<-rowMeans(sapply(ss_fits,
function(x) x$sig2.multiple[c("a","b")]))
}
```

Finally, let’s graph our results.

This time, and since we already noted (above) that \(\sigma_a^2\) and \(\sigma_b^2\) were estimated more or less unbiasedly by `fitmultiBM`

, why don’t we focus on the *ratio* of and \(\sigma_b^2/\sigma_a^2\)? Indeed, it’s the *difference* in rate between discrete character levels that Revell (2013) tells us should be underestimated in our standard `fitMk`

\(\rightarrow\) `simmap`

\(\rightarrow\) `brownie.lite`

workflow.

```
par(mfrow=c(1,2),mar=c(5.1,5.1,3.1,2.1))
est_sigsq<-t(sapply(fits_mbm,function(x) x$sigsq))
plot(sig2_gen[,2]/sig2_gen[,1],
est_sigsq[,2]/est_sigsq[,1],
log="xy",xlim=c(1,30),ylim=c(1,30),
xlab=expression(paste("generating ",
sigma[b]^2/sigma[a]^2)),
ylab=expression(paste("fitmultiBM estimated ",
sigma[b]^2/sigma[a]^2)),pch=21,bg="grey",
cex=1.5,las=1,cex.axis=0.8)
abline(a=0,b=1,lty="dotted")
mtext("a) fitmultiBM analysis",line=1,adj=-0.2)
plot(sig2_gen[,2]/sig2_gen[,1],bm_sig2[,2]/bm_sig2[,1],
log="xy",xlim=c(1,30),ylim=c(1,30),
xlab=expression(paste("generating ",
sigma[b]^2/sigma[a]^2)),
ylab=expression(paste("brownie.lite estimated ",
sigma[b]^2/sigma[a]^2)),pch=21,bg="grey",
cex=1.5,las=1,cex.axis=0.8)
abline(a=0,b=1,lty="dotted")
mtext("b) standard workflow",line=1,adj=-0.2)
```

Cool!

This shows us just how *badly* separate (rather than joint) estimation of the discrete and continuous character evolution causes us to underestimate the true rate heterogeneity due to our discrete trait.

Finally, I earlier noted that the selection of a high rate of evolution of the discrete character, \(q\), had been critical and that I would circle back to *why* that was the case.

This was because if our discrete character changes *rarely* then, most likely, all of our stochastic maps will capture something very close to the real evolutionary history of our discrete trait and the traditional workflow will be unbiased for \(\sigma_b^2/\sigma_a^2\). (Indeed, this is exactly what I found in Revell 2013). Figure 2 from that article shows precisely this pattern.

More on all this stuff later, of course, but that’s all for now folks!

## No comments:

## Post a Comment

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.