For part of a project I'm involved in, I just added a new function to the
phytools package, multirateBM
, that uses likelihood to fit a multi-rate
Brownian motion model in which the (logarithm of the) rate evolves by a
Brownian process, and every edge of the tree can consequently have a
slightly different value of σ2.
The code
of the function can be viewed
here, and
you can use multirateBM
just by updating phytools from it's GitHub page
with the package devtools as follows:
devtools::install_github("liamrevell/phytools")
To show how the function works, I'll first simulate some data under a variable-rate model. I'll fit my model, and then I'll compare the estimated vs. true rates.
First, here are some data:
library(phytools)
tree<-pbtree(n=26,tip.label=LETTERS)
sig2<-exp(fastBM(tree,internal=TRUE))
tt<-tree
tt$edge.length<-tt$edge.length*apply(tt$edge,1,
function(e,x) mean(x[e]),x=sig2)
x<-fastBM(tt)
Here's a plot of the tree in which the edges have been stretched to be equal to time × the evolutionary rate on each edge:
plotTree(tt)
We can also see this pretty easily using a contMap
-style visualization:
contMap(tree,x)
OK, now let's fit our model.
I'll do that using multirateBM
as follows:
fit.ml<-multirateBM(tree,x,n.iter=3)
## Beginning optimization....
## Optimization iteration 1.
## Optimization iteration 2.
## Optimization iteration 3.
## Done optimization.
fit.ml
## Multi-rate Brownian model using multirateBM.
##
## Fitted rates:
## A B C D E F G H
## 1.862594 1.871276 1.65685 1.671728 1.08921 1.053945 1.068177 1.211217 ....
##
## log-likelihood: -54.439295
## AIC: 212.878589
##
## R thinks it has found a solution.
I built a handy plot
method to visualize our results. Let's use it:
plot(fit.ml,digits=3)
Finally, we can compare our simulated & estimated σ2 values as follows:
## generating sigma^2 by edge
SIG2<-apply(tree$edge,1,function(e,x)
mean(x[e]),x=sig2)
## estimated sigma^2 by edge
est.SIG2<-apply(tree$edge,1,function(e,x)
mean(x[e]),x=fit.ml$sig2)
xylim<-range(c(SIG2,est.SIG2))
par(mar=c(5.1,5.1,4.1,1.1))
plot(SIG2,est.SIG2,pch=21,bg="grey",cex=1.5,bty="n",
xlab=expression(paste("true rates, ",sigma^2)),
ylab=expression(paste("estimates rates, ",sigma^2)),
log="xy",xlim=xylim,ylim=xylim)
lines(xylim,xylim,lwd=2,
col=make.transparent(palette()[4],0.5))
title(main="Simulated vs. Estimated Brownian Rates",
font.main=3)
Honestly, I was shocked (shocked!) at how well this worked with simulated data.
To see that it also works (or, at least, runs) with real data, let's load some & try it.
These are a phylogeny and body mass data for mammals.
data(mammal.tree)
data(mammal.data)
lnMass<-setNames(log(mammal.data$bodyMass),
rownames(mammal.data))
fit.mammal<-multirateBM(mammal.tree,lnMass,n.iter=3)
## Beginning optimization....
## Optimization iteration 1.
## Optimization iteration 2.
## Optimization iteration 3.
## Done optimization.
fit.mammal
## Multi-rate Brownian model using multirateBM.
##
## Fitted rates:
## U._maritimus U._arctos U._americanus N._narica P._lotor M._mephitis M._meles C._lupus
## 0.01004 0.009619 0.139263 0.02582 0.016295 0.136526 0.010344 0.190132 ....
##
## log-likelihood: -231.849096
## AIC: 659.698193
##
## R thinks it has found a solution.
We'll plot it with a black background just for fun.
par(bg="black",fg="white")
plot(fit.mammal,fsize=0.8,digit=2,ftype="i")
Cool.
This comment has been removed by the author.
ReplyDelete