Sunday, March 28, 2021

New variable rate ML method for quantitative traits in phytools

For part of a project I'm involved in, I just added a new function to the phytools package, multirateBM, that uses likelihood to fit a multi-rate Brownian motion model in which the (logarithm of the) rate evolves by a Brownian process, and every edge of the tree can consequently have a slightly different value of σ2.

The code of the function can be viewed here, and you can use multirateBM just by updating phytools from it's GitHub page with the package devtools as follows:

devtools::install_github("liamrevell/phytools")

To show how the function works, I'll first simulate some data under a variable-rate model. I'll fit my model, and then I'll compare the estimated vs. true rates.

First, here are some data:

library(phytools)
tree<-pbtree(n=26,tip.label=LETTERS)
sig2<-exp(fastBM(tree,internal=TRUE))
tt<-tree
tt$edge.length<-tt$edge.length*apply(tt$edge,1,
    function(e,x) mean(x[e]),x=sig2)
x<-fastBM(tt)

Here's a plot of the tree in which the edges have been stretched to be equal to time × the evolutionary rate on each edge:

plotTree(tt)

plot of chunk unnamed-chunk-4

We can also see this pretty easily using a contMap-style visualization:

contMap(tree,x)

plot of chunk unnamed-chunk-5

OK, now let's fit our model.

I'll do that using multirateBM as follows:

fit.ml<-multirateBM(tree,x,n.iter=3)
## Beginning optimization....
## Optimization iteration 1.
## Optimization iteration 2.
## Optimization iteration 3.
## Done optimization.
fit.ml
## Multi-rate Brownian model using multirateBM.
## 
## Fitted rates:
##         A        B       C        D       E        F        G        H     
##  1.862594 1.871276 1.65685 1.671728 1.08921 1.053945 1.068177 1.211217 ....
## 
## log-likelihood:  -54.439295 
## AIC:  212.878589 
## 
## R thinks it has found a solution.

I built a handy plot method to visualize our results. Let's use it:

plot(fit.ml,digits=3)

plot of chunk unnamed-chunk-7

Finally, we can compare our simulated & estimated σ2 values as follows:

## generating sigma^2 by edge
SIG2<-apply(tree$edge,1,function(e,x) 
    mean(x[e]),x=sig2)
## estimated sigma^2 by edge
est.SIG2<-apply(tree$edge,1,function(e,x) 
    mean(x[e]),x=fit.ml$sig2)
xylim<-range(c(SIG2,est.SIG2))
par(mar=c(5.1,5.1,4.1,1.1))
plot(SIG2,est.SIG2,pch=21,bg="grey",cex=1.5,bty="n",
    xlab=expression(paste("true rates, ",sigma^2)),
    ylab=expression(paste("estimates rates, ",sigma^2)),
    log="xy",xlim=xylim,ylim=xylim)
lines(xylim,xylim,lwd=2,
    col=make.transparent(palette()[4],0.5))
title(main="Simulated vs. Estimated Brownian Rates",
    font.main=3)

plot of chunk unnamed-chunk-8

Honestly, I was shocked (shocked!) at how well this worked with simulated data.

To see that it also works (or, at least, runs) with real data, let's load some & try it.

These are a phylogeny and body mass data for mammals.

data(mammal.tree)
data(mammal.data)
lnMass<-setNames(log(mammal.data$bodyMass),
    rownames(mammal.data))
fit.mammal<-multirateBM(mammal.tree,lnMass,n.iter=3)
## Beginning optimization....
## Optimization iteration 1.
## Optimization iteration 2.
## Optimization iteration 3.
## Done optimization.
fit.mammal
## Multi-rate Brownian model using multirateBM.
## 
## Fitted rates:
##  U._maritimus U._arctos U._americanus N._narica P._lotor M._mephitis M._meles C._lupus     
##       0.01004  0.009619      0.139263   0.02582 0.016295    0.136526 0.010344 0.190132 ....
## 
## log-likelihood:  -231.849096 
## AIC:  659.698193 
## 
## R thinks it has found a solution.

We'll plot it with a black background just for fun.

par(bg="black",fg="white")
plot(fit.mammal,fsize=0.8,digit=2,ftype="i")

plot of chunk unnamed-chunk-10

Cool.

1 comment:

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.