A phytools user recently contacted me about the function multirateBM
that implements the variable-rate continuous trait evolution model using penalized-likelihood method that is originally described in Revell (2022).
Their inquiry had multiple parts, but I think it essentially boiled down to “what is the minimum sample size [i.e., number of tips] for this method?” Specifically, they wanted to know whether this method needed trees with hundreds of operational taxa, thousands, or dozens.
This is actually quite a bit harder of a question than it might seem… or maybe its not hard at all. I would say that the answer is “it depends.” Specifically, it depends on how large the effect is of your rate heterogeneity.
To see what I mean, let’s simulate a very modest-sized tree containing just 24 taxa as follows:
library(phytools)
tree<-pbtree(n=24,tip.label=letters[24:1])
plotTree(tree,lwd=1)
nodelabels(cex=0.6)
Next, let’s simulate a high rate of evolution, say for the clade descended from node labeled 28
in this plot, and a base rate 10 lower of throughout the rest of the tree.
tt<-paintSubTree(tree,28,state="1",anc.state="0")
x<-sim.rates(tt,setNames(c(1,10),0:1))
par(mar=c(5.1,4.1,2.1,2.1))
cols<-setNames(c("blue","red"),0:1)
phenogram(tt,x,las=1,cex.axis=0.8,fsize=0.6,
colors=cols,lwd=1)
legend("topleft",c(expression(paste(sigma^2,' = 1')),
expression(paste(sigma^2,' = 10'))),
col=cols,lty="solid",bty="n")
Now let’s see if we can detect this rate shift using multirateBM
.
I bet we can!
obj<-multirateBM(tree,x)
## Beginning optimization....
## Optimization iteration 1. Using "L-BFGS-B" optimization method.
## Best (penalized) log-likelihood so far: -44.1409
## Optimization iteration 2. Using "Nelder-Mead" optimization method.
## Best (penalized) log-likelihood so far: -44.1409
## Optimization iteration 3. Using "BFGS" optimization method.
## Best (penalized) log-likelihood so far: -44.1401
## Done optimization.
obj
## Multi-rate Brownian model using multirateBM.
##
## Fitted rates:
## x w v u t s r q
## 1.223448 10.03531 8.792052 7.74407 9.012035 7.655695 7.338579 8.853072 ....
##
## lambda penalty term: 1
## log-likelihood: -41.923968
## AIC: 179.847936
##
## R thinks it has found a solution.
plot(obj,ftype="off",lwd=5)
Yes! (And it’s not even close.)
Indeed, even with the default and a single rate shift on a small tree, we’re getting very close to our generating conditions.
Since I just used the default smoothing parameter value, , let’s try and (while realizing that this is not very rigorous and that in practice I would probably recommend some sort of cross-validation procedure be used to select rather than just picking values arbitrarily).
fit_lambda0.1<-multirateBM(tree,x,lambda=0.1)
## Beginning optimization....
## Optimization iteration 1. Using "L-BFGS-B" optimization method.
## Best (penalized) log-likelihood so far: -39.2331
## Optimization iteration 2. Using "Nelder-Mead" optimization method.
## Best (penalized) log-likelihood so far: -39.2331
## Optimization iteration 3. Using "BFGS" optimization method.
## Best (penalized) log-likelihood so far: -39.2331
## Done optimization.
fit_lambda0.1
## Multi-rate Brownian model using multirateBM.
##
## Fitted rates:
## x w v u t s r q
## 0.022254 24.94561 9.068857 13.80581 19.14336 6.017043 4.117849 8.499844 ....
##
## lambda penalty term: 0.1
## log-likelihood: -36.919468
## AIC: 169.838936
##
## R thinks it has found a solution.
fit_lambda10<-multirateBM(tree,x,lambda=10)
## Beginning optimization....
## Optimization iteration 1. Using "L-BFGS-B" optimization method.
## Best (penalized) log-likelihood so far: -52.8556
## Optimization iteration 2. Using "Nelder-Mead" optimization method.
## Best (penalized) log-likelihood so far: -52.8556
## Optimization iteration 3. Using "BFGS" optimization method.
## Best (penalized) log-likelihood so far: -52.7846
## Done optimization.
fit_lambda10
## Multi-rate Brownian model using multirateBM.
##
## Fitted rates:
## x w v u t s r q
## 4.161493 6.482993 6.378398 6.166163 6.311752 6.434874 6.402634 6.66946 ....
##
## lambda penalty term: 10
## log-likelihood: -45.330136
## AIC: 186.660272
##
## R thinks it has found a solution.
par(mfrow=c(1,2))
plot(fit_lambda0.1,mar=c(1.1,1.1,3.1,1.1))
mtext(expression(paste("a) ",lambda," = 0.1")),line=1,
adj=0)
plot(fit_lambda10,mar=c(1.1,1.1,3.1,1.1))
mtext(expression(paste("b) ",lambda," = 10")),line=1,
adj=0)
The point is that it seems we don’t need a very large tree for multirateBM
to successfully identify rate heterogeneity – so long as the effect size (that is, the variation in rate among clades or edges) is large!
No comments:
Post a Comment
Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.