Wednesday, February 7, 2024

Testing for a difference in the rate of evolution along a single (or set of single) branches in a phylogenetic tree using phytools brownie.lite

The other day a friend & colleague contacted me about an article of mine, published way back in 2008, describing an alternative to the older McPeek (1995) technique designed to test hypotheses about a shift in the rate of evolution for a quantitative character only single branches of a phylogenetic tree. I didn’t devise a new method – I simply applied O’Meara et al. (2006), still a great article IMO – to this older problem. (Even so, looking back at this article that I must’ve written in what couldn’t have been more than my 3rd or 4th year of grad school – I’m kinda impressed!)

Anyway, this colleague asked if that method was implemented in R, and I said “yes” – in the form of the phytools function brownie.lite which implements O’Meara et al. (2006).

To show how this might be used, let’s image rate-heterogeneous evolution on the phylogeny of salamanders:

set.seed(1)
library(phytools)
packageVersion("phytools")
## [1] '2.1.10'
data(salamanders)
plotTree(salamanders,direction="upwards",ftype="i")
nodelabels(frame="circle",cex=0.6)

plot of chunk unnamed-chunk-51

Next, let’s envision a scenario in which we hypothesize that the rate of evolution in some trait – say, body length or genome size – was much higher on the single edge leading to the node labeled 37 than anywhere else in tree tree. I don’t actually have a character that might possess this feature – but it’s easy to simulate such data.

sim_tree<-salamanders
ind<-which(sim_tree$edge[,2]==37)
sim_tree$edge.length[ind]<-sim_tree$edge.length[ind]*20
x<-fastBM(sim_tree)
par(mar=c(5.1,4.1,1.1,1.1))
phenogram(salamanders,x,fsize=0.6,ftype="i",las=1,
  cex.axis=0.5)
nodelabels(frame="circle",cex=0.5)

plot of chunk unnamed-chunk-52

(In practice, of course, our hypothesis for rate heterogeneity should not be motivated by visualizing a project of the data that might reveal how that heterogeneity is distributed!)

To fit a model in which the edge leading to node 37 is permitted to evolve with a different rate than all the other edges of the tree, we need to paint just that edge with a separate “regime.” This is done most easily using the phytools function paintBranches as follows.

regime_tree<-paintBranches(salamanders,edge=37,
  state="fast",anc.state="slow")
regime_tree
## 
## Phylogenetic tree with 26 tips and 25 internal nodes.
## 
## Tip labels:
## 	neomexicanus, larselli, vandykei, idahoensis, vehiculum, dunni, ...
## 
## The tree includes a mapped, 2-state discrete character
## with states:
## 	fast, slow
## 
## Rooted; includes branch lengths.

(Here I called the painted regime "fast" and the base regime "slow" – simply because I know that’s the nature of the conditions we generated!)

Let’s verify that we got the regime right.

cols<-setNames(c("lightblue","darkred"),c("slow","fast"))
plot(regime_tree,cols,lwd=3,direction="upwards",ftype="i",
  split.vertical=TRUE)
nodelabels(frame="circle",cex=0.6)
legend("bottomright",names(cols),lwd=3,col=cols,bty="n")

plot of chunk unnamed-chunk-54

This look good, so let’s fit our model with brownie.lite.

brownie.lite(regime_tree,x)
## ML single-rate model:
##            s^2        se        a k      logL
## value 1.128097 0.3128771 3.409282 2 -100.7862
## 
## ML multi-rate model:
##       s^2(slow)  se(slow) s^2(fast) se(fast)         a k      logL
## value 0.7394925 0.2091774  33.07437  49.0846 -1.994768 3 -96.65244
## 
## P-value (based on X^2): 0.0040362 
## 
## R thinks it has found the ML solution.

This tells us that a model in which the branch leading to node 37 has a different rate than the rest of the tree is much better supported than one in which all the edges evolve under the same value of \(\sigma^2\).

In our case, since this tree of salamanders is quite modest in size, it was easy enough for us to identify the edges of the tree that we hypothesized for the “fast” regime; however, on a larger tree we might take advantage of a function such as findMRCA to identify the index of that node. For example, we can see from our plot that node 37 is also the common ancestor of the tips labeled fourchensis, glutinosus, and wehrlei. The function findMRCA can help us to identify that node index. For instance:

node<-findMRCA(salamanders,c("fourchensis","glutinosus","wehrlei"))
node
## [1] 37

Finally, if we had more than branch that we hypothesized had evolved in a different regime, this presents us with no trouble. We just map both of those edges to the derived regime and then proceed with our analysis just as before. For example, if our hypothesis was that node only the edge leading to node 37 but also the node leading to edge 32 had evolved with a high rate compared to the background, we could create these regimes as follows.

regime_tree<-paintBranches(salamanders,edge=c(32,37),
  state="fast",anc.state="slow")
cols<-setNames(c("lightblue","darkred"),c("slow","fast"))
plot(regime_tree,cols,lwd=3,direction="upwards",ftype="i",
  split.vertical=TRUE)
nodelabels(frame="circle",cex=0.6)
legend("bottomright",names(cols),lwd=3,col=cols,bty="n")

plot of chunk unnamed-chunk-57

Then proceed to fit our model in the same way as before.

brownie.lite(regime_tree,x)
## ML single-rate model:
##            s^2        se        a k      logL
## value 1.128097 0.3128771 3.409282 2 -100.7862
## 
## ML multi-rate model:
##       s^2(slow)  se(slow) s^2(fast) se(fast)         a k      logL
## value 0.7666674 0.2220412  15.55359  17.3318 -3.144391 3 -97.96764
## 
## P-value (based on X^2): 0.0175848 
## 
## R thinks it has found the ML solution.

In this case this model too is significant – even though only edge 37 and not edge 32 was evolved with a higher than background \(\sigma^2\). This is a good reminder that (as in all ‘normal’ statistical methods too) a significant model fit only tells us that our alternative hypothesis better explains our data than the null – not that it’s actually correct!

That’s all folks.

No comments:

Post a Comment

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.