Comments on Phylogenetic Tools for Comparative Biology: Fitting a variable-process model of discrete character evolution on the tree using phytools

To add to your discussion. I used corHMM's ray...

2017-12-06T22:40:58.189-05:00

To add to your discussion: I used corHMM's rayDISC function to implement an extension of the covarion model with very large Q-matrices. There you can do likelihood-ratio tests (LRTs) and run not only type I error analyses but full power analyses. The pruning algorithm in corHMM is really efficient, allowing you to work with large phylogenies. What I added to rayDISC was an efficient yet decently accurate way to calculate exp{Qt}. I know phytools uses expm from package matrix, but in the case of large and sparse Q-matrices (>20) it is neither efficient nor accurate.
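For readers who have not computed exp{Qt} by hand, here is a minimal sketch of one textbook approach: scaling and squaring with a truncated Taylor series. This is not the method added to rayDISC (that method is not described in this comment), it uses dense matrices rather than exploiting sparsity, and the function names `expm_qt` and `matmul` are made up for illustration.

```python
import math


def matmul(x, y):
    """Multiply two square matrices stored as lists of lists."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def expm_qt(q, t, taylor_terms=12):
    """Approximate exp(Q * t) by scaling and squaring (illustrative only)."""
    n = len(q)
    a = [[q[i][j] * t for j in range(n)] for i in range(n)]

    # Scale Q*t down so its largest absolute row sum is at most 0.5,
    # where a short Taylor series is accurate.
    norm = max(sum(abs(v) for v in row) for row in a)
    squarings = max(0, math.ceil(math.log2(norm)) + 1) if norm > 0 else 0
    scale = 2.0 ** squarings
    a = [[v / scale for v in row] for row in a]

    # Truncated Taylor series: I + A + A^2/2! + ...
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    result = [row[:] for row in identity]
    term = [row[:] for row in identity]
    for k in range(1, taylor_terms + 1):
        term = matmul(term, a)
        term = [[v / k for v in row] for row in term]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]

    # Undo the scaling: exp(A) = (exp(A / 2^s))^(2^s).
    for _ in range(squarings):
        result = matmul(result, result)
    return result


# Example: a two-state Q-matrix with rates 0->1 = 1.0 and 1->0 = 0.5.
q = [[-1.0, 1.0], [0.5, -0.5]]
p = expm_qt(q, 2.0)
```

Each row of `p` should sum to one, and for a two-state matrix the result can be checked against the closed-form transition probabilities.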

Incidentally (and this is a cool feature added by ...

2017-12-06T15:53:26.230-05:00

Incidentally (and this is a cool feature added by Jeremy Beaulieu, not me), the rayDISC function in corHMM is, to me, one of the coolest unknown tools in phylogenetics. Once you convert multiple characters (say, three binary traits) into a single multistate character (000 = state 0, ..., 111 = state 7) you can create very flexible rate matrices. So you can say "I can go from 000 to 001 at the same rate as 000 to 010, but 000 -> 100 is forbidden, and...". I bring it up here because if you had a case where you thought one trait affected evolution of another, rather than stochastically mapping one and then using that to fit regimes, you probably get a better fit by inferring the mapping and the rates jointly (so you have a matrix that has the 0->1 rate for trait 1 depending on the state of trait 2, etc.). [see https://academic.oup.com/sysbio/article/62/2/339/1668230 for why mapping one trait and then estimating the rate of the other might be less good than doing it jointly. ;-)]
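To make the recoding above concrete, here is a small Python sketch (not corHMM code) that maps three binary traits onto states 0 to 7 and builds a matrix of rate-parameter indices with the two example constraints from the comment. The convention that 0 means "transition not allowed" and that only one trait changes at a time are assumptions of this sketch, and the names `state_index` and `build_index_matrix` are hypothetical.

```python
from itertools import product

N_TRAITS = 3
# All combinations in order: (0,0,0) = state 0, ..., (1,1,1) = state 7.
STATES = list(product([0, 1], repeat=N_TRAITS))


def state_index(bits):
    """Read a tuple of 0/1 trait values as a binary number, e.g. (1,0,0) -> 4."""
    idx = 0
    for b in bits:
        idx = 2 * idx + b
    return idx


def build_index_matrix(equal=(), forbidden=()):
    """Return an 8 x 8 matrix of parameter indices (0 = not allowed).

    equal:     pairs of transitions ((a, b), (c, d)); c -> d reuses the
               parameter already assigned to a -> b.
    forbidden: transitions (a, b) whose rate is fixed to zero.
    """
    n = len(STATES)
    index = [[0] * n for _ in range(n)]
    next_param = 1
    for i, a in enumerate(STATES):
        for j, b in enumerate(STATES):
            # Assumption: only one trait may change in a single step.
            if sum(x != y for x, y in zip(a, b)) == 1:
                index[i][j] = next_param
                next_param += 1
    for (a, b), (c, d) in equal:
        index[state_index(c)][state_index(d)] = index[state_index(a)][state_index(b)]
    for a, b in forbidden:
        index[state_index(a)][state_index(b)] = 0
    return index


# "000 -> 001 at the same rate as 000 -> 010, but 000 -> 100 is forbidden"
index = build_index_matrix(
    equal=[(((0, 0, 0), (0, 0, 1)), ((0, 0, 0), (0, 1, 0)))],
    forbidden=[((0, 0, 0), (1, 0, 0))],
)
```

A matrix like `index` only says which rates are shared or fixed to zero; the rate values themselves would still be estimated by whatever likelihood routine consumes it.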

Beaulieu et al.'s corhmm is an extension of th...

2017-12-06T15:46:04.678-05:00

Beaulieu et al.'s corHMM is an extension of the covarion model that does what John suggests. It does rate reconstruction and ancestral state estimates at nodes (and of course it's then straightforward to make a stochastic character mapping version).

It's an interesting comparison of Liam's approach vs. corHMM. Liam's requires mapping a regime on the tree, and having the rate matrix change based on the regime mapping. corHMM doesn't require the mapping ahead of time, in effect allowing it (the placement of the hidden state) to be inferred. It's like Brownie (or brownie.lite) and OUwie requiring pre-mapped regimes rather than auteur and SURFACE inferring the regime shifts [for standardized testing fans, Liam's new method is to corHMM as OUwie is to SURFACE]. I can see biological use cases for both the pre-mapped regime case and the hidden regime case.

Hi John. I believe this is what Beaulieu et al. ...

2017-12-05T22:17:23.496-05:00

Hi John.

I believe this is what Beaulieu et al. (2013) published, and they specifically point out that it is a generalization of the covarion model.

Please correct me if I'm wrong, of course, but this is not the same, so far as I can tell - in the sense that the regimes we are computing the likelihood over are fixed. That means, for instance, that we might have one time period (from t=0 to t=0.25, say) on the tree that we permit to evolve under one process, then a second under a different process, and so on. That is - the regime 'paintings' are observed or specifically hypothesized a priori. By contrast in the covarion model & its guild we have an unobserved set of regimes & we compute the likelihood by integrating over the probability that each datum comes from each regime.
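As a rough illustration of the fixed-regime idea (a Python sketch, not the phytools implementation), consider a binary character on a single branch that crosses a regime boundary at t=0.25. With the regimes fixed in advance, the transition probabilities for the whole branch are just the product of the per-regime transition matrices, taken in time order. The rates below are made up, and `two_state_p` and `branch_p` are hypothetical helper names.

```python
import math


def two_state_p(a, b, t):
    """Closed-form P(t) for a binary character with rates 0->1 = a, 1->0 = b."""
    total = a + b
    decay = math.exp(-total * t)
    return [
        [(b + a * decay) / total, (a - a * decay) / total],
        [(b - b * decay) / total, (a + b * decay) / total],
    ]


def matmul(x, y):
    """Multiply two 2 x 2 matrices stored as lists of lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def branch_p(segments):
    """Transition matrix for a branch split into fixed regime segments.

    segments: (a, b, duration) tuples ordered from the rootward end of the
    branch to the tipward end.
    """
    p = [[1.0, 0.0], [0.0, 1.0]]
    for a, b, duration in segments:
        p = matmul(p, two_state_p(a, b, duration))
    return p


# Regime 1 from t=0 to t=0.25, regime 2 from t=0.25 to t=1.0 (made-up rates).
p_two_regimes = branch_p([(1.0, 0.5, 0.25), (0.2, 2.0, 0.75)])

# Sanity check: identical rates in both segments reduce to one process.
p_one_regime = branch_p([(1.0, 0.5, 0.25), (1.0, 0.5, 0.75)])
```

No integration over unobserved regimes appears anywhere in this calculation, which is the contrast with the covarion-type models.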

Note that this is not better or more sophisticated than the covarion or 'hidden-rates' model - in fact, it is less! However, it does suit a specific class of hypothesis that is common in phylogenetic comparative biology - for instance that trait evolution differs between clades or among geological eras, and allows us to contrast that hypothesis against one in which it does not differ or varies in a different way.

Thanks for pointing out the relationship to the covarion model. This is a very important literature, of course!

- Liam

This seems like an overly complicated solution. Wh...

2017-12-05T21:56:54.078-05:00

This seems like an overly complicated solution. Why not just make a covarion-like model? That is to say, you embed different 2 X 2 rate matrices into a (2 X N) X (2 X N) rate matrix, where N is the number of "regimes." The 2 X 2 rate matrices go along the diagonal. Along the off-diagonal 2 X 2 areas, you have a rate of switching from one regime to another. As an aside, this is not new at all. It's an extension of the covarion model. My colleagues and I did something identical, though computationally more intensive, for selection regimes: we embedded three codon models into a 183 X 183 rate matrix, allowing switching among selection regimes.

Guindon, S., A. G. Rodrigo, K. A. Dyer, and J. P. Huelsenbeck. 2004. Modeling the site-specific variation of selection patterns along lineages. Proceedings of the National Academy of Sciences, U.S.A. 101(35):12957-12962.
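The embedding described in this comment can be sketched in a few lines of Python. This is an illustration only, assuming a single shared regime-switching rate and assuming that a regime switch leaves the character state unchanged (so each off-diagonal 2 X 2 block is the switching rate times the identity). The function name `covarion_q` and all rate values are made up.

```python
def covarion_q(regime_rates, switch_rate):
    """Build a (2N) x (2N) rate matrix from N binary-character regimes.

    regime_rates: list of (a, b) pairs, one per regime, where a is the
                  0->1 rate and b is the 1->0 rate under that regime.
    switch_rate:  rate of moving between any two regimes (assumed equal).
    Row/column 2*r + c corresponds to regime r, character state c.
    """
    n = len(regime_rates)
    size = 2 * n
    q = [[0.0] * size for _ in range(size)]
    for r, (a, b) in enumerate(regime_rates):
        # Diagonal 2 x 2 block: character change within regime r.
        q[2 * r][2 * r + 1] = a
        q[2 * r + 1][2 * r] = b
        # Off-diagonal blocks: switch regime, keep the character state.
        for other in range(n):
            if other != r:
                for c in (0, 1):
                    q[2 * r + c][2 * other + c] = switch_rate
    # Diagonal entries make every row sum to zero.
    for i in range(size):
        q[i][i] = -sum(q[i])
    return q


# Two regimes: a "fast" one and a "slow" one, with made-up rates.
q = covarion_q([(1.0, 0.5), (0.1, 0.2)], switch_rate=0.3)
```

The same construction with three 61 X 61 codon models in place of the 2 X 2 blocks gives the 183 X 183 matrix mentioned above.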