Phylogenetic Tools for Comparative Biology: Fitting a variable-process model of discrete character evolution on the tree using phytools

Tuesday, December 5, 2017

Fitting a variable-process model of discrete character evolution on the tree using phytools

Now for something a little different.

Today, I have built a new method that fits a model of discrete character evolution in which the transition matrix Q varies among different parts of the tree.

These might be edges or clades specified arbitrarily by the user (for instance, using paintBranches or paintSubTree in phytools), or they could be regimes mapped onto the tree using the procedure of stochastic character mapping.

The way I did this was pretty simple. I just took the function that implements Felsenstein's famous pruning algorithm to compute the likelihood, and then modified it so that it could use a different Q for different edges. The only complication was that we might like our regime to change along an edge rather than merely at a node. To solve that, I used the phytools function map.to.singleton to convert our "simmap" object into a tree with singleton nodes and only a single regime per edge. Problem solved.
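To make the idea concrete, here is a minimal sketch of the pruning algorithm with one Q per edge. This is not the phytools source: the function names (expQt, prune_lik), the tiny three-tip tree, and the regime assignments are all hypothetical, and the edge matrix is assumed to already be in postorder with exactly one regime per edge (which is what the map.to.singleton step is described as providing).

```r
# Minimal sketch (NOT the phytools source): pruning with one Q per edge.
# All names here (expQt, prune_lik, edge, el, regime, Qs) are hypothetical.

# P(t) = exp(Q t) via eigendecomposition; assumes Q is diagonalizable,
# which holds for the 2 x 2 rate matrices used below.
expQt <- function(Q, t) {
  e <- eigen(Q)
  V <- e$vectors
  Re(V %*% diag(exp(e$values * t), nrow = nrow(Q)) %*% solve(V))
}

# edge: two-column matrix (ancestor, descendant), assumed in postorder
# regime: name of the regime on each edge; Qs: named list of Q matrices
# tipstates: integer state index for tips 1..ntip; prior: root prior
prune_lik <- function(edge, edge.length, regime, Qs, tipstates, prior) {
  k <- nrow(Qs[[1]])
  ntip <- length(tipstates)
  L <- matrix(1, max(edge), k)        # conditional likelihoods per node
  L[1:ntip, ] <- 0
  L[cbind(1:ntip, tipstates)] <- 1    # tips: 1 for the observed state
  for (i in seq_len(nrow(edge))) {
    P <- expQt(Qs[[regime[i]]], edge.length[i])  # edge-specific Q
    anc <- edge[i, 1]
    des <- edge[i, 2]
    L[anc, ] <- L[anc, ] * as.vector(P %*% L[des, ])
  }
  sum(prior * L[ntip + 1, ])          # root is node ntip + 1
}

# Toy tree ((t1,t2),t3): tips 1-3, root 4, internal node 5
edge <- rbind(c(5, 1), c(5, 2), c(4, 5), c(4, 3))
el <- c(0.3, 0.3, 0.2, 0.5)
regime <- c("b", "b", "a", "a")
Qa <- matrix(c(-1, 1, 1, -1), 2, 2)
Qs <- list(a = Qa, b = 10 * Qa)
lik <- prune_lik(edge, el, regime, Qs, tipstates = c(1, 1, 2),
                 prior = c(0.5, 0.5))
log(lik)
```

The only change relative to ordinary pruning is the line that looks up Q by the regime of the current edge before computing the transition probabilities.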

Note that this function, like fitMk, uses code for the pruning algorithm adapted from Emmanuel Paradis' ape package.

Let's try it.

First, our data:

library(phytools)
packageVersion("phytools")

## [1] '0.6.50'

plot(tree,ftype="off",colors=setNames(c("blue","red"),
    mapped.states(tree)),xlim=c(0,1.05*max(nodeHeights(tree))))
tiplabels(pie=to.matrix(x,c(0,1)),piecol=c("black","white"),
    cex=0.4,offset=0.01)

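[Figure: the tree with the two mapped regimes shown in blue and red, and the tip states (0 and 1) shown as black and white pies.]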

So the idea is simply that we will fit a model in which the rate of transition between 0 & 1 (shown here as black & white) depends on the state (red or blue) mapped onto the edges of the tree. Right?

Let's fit our model:

fitERmulti<-fitmultiMk(tree,x,model="ER")
fitERmulti

## Object of class "fitmultiMk".
## 
## Fitted value of Q[a]:
##           0         1
## 0 -0.646581  0.646581
## 1  0.646581 -0.646581
## 
## Fitted value of Q[b]:
##           0         1
## 0 -10.14495  10.14495
## 1  10.14495 -10.14495
## 
## Fitted (or set) value of pi:
##   0   1 
## 0.5 0.5 
## 
## Log-likelihood: -35.883658 
## 
## Optimization method used was "nlminb"

Or an "ARD" model that differs between parts of the tree:

fitARDmulti<-fitmultiMk(tree,x,model="ARD")
fitARDmulti

## Object of class "fitmultiMk".
## 
## Fitted value of Q[a]:
##           0         1
## 0 -2.218063  2.218063
## 1  0.604146 -0.604146
## 
## Fitted value of Q[b]:
##           0         1
## 0 -25.36092  25.36092
## 1  12.73165 -12.73165
## 
## Fitted (or set) value of pi:
##   0   1 
## 0.5 0.5 
## 
## Log-likelihood: -34.515745 
## 
## Optimization method used was "nlminb"

Of course, we can compare this, if we'd like, to a model with but a single regime on the tree. For instance:

fitER<-fitMk(tree,x,model="ER")
fitER

## Object of class "fitMk".
## 
## Fitted (or set) value of Q:
##           0         1
## 0 -1.651121  1.651121
## 1  1.651121 -1.651121
## 
## Fitted (or set) value of pi:
##   0   1 
## 0.5 0.5 
## 
## Log-likelihood: -41.613061 
## 
## Optimization method used was "nlminb"

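This suggests that our data justify the greater model complexity of multiple regimes on the tree: the log-likelihood rises from -41.61 under the single-regime ER model to -35.88 under the two-regime ER model, at the cost of just one additional parameter. That's good, because we simulated them that way!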
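One way to put numbers on that comparison is a likelihood-ratio test and AIC, sketched below using the log-likelihoods printed above. The parameter counts are an assumption on my part (one rate for "ER", one rate per regime for the two-regime "ER" model, two rates per regime for the two-regime "ARD" model, with pi fixed), and the object names are just for illustration.

```r
# Log-likelihoods copied from the printed model fits
logL <- c(ER = -41.613061, ERmulti = -35.883658, ARDmulti = -34.515745)

# ASSUMED numbers of free rate parameters (pi is fixed at 0.5/0.5)
k <- c(ER = 1, ERmulti = 2, ARDmulti = 4)

# AIC for each model: smaller is better
aic <- 2 * k - 2 * logL
aic

# Likelihood-ratio test: single-regime ER nested within two-regime ER
LR <- unname(2 * (logL["ERmulti"] - logL["ER"]))
P <- pchisq(LR, df = unname(k["ERmulti"] - k["ER"]), lower.tail = FALSE)
c(LR = LR, P = P)
```

Under these assumptions the two-regime ER model is favored over both the single-regime model and the more heavily parameterized two-regime ARD model.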

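# simulate a pure-birth tree, then map a two-state regime (a/b) onto it
tree<-pbtree(n=100,tip.label=LETTERS,scale=0.5)
Q<-matrix(c(-1,1,1,-1),2,2)
rownames(Q)<-colnames(Q)<-letters[1:2]
tree<-sim.history(tree,Q,anc="a")
# stretch each edge by regime: time spent in "b" counts 10x
sim.tree<-as.phylo(tree)
q<-setNames(c(1,10),letters[1:2])
sim.tree$edge.length<-colSums(t(tree$mapped.edge[,letters[1:2]])*q)
# simulate the binary character (0/1) on the stretched tree
rownames(Q)<-colnames(Q)<-0:1
x<-as.factor(sim.history(sim.tree,Q)$states)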

(As of yet there is no function to simulate multiple Mk models in different parts of the tree - so what I did above was to stretch the edge lengths of the tree by regime, and then simulate under a constant regime on the stretched tree.)
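The stretching trick works because the transition probabilities along an edge depend only on the product of rate and time, so evolving at rate 10 for time t is the same as evolving at rate 1 for time 10t. Here is a small base R illustration of the edge-length arithmetic, using a made-up stand-in for tree$mapped.edge (the values and row names are hypothetical).

```r
# Hypothetical stand-in for tree$mapped.edge: time spent in each regime
# (columns) along each of three edges (rows)
mapped.edge <- matrix(c(0.2, 0.0,
                        0.1, 0.3,
                        0.0, 0.4), ncol = 2, byrow = TRUE,
                      dimnames = list(c("4,5", "5,1", "5,2"), c("a", "b")))

# relative rate in each regime
q <- setNames(c(1, 10), c("a", "b"))

# same expression as in the simulation code: after transposing, each row
# (regime) is scaled by its rate, and colSums adds the pieces per edge
stretched <- colSums(t(mapped.edge[, c("a", "b")]) * q)

# equivalent matrix-product form
alt <- drop(mapped.edge %*% q)

stretched
```

The middle edge, for example, spends 0.1 time units in regime "a" and 0.3 in regime "b", so its stretched length is 0.1 x 1 + 0.3 x 10 = 3.1.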

5 comments:

John Huelsenbeck, December 5, 2017 at 9:56 PM
This seems like an overly complicated solution. Why not just make a covarion-like model? That is to say, you embed different 2 X 2 rate matrices into a (2 X N) X (2 X N) rate matrix, where N is the number of "regimes." The 2 X 2 rate matrices go along the diagonal. Along the off diagonal 2 X 2 areas, you have a rate of switching from one regime to another. As an aside, this is not new at all. It's an extension of the covarion model. My colleagues and I did something identical, though computationally more intensive, for selection regimes: we embedded three codon models into a 183 X 183 rate matrix, allowing switching among selection regimes.

Guindon, S., A. G. Rodrigo, K. A. Dyer, and J. P. Huelsenbeck. 2004. Modeling the site-specific variation of selection patterns along lineages. Proceedings of the National Academy of Sciences, U.S.A. 101(35):12957-12962.