Saturday, November 7, 2015

Computing the frequency of changes along edges of a set of stochastic map trees

In response to a recent R-sig-phylo thread I thought it might be useful to post code demonstrating how to compute the relative frequency of changes of particular types along edges in a posterior sample from the method of stochastic mapping.

## load libraries
library(phytools)
## here is our data
tree
## 
## Phylogenetic tree with 26 tips and 25 internal nodes.
## 
## Tip labels:
##  A, B, C, D, E, F, ...
## 
## Rooted; includes branch lengths.
x
##   A   B   C   D   E   F   G   H   I   J   K   L   M   N   O   P   Q   R 
## "a" "a" "a" "b" "b" "b" "b" "b" "b" "a" "a" "a" "b" "a" "a" "a" "a" "a" 
##   S   T   U   V   W   X   Y   Z 
## "a" "a" "a" "b" "b" "b" "b" "b"
## obtain stochastic maps:
trees<-make.simmap(tree,x,model="ER",nsim=100)
## make.simmap is sampling character histories conditioned on the transition matrix
## 
## Q =
##           a         b
## a -2.818303  1.409151
## b  1.409151 -2.818303
## (estimated using likelihood);
## and (mean) root node prior probabilities
## pi =
##   a   b 
## 0.5 0.5
## Done.

Now let's do our analysis. The following function computes for a given edge (by row in tree$edge, but we could easily modify to be by node number) the relative frequency of trees in which the state at the start of the edge is "a" and the state at the end is "b".

## this function computes for a given edge
foo<-function(edge,trees,states){
    obj<-sapply(trees,function(x,e,s) 
        if(names(x$maps[[e]])[1]==s[1]&&names(x$maps[[e]])[length(x$maps[[e]])]==s[2]) TRUE
        else FALSE,e=edge,s=states)
    sum(obj)/length(obj)
}

We can use it by iterating over edges:

edge.probs<-sapply(1:nrow(tree$edge),foo,trees=trees,states=c("a","b"))
edge.probs
##  [1] 0.09 0.21 0.00 0.00 0.00 0.00 0.00 0.53 0.00 0.27 0.43 0.02 0.01 0.01
## [15] 0.00 0.00 0.00 0.00 0.01 0.00 0.01 0.01 0.00 0.00 0.02 0.00 0.97 0.00
## [29] 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.13 0.05 0.00 0.00 0.00 0.00 0.00
## [43] 0.41 0.00 0.00 0.40 0.03 0.03 0.00 0.00

Finally, let's visualize the results on a 'representative' stochastic map:

plot(trees[[1]],colors=setNames(c("blue","red"),c("a","b")),
    ylim=c(-2,Ntip(tree)))
par(fg="transparent") ## just for clarity
threshold<-0 ## threshold frequency above which we want to plot it
edgelabels(edge=which(edge.probs>threshold),
    pie=edge.probs[which(edge.probs>threshold)],
    piecol=c("red","black"),cex=0.8)
par(fg="black")
add.simmap.legend(colors=setNames(c("red","black"),c("freq. change from a to b",
    "freq. no change or change from b to a")),prompt=FALSE,x=0,y=0)

plot of chunk unnamed-chunk-4

Or we can change the threshold to 0.05:

plot(trees[[1]],colors=setNames(c("blue","red"),c("a","b")),
    ylim=c(-2,Ntip(tree)))
par(fg="transparent") ## just for clarity
threshold<-0.05 ## threshold frequency above which we want to plot it
edgelabels(edge=which(edge.probs>threshold),
    pie=edge.probs[which(edge.probs>threshold)],
    piecol=c("red","black"),cex=0.8)
par(fg="black")
add.simmap.legend(colors=setNames(c("red","black"),c("freq. change from a to b",
    "freq. no change or change from b to a")),prompt=FALSE,x=0,y=0)

plot of chunk unnamed-chunk-5

That's about it.

BTW, for this example the data were simulated as follows:

tree<-pbtree(n=26,tip.label=LETTERS,scale=1)
Q<-matrix(c(-1,1,1,-1),2,2)/2
rownames(Q)<-colnames(Q)<-letters[1:2]
x<-sim.history(tree,Q,anc="a")$states

No comments:

Post a Comment