## Sunday, August 17, 2014

### Applying extract.clade.simmap to a set of trees in a list

A phytools user contacted me today to ask if there was some way to apply extract.clade.simmap to a set of stochastic mapped trees. This is actually surprisingly straightforward. To demonstrate this let me first (characteristically) simulate a tree, data, & set of stochastic maps:

library(phytools)
set.seed(1)
Q<-matrix(c(-1,1,1,-1),2,2)
rownames(Q)<-colnames(Q)<-letters[1:2]
tree<-pbtree(n=26,tip.label=LETTERS[26:1],scale=1)
x<-sim.history(tree,Q)\$states
plotTree(tree)
nodelabels()


OK, let's generate a set of stochastic maps & then extract the clade descended from node 40 from each map:

mtrees<-make.simmap(tree,x,nsim=100)

## make.simmap is sampling character histories conditioned on the transition matrix
## Q =

##        a      b
## a -1.152  1.152
## b  1.152 -1.152

## (estimated using likelihood);
## and (mean) root node prior probabilities
## pi =

##   a   b
## 0.5 0.5

## Done.

trees<-lapply(mtrees,extract.clade.simmap,node=40)
class(trees)<-"multiPhylo"
trees

## 100 phylogenetic trees

describe.simmap(trees)

## 100 trees with a mapped discrete character with states:
##  a, b
##
## trees have 5.5 changes between states on average
##
## changes are of the following types:
##       a,b  b,a
## x->y 2.33 3.17
##
## mean total time spent in each state is:
##           b      a total
## raw  1.8065 1.0676 2.874
## prop 0.6285 0.3715 1.000

## just to visualize this result
densityMap(trees,lwd=4,outline=TRUE)

## sorry - this might take a while; please be patient


The same technique can be applied to drop.tip.simmap.

If our trees differed in topology, as when we supply a posterior sample of trees from Bayesian phylogeny inference to make.simmap, the problem become a little trickier. In that case we might have to instead supply the node number for each subtree to extract as the MRCA of the set of tips we are interested in. E.g.,

tips<-LETTERS[1:13]
tips

##  [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M"

foo<-function(tree,tips) extract.clade.simmap(tree,findMRCA(tree,tips))
trees<-lapply(mtrees,foo,tips=tips)
class(trees)<-"multiPhylo"
describe.simmap(trees)

## 100 trees with a mapped discrete character with states:
##  a, b
##
## trees have 5.5 changes between states on average
##
## changes are of the following types:
##       a,b  b,a
## x->y 2.33 3.17
##
## mean total time spent in each state is:
##           b      a total
## raw  1.8065 1.0676 2.874
## prop 0.6285 0.3715 1.000


That's it.