## Friday, April 12, 2013

### New version of tree simulator with taxa & time stops (and both)

This is not what I set out to do when I reopened the function pbtree that I use often, but have barely looked at in two years (that'll come in a future post); however last night & this morning I thought it would be neat to add a time-stop criterion to the existing taxa-stop in the stochastic pure-birth tree simulator in phytools, pbtree. Having done that, I realized it would be straightforward to just use rejection sampling to simulate simultaneously conditioning on both N & t.

Tree simulation is much less complicated than it might seem at first glance. The waiting time between speciation events on a Yule tree are exponentially distributed with the rate parameter, λ = b × m where m is the number of "open" edges. Having drawn a random waiting time, we first add the wait time to all the open edges of the tree; and then we speciate any of the open edges at random. We repeat this procedure until we reach a set number of tips or time.

One way to condition on both the total time & the number of tips is to repeatedly simulate trees under the time-stop criterion until a tree with the correct number of tips is obtained. In theory this could take a long time, even if we use a birth-rate where E[N]=eb×t. This is because the variance on stochastic pure-birth trees is very large. For instance:

> source("pbtree.R")
> # simulate 1000 trees with a time-stop
> trees<-pbtree(b=1,t=log(100)-log(2),nsim=1000)
(I use t=log(100)-log(2) because this is the amount of time we expect will result in 100 tips given b=1, on average.)
> # how many tips in each tree?
> N<-sapply(trees,function(x) length(x\$tip.label))
> mean(N)
[1] 100.499
> var(N)
[1] 4946.41
> hist(N,0:17*25,col="grey",xlab="frequency",ylab="number of tips",main=NULL)

Nonetheless, pbtree is sufficiently speedy that it is still possible to simulate using both a taxon & time-stop. Here's a demo:

> system.time(tree<-pbtree(b=1,t=log(100)-log(2),n=100))
simulating with both taxa-stop (n) and time-stop is
performed via rejection sampling & may be slow

126 trees rejected before finding a tree

user  system elapsed
0.61    0.00    0.61
> tree

Phylogenetic tree with 100 tips and 99 internal nodes.

Tip labels:
t17, t18, t69, t70, t21, t12, ...

Rooted; includes branch lengths.
Cool.

The code for the new version of pbtree is here.