This is not what I set out to do when I reopened the function pbtree that I use often, but have barely looked at in two years (that'll come in a future post); however last night & this morning I thought it would be neat to add a time-stop criterion to the existing taxa-stop in the stochastic pure-birth tree simulator in phytools, pbtree. Having done that, I realized it would be straightforward to just use rejection sampling to simulate simultaneously conditioning on both N & t.
Tree simulation is much less complicated than it might seem at first glance. The waiting time between speciation events on a Yule tree are exponentially distributed with the rate parameter, λ = b × m where m is the number of "open" edges. Having drawn a random waiting time, we first add the wait time to all the open edges of the tree; and then we speciate any of the open edges at random. We repeat this procedure until we reach a set number of tips or time.
One way to condition on both the total time & the number of tips is to repeatedly simulate trees under the time-stop criterion until a tree with the correct number of tips is obtained. In theory this could take a long time, even if we use a birth-rate where E[N]=eb×t. This is because the variance on stochastic pure-birth trees is very large. For instance:
> # simulate 1000 trees with a time-stop
> N<-sapply(trees,function(x) length(x$tip.label))
> hist(N,0:17*25,col="grey",xlab="frequency",ylab="number of tips",main=NULL)
Nonetheless, pbtree is sufficiently speedy that it is still possible to simulate using both a taxon & time-stop. Here's a demo:
simulating with both taxa-stop (n) and time-stop is
performed via rejection sampling & may be slow
126 trees rejected before finding a tree
user system elapsed
0.61 0.00 0.61
Phylogenetic tree with 100 tips and 99 internal nodes.
t17, t18, t69, t70, t21, t12, ...
Rooted; includes branch lengths.
The code for the new version of pbtree is here.