Well, as the title of this post would suggest, the function can now do birth-death simulation, as well as pure-birth. Remember, pbtree can now condition on total time, the number of tips, or both (by rejection sampling); and can simulate in continuous or discrete time.
Simulating with death (i.e., extinction) is not too hard. For continuous time simulation, whereas before we drew our waiting times from the exponential distribution with λ = b × m for birth-rate b & number of 'open' lineages m, with extinction we draw our wait times from the exponential distribution with λ = (b + d) × m. This is because if the wait time to an event of class A is exponentially distributed with rate λA & the wait time to an event of class B is exponentially distributed with rate λB, and A & B are independent, then the wait times to A or B are exponentially distributed with λ = λA + λB. The probabilities that an event is a birth or a death given that it has occurred are just b/(b+d) and d/(b+d), respectively. This means that our procedure for growing the tree should be (1) draw a random wait time; (2) pick a lineage at random; and (3) and then have it speciate or die with the probabilities given above.
For discrete time the procedure is only slightly different. Now we draw a random wait time from the geometric distribution with p = b + d.** (**Remember that we are actually drawing m independent waiting times & then using the one or more smallest times.) However, since p is a probability (whereas λ was a rate), we are now constrained such that b + d ≤ 1.0. This is because a given lineage can speciate, go extinct, or do nothing within a time-step - but it can't do more than one of these. (b + d = 1.0 guarantees that an event of some kind - speciation or extinction - occurs in every lineage in every generation.)
Here's a little demo of the function, using continuous time simulations:
Loading required package: phytools
> # continuous time, taxa stop
> tt<-log(50)-log(2); tt
> ltt(tree,plot=FALSE)$gamma # pull of the recent
> tt # time for simulation
simulating with both taxa-stop (n) and time-stop (t) is
performed via rejection sampling & may be slow
13 trees rejected before finding a tree
> # function to get the number of extant tips
> # low extinction rate
1000 phylogenetic trees
> # high extinction rate
1000 phylogenetic trees
> # distribution of tree sizes
> hist(NlD,0:ceiling(max(c(NlD,NhD))/10)*10,col="grey", xlab="frequency",ylab="number of tips",main=NULL)
Pretty cool. The purpose of the final exercise was just to illustrate that although the expected number of lineages at time t can be computed as N = e(b-d)×t, the variance among simulations goes up with increasing b.
That's it. I've been testing out the new pbtree extensively - but please report if you are able to break it! Thanks!