Earlier today I posted a new version of the phytools pure-birth tree simulator: pbtree. This version is neat because it allows you to condition on the total number of tips in the tree, the total time, or (via rejection sampling) both.

However, when I started looking at pbtree it was actually because I wanted to add in discrete-time simulations, in addition to the continuous-time tree simulations already available in pbtree. (There is at least some interest in this.) Unlike discrete-time continuous character simulations, such as in the visualization tool bmPlot, simulating trees under a discrete-time model is not the same as simulating every time-step from 0 to the total time of the simulation. Rather, it just means we draw our waiting times between speciation events from a different probability distribution.

For continuous-time simulations we draw waiting times from an exponential distribution in which the rate parameter, λ, is equal to the speciation rate × the number of "open" edges at the last event. For discrete-time simulations, we draw waiting times from the discrete analog - the geometric distribution - which describes the probability that the first success in a series of Bernoulli trials occurs in trial *x*. Since the outcome of a discrete-time speciation process can be viewed as a Bernoulli trial (i.e., a lineage speciates, or it does not, in every time-step), this is the correct distribution of waiting times under our model.

There is a little wrinkle, here, though. Do you recall that to get the next wait time given *m* open edges we simulated rexp(n=1,lambda=b*m)? Well, in discrete-time the analogy is imperfect. What we need to do instead is simulate rgeom(n=m,prob=b) and then the wait-time to the next even is the minimum of that. Why? This is because we can think about the *m* open lineages as *m* different series of Bernoulli trials, and we want to find the wait time to the next event on *any* lineage in the tree.

*In addition*, we have to allow for the possibility that there are multiple speciation events within a single time-step - as long as all take place on different branches. To do this, we need to get the *set* of minimal rgeom(n=m,prob=b), rather than just a single minimum.

Now that we have a waiting time or set of (equal) waiting times, we just add a new node to any of the "open" edges in the tree, all with equal probability. We do that until we reach *n* or *t*. For discrete-time we sometimes have a problem when our stop criterion is the number of tips, not time. This is that sometimes multiple speciation events will occur within a timestep. The current version of the function allows this and spits a warning if the number of tips in the simulated tree are different from the stopping criterion.

The code for this updated version of pbtree is here; however since it uses an internal phytools function, to try it out you should install the latest version of phytools from source (phytools 0.2-43). Here's a quick demo:

Loading required package: phytools

> packageVersion("phytools")

[1] ‘0.2.43’

>

> ## this is our speciation prob to get 100 tips after

> ## 1000 time-steps, on average in discrete time

> b<-exp((log(100)-log(2))/1000)-1

>

> # simulate

> trees<-pbtree(b=b,t=1000,type="discrete",nsim=2000)

> N<-sapply(trees,function(x) length(x$tip.label))

> mean(N)

[1] 98.7815

> var(N)

[1] 4848.484

> range(N)

[1] 3 595

> hist(N,0:24*25,col="grey",xlab="frequency",ylab="number of tips",main=NULL)

As before, we can also condition on both *N* and *t*; however, since this is done via rejection sampling, it is probably wise (at least for illustrative purposes here) to choose a reasonable combination of b, n, and t. For instance:

> tree<-pbtree(b=b,n=100,t=1000,type="discrete")

simulating with both taxa-stop (n) and time-stop (t) is

performed via rejection sampling & may be slow

41 trees rejected before finding a tree

> plotTree(tree,ftype="off")

Cool. That's it for now.

## No comments:

## Post a Comment