## Monday, September 9, 2013

### How to simulate using pbtree & retain only those trees that don't go extinct....

A phytools user just contacted me with the following comment about the function for birth-death tree simulation, pbtree:

"Would it be helpful to add an option such that pbtree keeps trying until a tree survives, rather than returning NULL? (I'm trying to simulate a set number of trees of a set number of taxa, a proportion of which go extinct before the end of the simulation)."

Although it would be straightforward to pop this into pbtree, it is also pretty easy to write a simple wrapper script around pbtree to the same effect. For example:

foo<-function(b,d,n,t){
while(is.null(x<-pbtree(b,d,n,t,extant.only=TRUE,
quiet=TRUE))) NULL
x
}

Let's try it out:

> trees<-replicate(100,foo(b=1,d=0.5,n=50,t=NULL), simplify=FALSE)
> class(trees)<-"multiPhylo"
> trees
100 phylogenetic trees
> sapply(trees,function(x) length(x\$tip.label))
 50 50 50 ...

In my opinion, an important caveat should be attached to this endeavor - and that is that by sampling birth-death histories conditioned on a particular outcome (survival to the present) we are only sampling from a portion of the distribution under our simulation conditions.

The effect of this is easiest to show using time-stop (rather than taxon-stop) simulations. Let's simulate with pbtree and then with our wrapper script. We'll use b, d, and t to yield an expected number of extant taxa at time t of 50.

> b<-1
> d<-0.5
> t<-log(25)/(b-d)
> t1<-pbtree(b,d,t=t,nsim=1000,extant.only=TRUE,quiet=TRUE)
> l1<-sapply(unclass(t1),function(x) length(x\$tip.label))
> t2<-replicate(1000,foo(b,d,n=NULL,t),simplify=FALSE)
> class(t2)<-"multiPhylo"
> l2<-sapply(unclass(t2),function(x) length(x\$tip.label))
> mean(l1)
 50.661
> mean(l2)
 67.524
> par(mfcol=c(2,1))
> par(mar=c(5.1,4.1,2.1,1.1))
> bb<-seq(0,max(c(l1,l2))+25,25)
> hist(l1,xlab="number of extant taxa",main=NULL,breaks=bb)