Monday, September 9, 2013

How to simulate using pbtree & retain only those trees that don't go extinct....

A phytools user just contacted me with the following comment about the function for birth-death tree simulation, pbtree:

"Would it be helpful to add an option such that pbtree keeps trying until a tree survives, rather than returning NULL? (I'm trying to simulate a set number of trees of a set number of taxa, a proportion of which go extinct before the end of the simulation)."

Although it would be straightforward to pop this into pbtree, it is also pretty easy to write a simple wrapper script around pbtree to the same effect. For example:

    quiet=TRUE))) NULL

Let's try it out:

> trees<-replicate(100,foo(b=1,d=0.5,n=50,t=NULL), simplify=FALSE)
> class(trees)<-"multiPhylo"
> trees
100 phylogenetic trees
> sapply(trees,function(x) length(x$tip.label))
  [1] 50 50 50 ...

In my opinion, an important caveat should be attached to this endeavor - and that is that by sampling birth-death histories conditioned on a particular outcome (survival to the present) we are only sampling from a portion of the distribution under our simulation conditions.

The effect of this is easiest to show using time-stop (rather than taxon-stop) simulations. Let's simulate with pbtree and then with our wrapper script. We'll use b, d, and t to yield an expected number of extant taxa at time t of 50.

> b<-1
> d<-0.5
> t<-log(25)/(b-d)
> t1<-pbtree(b,d,t=t,nsim=1000,extant.only=TRUE,quiet=TRUE)
> l1<-sapply(unclass(t1),function(x) length(x$tip.label))
> t2<-replicate(1000,foo(b,d,n=NULL,t),simplify=FALSE)
> class(t2)<-"multiPhylo"
> l2<-sapply(unclass(t2),function(x) length(x$tip.label))
> mean(l1)
[1] 50.661
> mean(l2)
[1] 67.524
> par(mfcol=c(2,1))
> par(mar=c(5.1,4.1,2.1,1.1))
> bb<-seq(0,max(c(l1,l2))+25,25)
> hist(l1,xlab="number of extant taxa",main=NULL,breaks=bb)
> title(main="a) no rejection",adj=0,cex.main=1)
> hist(l2,xlab="number of extant taxa",main=NULL,breaks=bb)
> title(main="b) reject extinct trees",adj=0,cex.main=1)

So, you can see that when we reject all the simulations that went extinct before the present, our average number of extant taxa is much larger than the expected value of 50. This may be irrelevant, depending on the purpose of our simulation - but it is something that can be kept in mind.

No comments:

Post a Comment