Thursday, September 21, 2017

Simulating trees with specific values of Pybus & Harvey's γ

A colleague recently asked me the following question titled:

“Simulating trees with different pybus & Harvey's γ.”

“This may be a question with a very obvious answer, but is there a way of simulating Yule trees so they are either early burst or late burst (i.e. can you create simulated trees with a high or low γ)? Trying to satisfy a reviewer’s request to test our taxonomy paper against null models.”

Well, we can't precisely simulate constant-rate pure-birth trees that have a particular γ value; however, one thing we could do is transform the edge lengths of the tree so that has a particular, predictable value of γ. In fact, it turns out to be the case - at least so far as I can tell (though I don't have a formal proof of this) that for any tree & particular value of γ, there is a corresponding “EB” (early-burst or ACDC, see Blomberg et al. 2003) edge-length transformation that will produce that very-same γ. It's just a matter of finding it!

Turns out - that's pretty easy. We can use numerical optimization, and since we only need to optimize in one-dimension, as the EB transformation only has one paramter, r, optimization is both fast & robust.

For instance:

library(phytools)
## simulate a tree
tree<-pbtree(n=100)
ltt(tree,show.tree=TRUE,log.lineages=F,log="y")

plot of chunk unnamed-chunk-1

## Object of class "ltt" containing:
## 
## (1) A phylogenetic tree with 100 tips and 99 internal nodes.
## 
## (2) Vectors containing the number of lineages (ltt) and branching times (times) on the tree.
## 
## (3) A value for Pybus & Harvey's "gamma" statistic of 2.0956, p-value = 0.0361.
## transform to gamma=0
ebTree<-phytools:::ebTree
gamma<-function(r,tree,g) 
    (g-ltt(ebTree(tree,r),plot=FALSE)$gamma)^2
fit<-optimize(gamma,c(-10,10),tree=tree,g=0,tol=1e-12)
ltt(ebTree(tree,fit$minimum),show.tree=TRUE,log.lineages=F,log="y",
    lwd=2)

plot of chunk unnamed-chunk-1

## Object of class "ltt" containing:
## 
## (1) A phylogenetic tree with 100 tips and 99 internal nodes.
## 
## (2) Vectors containing the number of lineages (ltt) and branching times (times) on the tree.
## 
## (3) A value for Pybus & Harvey's "gamma" statistic of 0, p-value = 1.
## or how about gamma=10
fit<-optimize(gamma,c(-10,10),tree=tree,g=10,tol=1e-12)
ltt(ebTree(tree,fit$minimum),show.tree=TRUE,log.lineages=F,log="y",
    lwd=2)

plot of chunk unnamed-chunk-1

## Object of class "ltt" containing:
## 
## (1) A phylogenetic tree with 100 tips and 99 internal nodes.
## 
## (2) Vectors containing the number of lineages (ltt) and branching times (times) on the tree.
## 
## (3) A value for Pybus & Harvey's "gamma" statistic of 10, p-value = 0.
## or gamma=-10
fit<-optimize(gamma,c(-10,10),tree=tree,g=-10,tol=1e-12)
ltt(ebTree(tree,fit$minimum),show.tree=TRUE,log.lineages=F,log="y",
    lwd=2)

plot of chunk unnamed-chunk-1

## Object of class "ltt" containing:
## 
## (1) A phylogenetic tree with 100 tips and 99 internal nodes.
## 
## (2) Vectors containing the number of lineages (ltt) and branching times (times) on the tree.
## 
## (3) A value for Pybus & Harvey's "gamma" statistic of -10, p-value = 0.

Note that the ebTree transformation, though it does produce the correct γ, has the side-effect of changing the total depth of the tree. This needn't be the case. For instance, we can just do the following:

## custom EB function to hold total length constant
EB<-function(tree,r){
    d<-max(nodeHeights(tree))
    tree<-ebTree(tree,r)
    tree$edge.length<-tree$edge.length/max(nodeHeights(tree))*d
    tree
}
gamma<-function(r,tree,g) 
    (g-ltt(EB(tree,r),plot=FALSE)$gamma)^2
## try with gamma=-4
fit<-optimize(gamma,c(-10,10),tree=tree,g=-4,tol=1e-12)
ltt(EB(tree,fit$minimum),show.tree=TRUE,log.lineages=F,log="y",
    lwd=2)

plot of chunk unnamed-chunk-2

## Object of class "ltt" containing:
## 
## (1) A phylogenetic tree with 100 tips and 99 internal nodes.
## 
## (2) Vectors containing the number of lineages (ltt) and branching times (times) on the tree.
## 
## (3) A value for Pybus & Harvey's "gamma" statistic of -4, p-value = 1e-04.

OK, now let's generate 100 trees with total length of 1.0 and γ=-1:

trees<-pbtree(n=100,scale=1,nsim=100)
foo<-function(tree,g)
    optimize(gamma,c(-10,10),tree=tree,g=g,tol=1e-12)$minimum
r<-sapply(trees,foo,g=-1)
ttrees<-mapply(EB,tree=trees,r=r,SIMPLIFY=FALSE)
class(ttrees)<-"multiPhylo"
g<-sapply(ttrees,function(x) ltt(x,plot=FALSE)$gamma)
g
##   [1] -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
##  [24] -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
##  [47] -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
##  [70] -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
##  [93] -1 -1 -1 -1 -1 -1 -1 -1
ltt(ttrees,col=make.transparent("blue",0.25))

plot of chunk unnamed-chunk-1

## 100 objects of class "ltt" in a list

What's neat is that for our single fixed value of γ = -1.0, the LTT curves vary quite a lot!

[Note. In the original post I called ltt(trees,...) rather than ltt(ttrees,...) - that is, ltt on the original, rather than transformed trees. Shown above this is fixed, but the LTT plots still vary quite a bit.]

That's it. Hopefully of some help.

No comments:

Post a Comment