Friday, July 31, 2015

Simulating an arbitrary shift in the diversification rate on a phylogeny

A R-sig-phylo reader recently asked if it is possible to simulate a shift in the rate of lineage diversification in some part of a tree (rather than simultaneously across all the lineages in a tree at a given time).

Now, I'll first say that I understand that methods to simulate various state-based models of diversification exist in the diversitree R package maintained by Rich Fitzjohn. If, however, one just wants to arbitrarily pick a point in the tree & switch to a different diversification rate (speciation and/or extinction rate) then this can also be done by merely simulating a complete tree under one process, going to the spot where you want the rate to change, pruning out the subtree tipward of that point, simulating a new subtree under your new desired process, and the pasting the two trees together.

Since this is easier said than done in R, the code below shows an example in which I first simulate a base tree with a birth rate of about b~0.02, a taxon-stop criterion of 20 and a time stop of 100; then I (arbitrarily, just to demonstrate) pick a random lineage at 75% of the total tree height and call that my shift point; I prune that lineage out, and attach a new subtree in which I use a much higher birth-rate of about b~0.12.

Here's what that looks like:

library(phytools)
## set the birth rate based on n=20 & t=100
b1<-(log(20)-log(2))/100
b1
## [1] 0.02302585
tree<-pbtree(b=b1,t=100,n=20)
## simulating with both taxa-stop (n) and time-stop (t) is
## performed via rejection sampling & may be slow
## 
##   4 trees rejected before finding a tree
plotTree(tree,mar=c(3.1,0.1,0.1,0.1))
axis(1)
nodelabels()

plot of chunk unnamed-chunk-1

## pick a split point at 75% of total tree height
pp<-0.75
H<-nodeHeights(tree)
node<-sample(tree$edge[(H[,1]<pp*max(H))+
    (H[,2]>pp*max(H))==2,2],1)
node
## [1] 28
## split the tree & this point
tree<-splitTree(tree,split=list(node=node,
    bp=pp*max(H)-H[which(tree$edge[,2]==node),1]))[[1]]
tree$tip.label[tree$tip.label=="NA"]<-""
plotTree(tree,mar=c(3.1,0.1,0.1,0.1))
axis(1)
nodelabels(node=tree$tip.label=="",pch=19)

plot of chunk unnamed-chunk-2

## set the second desired birth-rate based on n=25 & t=25
## note this time we want to start with 1 lineage, not 2
b2<-log(25)/(max(H)-pp*max(H))
## simulate the "stem" of the new subtree
t1<-(max(H)-pp*max(H))
while(t1>=25) t1<-rexp(n=1,rate=b2)
## add the stem
ii<-which(tree$edge[,2]==which(tree$tip.label==""))
tree$edge.length[ii]<-tree$edge.length[ii]+t1
plotTree(tree,mar=c(3.1,0.1,0.1,0.1))
axis(1)
nodelabels(node=tree$tip.label=="",pch=19)

plot of chunk unnamed-chunk-3

## now simulate the rest of the derived subtree
t2<-pbtree(b=b2,t=max(nodeHeights(tree))-
    nodeheight(tree,which(tree$tip.label=="")),n=25)
## simulating with both taxa-stop (n) and time-stop (t) is
## performed via rejection sampling & may be slow
## 
##   103 trees rejected before finding a tree
t2$tip.label<-paste("s",1:Ntip(t2),sep="")
tree<-bind.tree(tree,t2,where=which(tree$tip.label==""))
plotTree(tree,fsize=0.8,mar=c(3.1,0.1,0.1,0.1))
axis(1)

plot of chunk unnamed-chunk-4

## visualize the shift in diversification rate
ltt(tree,show.tree=TRUE)
## Object of class "ltt" containing:
## 
## (1) A phylogenetic tree with 42 tips and 41 internal nodes.
## 
## (2) Vectors containing the number of lineages (ltt) and branching times (times) on the tree.
## 
## (3) A value for Pybus & Harvey's "gamma" statistic of 2.8428, p-value = 0.0045.
obj<-get("last_plot.phylo",envir=.PlotPhyloEnv)
points(x=pp*max(H),y=obj$yy[findMRCA(tree,t2$tip.label)],pch=19)
lines(rep(pp*max(H),2),par()$usr[3:4],lty="dashed")

plot of chunk unnamed-chunk-5

That's it.

Obviously, this would have to be customized to whatever scenario the user is envisioning.

Thursday, July 30, 2015

Edge labels on a radial or fan tree

Today on R-sig-phylo a reader reported a bug in the way that edge labels are plotted using the ape function edgelabels.

This error is very easy to reproduce:

set.seed(8) ## for repeatability
library(phytools)
tree<-pbtree(n=26,tip.label=LETTERS)
plotTree(tree,type="fan")
## setEnv=TRUE for this type is experimental. please be patient with bugs
edgelabels()

plot of chunk unnamed-chunk-1

The problem is fairly clear to see. Instead of plotting the edge labels mid-way along each edge, the function is plotting them midway between the daughter node of the edge - and the daughter node of the parent edge. This does not work (except for edges descended directly from the root node).

[Another peculiarity of the function is that the plotted numbers are, by default, the rows of phy$edge - not, for instance, the index of the daughter node of each edge. Let's ignore that for now.]

Luckily, Klaus Schliep, a postdoc presently working in my lab & author of the phangorn package quickly leapt to the rescue with a complete bug fix. However, a little too late I also had written a little code which, though not as complete as Klaus's fix, nonetheless might be instructive (with comments below) on how these functions work.

Here is my function, which by default plots numbers corresponding to the daughter of each edge - not the row of phy$edge:

edgelabels.fan<-function(text=NULL,edge=NULL,cex=1){
    ## get the last plotted "phylo" object - created by 
    ## plot.phylo, plotTree, and other methods
    obj<-get("last_plot.phylo",envir=.PlotPhyloEnv)
    ## these are the positions of the nodes & tips in the plotted
    ## tree
    nx<-obj$xx[obj$edge[,2]]
    ny<-obj$yy[obj$edge[,2]]
    ## this function computes the length of each edge from the
    ## object we pulled, above
    foo<-function(e,x,y) sqrt(x[e[2]]^2+y[e[2]]^2)-
        sqrt(x[e[1]]^2+y[e[1]]^2)
    l<-apply(obj$edge,1,foo,x=obj$xx,y=obj$yy)
    ## now let's get the x & y coordinates of the midpoint
    ## of each edge
    x<-nx-sign(nx)*0.5*l*cos(atan(ny/nx))
    y<-ny-sign(nx)*0.5*l*sin(atan(ny/nx))
    ## some bookkeeping
    ii<-order(as.numeric(names(x)))
    x<-x[ii]
    y<-y[ii]
    if(is.null(edge)) edge<-c(1:obj$Ntip,2:obj$Nnode+obj$Ntip)
    x<-x[as.character(edge)]
    y<-y[as.character(edge)]
    if(is.null(text)) text<-names(x)
    if(length(text)!=length(edge)){
        cat("Warning: edge & text should be the same length.\n\n")
        text<-as.character(edge)
    }
    ## let's plot our edge labels
    symbols(x,y,inches=FALSE,bg="white",add=TRUE,
        rectangles=cbind(1.2*cex*strwidth(text),
        1.4*cex*strheight(text)))
    text(text,x=x,y=y,cex=cex)
}

OK, now we might as well try it out:

plotTree(tree,type="fan")
## setEnv=TRUE for this type is experimental. please be patient with bugs
edgelabels.fan()

plot of chunk unnamed-chunk-3

or, alternatively:

plotTree(tree,type="fan")
## setEnv=TRUE for this type is experimental. please be patient with bugs
edgelabels.fan(edge=c(29,32,36,47),
    text=c("clade A","clade B","clade C","clade D"))

plot of chunk unnamed-chunk-4

for instance.

That's all.