Saturday, January 5, 2013

Adding new tips at random to a phylogeny

A friend and colleague recently contacted me with the following request (somewhat abbreviated):

What I'm interested in doing is adding a given number of tips at random to a tree. I've done a comparison where I take a phylogeny. . . [and randomly prune taxa]. I'm interested to also do the reverse. . . . Would there be an easy way in phytools to do a "random addition of a given number of tips"?

Well, it is possible to add tips to a tree at random: pick a node at random (from, say, the set of all internal & terminal nodes, excluding the root node), pick a position at random along the edge leading to that node, and then use the phytools function bind.tip to attach the new tip to the tree.

So, for instance, we could just do:
> require(phytools)
> # simulate a random pure-birth tree
> tree<-pbtree(n=12)
> # pick a node at random
> node<-sample(c(1:length(tree$tip.label), 2:tree$Nnode+length(tree$tip.label)),size=1)
> node
[1] 22
> # pick a random position along the branch ending in node
> position<-runif(n=1)* tree$edge.length[which(tree$edge[,2]==node)]
> # check where the tip will be added: this is its relative
> # position along the branch, measured from the rootward end
> (tree$edge.length[which(tree$edge[,2]==node)]- position)/tree$edge.length[which(tree$edge[,2]==node)]
[1] 0.700115
> # now attach new tip
> new.tree<-bind.tip(tree,"t13",where=node, position=position)
> # plot the trees
> par(mfrow=c(2,1))
> plotTree(tree,node.numbers=TRUE)
> plotTree(new.tree,node.numbers=TRUE)
We could do this sequentially a number of times - each time updating the tree as well as the set of nodes in the tree from which we are picking.
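
The sequential version might look something like the sketch below. It just wraps the steps above in a loop; the number of new tips (n.new) and the labels "new1", "new2", and so on are arbitrary choices made for illustration. No edge.length is supplied, so this relies on bind.tip choosing the new branch length for an ultrametric tree.

```r
library(phytools)

set.seed(1)
# simulate a random pure-birth tree
tree <- pbtree(n = 12)
n.new <- 5  # number of tips to add (arbitrary, for illustration)
new.tree <- tree
for (i in seq_len(n.new)) {
  ntip <- Ntip(new.tree)
  # candidate nodes in the current tree: all tips, plus all internal
  # nodes except the root (which is numbered ntip + 1)
  candidates <- c(1:ntip, (ntip + 2):(ntip + new.tree$Nnode))
  node <- sample(candidates, size = 1)
  # random position along the branch ending in node
  edge <- which(new.tree$edge[, 2] == node)
  position <- runif(n = 1) * new.tree$edge.length[edge]
  # attach the new tip and carry the updated tree into the next pass
  new.tree <- bind.tip(new.tree, paste0("new", i), where = node,
                       position = position)
}
plotTree(new.tree)
```

Because the candidate set is rebuilt from new.tree on every pass, later tips can attach to branches created by earlier additions.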

One shortcoming of this approach is that it ignores branch lengths: added tips are no more likely to land on long branches than on short ones. Ideally, I think we'd like the probability of adding a tip along a branch to be directly proportional to that branch's length. We can do that by taking advantage of the base function cumsum, which computes a cumulative sum from a vector. I have put this into a new function, add.random, which can be downloaded here. Let's test out the function to see if it behaves as we'd expect.
> source("add.random.R")
> # simulate a random, non-ultrametric tree
> tree1<-rtree(20)
> # set one branch to be very long (doesn't matter which)
> tree1$edge.length[sample(1:nrow(tree1$edge),size=1)]<-100
> # add new tips at random
> # here, we supply edge lengths - but we don't need to
> tree2<-add.random(tree1,n=10,edge.length=runif(n=10))
> # plot both trees
> layout(c(1,2),heights=c(2,3))
> plotTree(tree1,fsize=0.8)
> plotTree(tree2,fsize=0.8)
We can see that nearly all the new tips have been added on the very long branch that we created, which is exactly what we'd hoped.
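
To see why the long branch attracts most of the new tips, here is a minimal base R sketch of the cumsum idea. This is not necessarily how add.random is written internally; the helper pick.edge and the vector of edge lengths are made up for illustration, with the fourth value standing in for the very long branch.

```r
set.seed(42)
# hypothetical edge lengths standing in for tree$edge.length;
# the fourth edge plays the role of the very long branch
edge.length <- c(0.5, 1.2, 0.3, 100, 0.8)

# hypothetical helper: pick one edge with probability proportional
# to its length, plus a uniform offset within that edge
pick.edge <- function(edge.length) {
  cs <- cumsum(edge.length)
  # a uniform random point along the total length of the tree
  x <- runif(n = 1) * cs[length(cs)]
  # the first edge whose cumulative length reaches that point
  edge <- which(cs >= x)[1]
  # how far the point lies from the far end of the chosen edge
  position <- cs[edge] - x
  list(edge = edge, position = position)
}

# repeat many times and compare with the expected proportions
picks <- replicate(2000, pick.edge(edge.length)$edge)
freq <- tabulate(picks, nbins = length(edge.length)) / length(picks)
round(rbind(expected = edge.length / sum(edge.length), observed = freq), 3)
```

The observed frequencies should track each edge's share of the total length, so the long edge is chosen in the great majority of draws.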

As in the phytools function bind.tip, if the tree is ultrametric and new tips are added without branch lengths, the new branch lengths are set such that the tree remains ultrametric. For example:
> # simulate pure-birth tree
> tree1<-pbtree(n=20)
> # add new tips at random, without supplying branch
> # lengths
> tree2<-add.random(tree1,n=10)
> # plot both trees
> layout(c(1,2),heights=c(2,3))
> plotTree(tree1,fsize=0.8)
> plotTree(tree2,fsize=0.8)
One thing that we can observe about the plots above is that because tips are added sequentially (not somehow all at once to the original tree), the resulting phylogeny can include entire clades (for instance, the group of ((t29,t25),t27)) not found in the original phylogeny.

Finally, we can also supply our own names, so, for instance:
> tree<-pbtree(n=12)
> tree<-add.random(tree,tips=c("huxley","darwin","wallace", "lyell"))
> plotTree(tree)
That's it!


  1. Hello Liam!

    It is a very interesting function.
    I am a beginner in using R for phylogenetic manipulations, so I apologize if the following question is naive.
    Do you have any idea about how to apply this function to one or several specific nodes in a tree?



  2. Thanks for this fantastically useful post.
    I would just add one note: the trees used in this example are cladograms and so branch lengths are not a problem when adding the new tip. In order to add branches with lengths you need to add an edge.length argument to bind.tip. I chose the value at random from the original set of branch lengths.

    edgelength <- sample(tree$edge.length, 1)
    new.tree<-bind.tip(tree,"t13",edge.length = edgelength, where=node, position = position)

    Thanks again!

  3. Hey Liam, first off, thanks for the fantastic website. I reckon I owe you a beer at some point, but on to the question:
    In the add.random function, using an ultrametric tree, is there any way to control the edge lengths? Ideally I wanna constrain where (timewise) the new tips get added. I don't care where they end up relationship-wise, just want to be able to add them all say <10 million years, or <5 million years, what have you.
    Thanks again.

    1. Hi Ian.

      I just posted a solution here.

      - Liam
