Saturday, February 12, 2011

Building a tree in memory

Ok, as discussed in my last post, our basic goal is to read a Newick style tree [for example, the tree (((Human,Chimp),Gorilla),Monkey);] into computer memory. We would like to do this by parsing the character string, element-by-element, to build a representation of the phylogeny that consists of nodes and edge. Of course this functionality already exists in the form of the {ape} function read.tree(); however, if we can develop a parsing function, we can then adapt this function to simultaneously read in other kinds of information - for instance, the order and time spent in each character in a phylogenetic stochastic character map.

Before showing how this can be done to create the {ape} "phylo" object discussed in the last post, I thought it would be a good idea to present the basic algorithm that is used to parse a Newick tree. I'm not sure if this handy set of rules has ever been published, but it was conveyed to me verbally by Luke Harmon a number of years ago.

The basic algorithm reads the Newick tree "character-by-character" (actually, we will read node labels as if a single character), with the following rule.

Starting with the root node, if we encounter:
1) "(" -> we will create left daughter node;
2) "," -> we will go back to the parent node and create right daughter;
3) ")" -> we will go back to the parent;
4) ";" -> we will finish; and, finally
5) if we encounter a character string [not including special characters listed above], then we label the node with that string.

To be a little clearer, I made a simple animation for the (((Human,Chimp),Gorilla),Monkey); tree from last time. The animation goes through each character of the Newick string at the top of the screen, and creates nodes and edges according the rules above. Enjoy.

video

No comments:

Post a Comment