Tuesday, February 15, 2011

Adding branch lengths

In a few prior posts (1,2,3) I described the development of a simple R function, read.newick(), which is available from my development page here. This function reads a single Newick style tree from a character string or from a file, and then creates an {ape} "phylo" object in memory. I am developing it so that I can eventually read stochastic character map information from within the Newick tree (a problem that I tackled in a very different way in a prior post). I've also put the code online so that readers can see an R implementation of the tree reading algorithm presented in a prior blog post.

Today I added the reading of branch lengths to that function. This seems to work fine, and it is easy to see how I did it by reviewing my code, to which I have also added some comments.

Branch lengths in a Newick style tree are given after a colon that follows either a label or a ")" right parenthesis. Thus, our simplified primate tree with branch lengths might be: (((Human:7,Chimp:7):3,Gorilla:10):15,Monkey:25);, with branch lengths in millions of years (more or less). When our parser encounters the character ":", it just needs to assign the real number following the ":" to the edge leading to the node we are on, that is, the tip or internal node that was just read or closed. Otherwise, we just follow our algorithm as previously described.
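The rule for ":" can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the code of read.newick(): the function name parse_newick_sketch() and all of its internal variable names are hypothetical, the input is assumed to be a single parenthesized tree whose labels contain no commas, colons, or parentheses, and internal node labels are simply skipped. The result is a plain list, not a "phylo" object.

```r
# Hypothetical sketch (NOT the author's read.newick): parse one Newick string
# with optional branch lengths into "phylo"-style components.
parse_newick_sketch <- function(text) {
  chars <- strsplit(gsub("\\s+", "", text), "")[[1]]
  n_tip <- sum(chars == ",") + 1L       # tips = commas + 1 (comma-free labels assumed)
  edge <- matrix(integer(0), ncol = 2)
  edge_length <- numeric(0)
  tip_label <- character(0)
  next_node <- n_tip + 1L               # internal nodes are numbered starting at the root
  node_stack <- integer(0)              # internal nodes that are currently open
  row_stack <- integer(0)               # edge row leading to each open node (NA for the root)
  last_edge <- NA_integer_              # edge row that the next ":" refers to
  delims <- c("(", ")", ",", ":", ";")
  i <- 1L
  while (i <= length(chars) && chars[i] != ";") {
    ch <- chars[i]
    if (ch == "(") {
      # open a new internal node and, unless it is the root, the edge leading to it
      row <- NA_integer_
      if (length(node_stack) > 0) {
        edge <- rbind(edge, c(node_stack[length(node_stack)], next_node))
        edge_length <- c(edge_length, NA_real_)
        row <- nrow(edge)
      }
      node_stack <- c(node_stack, next_node)
      row_stack <- c(row_stack, row)
      next_node <- next_node + 1L
      i <- i + 1L
    } else if (ch == ")") {
      # close the current node; a following ":" belongs to the edge above it
      last_edge <- row_stack[length(row_stack)]
      node_stack <- node_stack[-length(node_stack)]
      row_stack <- row_stack[-length(row_stack)]
      i <- i + 1L
    } else if (ch == ",") {
      i <- i + 1L
    } else {
      # read a token (a label, or the number after ":") up to the next delimiter
      start <- if (ch == ":") i + 1L else i
      j <- start
      while (j <= length(chars) && !(chars[j] %in% delims)) j <- j + 1L
      token <- paste(chars[start - 1L + seq_len(j - start)], collapse = "")
      is_node_label <- i > 1L && chars[i - 1L] == ")"
      if (ch == ":") {
        # branch length: assign it to the edge leading to the node just read or closed
        if (!is.na(last_edge)) edge_length[last_edge] <- as.numeric(token)
      } else if (!is_node_label) {
        # tip label: add a new tip and the edge from the current internal node to it
        tip_label <- c(tip_label, token)
        edge <- rbind(edge, c(node_stack[length(node_stack)], length(tip_label)))
        edge_length <- c(edge_length, NA_real_)
        last_edge <- nrow(edge)
      }
      i <- j
    }
  }
  list(edge = edge, edge.length = edge_length, tip.label = tip_label,
       Nnode = next_node - n_tip - 1L)
}

res <- parse_newick_sketch("(((Human:7,Chimp:7):3,Gorilla:10):15,Monkey:25);")
```

The key design choice in this sketch is the variable last_edge: it always points at the row of the edge matrix for the tip that was just read or the internal node that was just closed, so a ":" never has to search for its branch. A branch length given for the root, which has no edge above it, is silently dropped.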

The only amendment to the structure of the tree in memory is the addition of a vector, $edge.length, containing the length of each branch, in the same order as the rows of $edge. Thus, for the tree (((Human:7,Chimp:7):3,Gorilla:10):15,Monkey:25);, we would get the following:

> temp<-"(((Human:7,Chimp:7):3,Gorilla:10):15,Monkey:25);"
> phy<-read.newick(text=temp) # or use read.tree()
> phy$Nnode
[1] 3
> phy$tip.label
[1] "Human" "Chimp" "Gorilla" "Monkey"
> phy$edge
     [,1] [,2]
[1,]    5    6
[2,]    6    7
[3,]    7    1
[4,]    7    2
[5,]    6    3
[6,]    5    4
> phy$edge.length
[1] 15 3 7 7 10 25
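One way to check that $edge.length really lines up with the rows of $edge is to sum the branch lengths from each tip back to the root. The sketch below does this in base R using the values printed above, typed in by hand so that it runs without read.newick() or {ape}. The helper root_to_tip() and the variable names are hypothetical and exist only for this illustration.

```r
# Values copied from the output above (entered by hand for this sketch)
edge <- matrix(c(5, 6,
                 6, 7,
                 7, 1,
                 7, 2,
                 6, 3,
                 5, 4), ncol = 2, byrow = TRUE)
edge_length <- c(15, 3, 7, 7, 10, 25)
tip_label <- c("Human", "Chimp", "Gorilla", "Monkey")

# Hypothetical helper: walk from a tip up to the root, adding the length of
# each edge on the way. Row k of `edge` is paired with edge_length[k].
root_to_tip <- function(edge, edge_length, tip) {
  total <- 0
  node <- tip
  repeat {
    row <- which(edge[, 2] == node)  # the edge whose descendant is `node`
    if (length(row) == 0) break      # no edge above this node: it is the root
    total <- total + edge_length[row]
    node <- edge[row, 1]             # move up to the parent node
  }
  total
}

depths <- sapply(1:4, function(tip) root_to_tip(edge, edge_length, tip))
names(depths) <- tip_label
print(depths)
```

For this tree every tip comes out at a depth of 25 (for example, Human is 7 + 3 + 15), which is what we expect if the lengths have been attached to the right edges.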
