Monday, July 11, 2011

Writing a "phylo" object to a Newick string

Now that we can generate stochastic character mapped trees using make.simmap(), I thought it would be useful to write them to file. Unfortunately, although I have functions that can read Newick strings from file (e.g., read.newick() and read.simmap()), I don't yet have code to go the other way (i.e., to write a Newick string from a "phylo" object stored in memory). For more details on the Newick convention, check out Felsenstein's comprehensive description.

Ok, now, remember how a "phylo" object is stored. It is a list with three essential components: $edge, a matrix containing the node or tip numbers for the upper and lower node of each edge; $tip.label, a vector containing the labels for each tip; and, finally, $Nnode, an integer value indicating the number of internal nodes in the tree. Branch lengths are stored in the optional vector $edge.length, which contains all the branch lengths in tree tree in the order of $edge.
So, for instance, the Newick tree:
(((2:1.55,((6:0.16,7:0.16):0.47,(4:0.22,5:0.22):0.41):0.92):0.24,1:1.79):0.17,(3:0.49,8:0.49):1.47);
is plotted as follows:

and is represented in memory with the following four parts:
> tree$edge
[,1] [,2]
[1,] 9 10
[2,] 10 11
[3,] 11 1
[4,] 11 12
[5,] 12 13
[6,] 13 2
[7,] 13 3
[8,] 12 14
[9,] 14 4
[10,] 14 5
[11,] 10 6
[12,] 9 15
[13,] 15 7
[14,] 15 8
> tree$tip.label
[1] "2" "6" "7" "4" "5" "1" "3" "8"
> tree$Nnode
[1] 7
> tree$edge.length
[1] 0.17 0.24 1.55 0.92 0.47 0.16 0.16 0.41 0.22 0.22 1.79 1.47 0.49 0.49


The task of back-translating the object above to a Newick string is not too difficult. In fact, if the "phylo" object is in "cladewise" order (as above), then we can nearly just descend from the top to the bottom of the matrix $edge, writing the appropriate characters at each step. The reason I qualify this as nearly is because we can move up (i.e., from root to tip) the tree very easily by descending $edge, we do not as easily move back down the tree this way - which we need to do to close parentheses and assign branch lengths to internal edges of the Newick tree. To do this, we just need to write a simple while() loop which backs down the tree every time we might be ready to close out a clade.

The code for my simple tree writer is not particularly elegant, but it seems to work:

writeTree<-function(tree){
  tree<-reorder.phylo(tree,"cladewise")
  n<-length(tree$tip)
  string<-vector(); string[1]<-"("; j<-2
  for(i in 1:nrow(tree$edge)){
    if(tree$edge[i,2]<=n){
      string[j]<-tree$tip.label[tree$edge[i,2]]; j<-j+1
      if(!is.null(tree$edge.length)){
        string[j]<-paste(c(":",round(tree$edge.length[i],10)), collapse="")
        j<-j+1
      }
      v<-which(tree$edge[,1]==tree$edge[i,1]); k<-i
      while(length(v)>0&&k==v[length(v)]){
        string[j]<-")"; j<-j+1
        w<-which(tree$edge[,2]==tree$edge[k,1])
        if(!is.null(tree$edge.length)){
          string[j]<-paste(c(":",round(tree$edge.length[w],10)), collapse="")
          j<-j+1
        }
        v<-which(tree$edge[,1]==tree$edge[w,1]); k<-w
      }
      string[j]<-","; j<-j+1
    } else if(tree$edge[i,2]>=n){
      string[j]<-"("; j<-j+1
    }
  }
  if(is.null(tree$edge.length)) string<-c(string[1:(length(string)-1)], ";")
  else string<-c(string[1:(length(string)-2)],";")
  string<-paste(string,collapse="")
  return(string)
}


Now, let's test it out against "ape"'s write.tree() for a random tree:
> tree<-rtree(n=100)
> tree$edge.length<-round(tree$edge.length,6) # to avoid precision differences
> write.tree(tree)==writeTree(tree)
[1] TRUE


Cool! Now we're ready to handle printing "simmap" trees.

6 comments:

  1. Hi! This is exactly what I was looking for!! I have a problem though.I get that the nrow of $edge are the $lengths, but I cannot visualize which numbers of the edge matrix correspond to the tree.

    Can you help me?

    Thanks!!!

    ReplyDelete
  2. whoah this blog is fantastic i really like reading your posts. Keep up the great paintings! You already know, lots of persons are searching around for this information, you could aid them greatly.
    Click on Cloud IT Zone and get your hosting Services.

    ReplyDelete
  3. Ok, now, remember how a "phylo" object is stored. It is a list with three essential components

    ReplyDelete
  4. I feel like I have a long way to go. I only started my research about coding and so. I want to change my field, so I try to understand if I'm able to went to IT section. Not that sure as far. But I already found best resume writing services and plan to make different resume for different fields, because they write the best resume and I don't want to loose my chance. But I'm not really good at coding as far, so I'll take my time to learn. That's why I'm here. And thank you all for all of your questions and answers, it helps.

    ReplyDelete
  5. It's the state of mind this infers. Yuck. Like writing is the essential wreckage the housekeeping staff can tidy up.children book and author

    ReplyDelete
  6. MAKE A DATE WITH A BUDDY AND WRITE-One of the most ideal approaches to constrain yourself to write is to make a date with a writing pal. Consolidate your public activity with your writing.best resume service

    ReplyDelete