Friday, July 8, 2011

Bug fix in plotSimmap()

Rafael Maia has identified an error in plotSimmap() that sometimes results in plotted trees that look like this:

Yikes!

The error is most easily replicated with the following example. Let's take this simple Newick style tree containing 8 taxa:

(((2:1.55,((6:0.16,7:0.16):0.47,(4:0.22,5:0.22):0.41):0.92):0.24,1:1.79):0.17,(3:0.49,8:0.49):1.47);

If we read this tree into memory using read.tree(), then the tips of the tree will be numbered from left to right in the Newick tree, i.e.:

> tree$edge
      [,1] [,2]
 [1,]    9   10
 [2,]   10   11
 [3,]   11    1
 [4,]   11   12
 [5,]   12   13
 [6,]   13    2
 [7,]   13    3
 [8,]   12   14
 [9,]   14    4
[10,]   14    5
[11,]   10    6
[12,]    9   15
[13,]   15    7
[14,]   15    8
> tree$tip.label
[1] "2" "6" "7" "4" "5" "1" "3" "8"


I naively assumed that this would always be true for trees in "cladewise" order, but in fact this is not the case. Let's read the same tree from "Nexus" format, which in this case might look something like this:

#NEXUS
BEGIN TAXA;
  DIMENSIONS NTAX = 8;
  TAXLABELS
    1
    2
    3
    4
    5
    6
    7
    8
  ;
END;
BEGIN TREES;
  TRANSLATE
    1  1,
    2  2,
    3  3,
    4  4,
    5  5,
    6  6,
    7  7,
    8  8
  ;
  TREE *, UNTITLED = [&R] (((2:1.55,((6:0.16,7:0.16):0.47,(4:0.22,5:0.22):0.41):0.92):0.24,1:1.79):0.17,(3:0.49,8:0.49):1.47);
END;


If we read this file in using read.nexus(), we end up with tips that are numbered according to the translation table - not the left-right order in the Newick string, i.e.,

> tree$edge
      [,1] [,2]
 [1,]    9   10
 [2,]   10   11
 [3,]   11    2
 [4,]   11   12
 [5,]   12   13
 [6,]   13    6
 [7,]   13    7
 [8,]   12   14
 [9,]   14    4
[10,]   14    5
[11,]   10    1
[12,]    9   15
[13,]   15    3
[14,]   15    8
> tree$tip.label
[1] "1" "2" "3" "4" "5" "6" "7" "8"


The issue with plotting arises because I assumed that the tip numbers were numerically ordered according the left-to-right order of the Newick string and simply evenly spaced the tips on the basis of this numerical order (which results in branch crossing when the tips are not ordered this way, as evidenced by the figure above). Luckily, this was not difficult to solve. I merely changed the line of code in which I assigned vertical coordinates to all the tips from:

Y[1:n]<-1:n

to:

Y[cw$edge[cw$edge[,2]<=length(cw$tip),2]]<-1:n

which seems to do the trick, as if we replot the tree above, our branch crossing is solved:


I have posted the fixed version on my R phylogenetics page (direct link to code here) and I will incorporate this into the next alpha release of "phytools."

1 comment:

  1. By chance Joe Felsenstein noted yesterday on his PHYLIP facebook page that this year marks the 25th anniversary of the Newick tree file format:

    "Two weeks ago (26 June) was the 25th anniversary of the Newick tree file format. It was invented by Chris Meacham for a tree plotting program that was the ancestor of Drawtree. It was formalized by a committee called by me, which met on 25 and 26 June, 1986 and had the second meeting at Newick's seafood restaurant in Dover, New Hampshire. . . ."

    ReplyDelete