Friday, June 7, 2013

Robust Newick tree reader

phytools has a function to read simple Newick format trees (called read.newick). Until now, this function has been totally useless because it is completely redundant with read.tree in the 'ape' package. However, a recent query on the R-sig-phylo email list got me wondering whether the code from my recent major rewrite of read.simmap could be harvested to rewrite read.newick as a "robust" tree reader that can accept singleton nodes, etc.

First off - what is a singleton node? Most nodes in our tree have two or more descendants. A singleton node is a node with only one descendant. These are created by extra left & right parentheses in our Newick string. For example:

(((A,B),(C,D)),E);
has no singletons; whereas:
(((((A),B),(C,D))),E);
has two singletons - one on the edge leading from the common ancestor of A & B to tip A, and another below the clade containing A, B, C, and D.
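
To make the definition concrete, here is a minimal base-R sketch that counts singletons directly from a Newick string: a pair of parentheses with no comma at its own level encloses exactly one descendant. The helper name count_singletons is hypothetical (it is not part of ape or phytools), and the sketch assumes plain labels with no quoted names or bracketed comments. The second example is written here with balanced parentheses.

```r
# count_singletons: hypothetical helper, base R only.
# Assumes labels contain no parentheses, commas, quotes or [comments].
count_singletons <- function(newick) {
  chars <- strsplit(newick, "")[[1]]
  commas <- integer(0)  # stack: commas seen inside each open parenthesis
  singles <- 0L
  for (ch in chars) {
    if (ch == "(") {
      commas <- c(commas, 0L)
    } else if (ch == "," && length(commas) > 0) {
      commas[length(commas)] <- commas[length(commas)] + 1L
    } else if (ch == ")") {
      if (length(commas) == 0) stop("unbalanced parentheses")
      # no comma at this level means the node has a single descendant
      if (commas[length(commas)] == 0L) singles <- singles + 1L
      commas <- commas[-length(commas)]
    }
  }
  if (length(commas) > 0) stop("unbalanced parentheses")
  singles
}

count_singletons("(((A,B),(C,D)),E);")      # 0
count_singletons("(((((A),B),(C,D))),E);")  # 2
```

Because the function stops on unbalanced parentheses, it also doubles as a crude check that a hand-edited string is well formed.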

The code for my robust tree reader is here. Let's try to use read.tree and read.newick to read in the two trees above:

> # load source
> source("read.newick.R")
>
> # read tree
> aa<-"(((A,B),(C,D)),E);"
> t1<-read.tree(text=aa)
> t2<-read.newick(text=aa)
>
> # plot
> par(lend=2) # for plotting
> par(mar=c(0.1,0.1,3.1,0.1))
> layout(matrix(c(1,2),1,2))
> plot(t1,edge.width=3,lend=2)
> title("read.tree")
> plot(t2,edge.width=3,lend=2)
> title("read.newick")
> bb<-"(((((A),B),(C,D))),E);"
> t3<-read.tree(text=bb)
Error in if (sum(obj[[i]]$edge[, 1] == ROOT) == 1 && dim(obj[[i]]$edge)[1] > :
missing value where TRUE/FALSE needed
> t4<-read.newick(text=bb)
> layout(matrix(c(2,1),1,2))
> plot(t4,edge.width=3,lend=2)
Error in plot.phylo(t4, edge.width = 3, lend = 2) :
there are single (non-splitting) nodes in your tree; you may need to use collapse.singles()
> t4<-collapse.singles(t4)
> plot(t4,edge.width=3,lend=2)
> title("read.newick")
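
The plot.phylo error above is about nodes that appear only once as a parent in the tree's edge matrix. As a rough illustration, the sketch below builds that matrix by hand for the singleton tree (with balanced parentheses) and finds the offending nodes in base R. The node numbering is my own assumption, chosen to follow the usual "phylo" convention of tips first and root next; the numbers in an object actually returned by read.newick may differ.

```r
# Hand-built (parent, child) edge matrix for (((((A),B),(C,D))),E);
# Tips A-E are numbered 1-5 and internal nodes 6-11 (assumed numbering).
edge <- matrix(c(
   6,  7,   # root -> singleton below the clade containing A, B, C and D
   7,  8,
   8,  9,
   9, 10,   # ancestor of A & B -> singleton on the edge to tip A
  10,  1,
   9,  2,
   8, 11,
  11,  3,
  11,  4,
   6,  5
), ncol = 2, byrow = TRUE)

# number of descendants of every node (tips have none)
n_children <- tabulate(edge[, 1], nbins = max(edge))

# nodes with exactly one descendant are the singletons
singleton_nodes <- which(n_children == 1)
singleton_nodes  # 7 10
```

For a tree object such as t4, the same tabulation could be run on t4$edge before calling collapse.singles, assuming the object stores its edges in the usual two-column matrix.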

That's it.

23 comments:

  1. Hi Dr. Revell,
    I was having the same issue with read.nexus() from ape. However, I'm trying to use
    read.newick() and it is getting stuck.
    The tree I'm using can be found here
    http://www.evoio.org/wiki/Phylotastic/Use_Cases

    It is the APG angiosperm Phylomatictree.nex. I can open it in Mesquite, but when I'm in R I'm getting stuck
    angiosperm.tree<-read.newick(file = "Phylomatictree.nex")

    Read 1610 items
    And you can see it running but never ending. Since I can't see any error, do you have any idea of what is happening? Thanks!

    1. Hi Rosana.

      This looks like a nexus format tree with singletons. read.newick handles singletons, but only in plain Newick (i.e., phylip) format. Is that correct?

      - Liam

    2. I'm sure it has singletons. The tree description implies it is simple text:
      file size: 65 KB, MIME type: text/plain

      Looking through the file it seems very standard, so I cannot spot a problem with the tree. Mesquite didn't have issues with it, so I guess I'll save it in a different format?

    3. Send it to me by email & I'll see if I can figure out the problem getting it into R. Liam

    4. Hello Liam, I encountered the same problem as Rosana did. Did you figure out the problem? It would be very helpful if you share the explanation/solution!

    5. Déb. I don't remember. I think the original file was in Nexus format, which is not the correct format for read.newick. I'm not sure if this was the problem or not. If your file is a simple Newick string & you cannot read it with read.newick in the latest version of phytools (>=0.4-45) then please email it to me & I will troubleshoot. - Liam

    6. Hello Liam. I also have the same problem as Rosana and Déb. Would you check my newick file? Sincerely, Ryohei

    7. You'd have to email it to me ;)

    8. Thanks, Liam. Fortunately, I could solve the problem by myself :)

  2. Thanks for the script! It helped deal with all of the Picante R issues with my tree with lots of singleton nodes (a common issue if working with tropical forest data). I can finally run the various phylodiversity metrics. Yay.

  3. Hi Liam--I just wanted to shake your virtual hand for the incredible help your scripts have been. No reply necessary.

    Adam

  4. Holy Cow... I had a large tree that I was having fits with. Your script did the trick!!! Many thanks...

    Stephen

  5. Hello Liam.
    I am trying to read my tree in R, and I got the message that you described above. I tried your code using "read.newick" and then collapsed the singletons using your script. I thought that it would work, but I got another error message when I tried to plot my tree: "Error in .nodeDepthEdgelength(Ntip, Nnode, z$edge, Nedge, z$edge.length) :
    NA/NaN/Inf in foreign function call (arg 6)"
    Not sure what I am doing wrong because I have run this script before on other trees and I had no problem at all. Any help would be very welcome!

    1. Hi Angelica.
      If you send me your tree file I can try & troubleshoot - otherwise it's hard to say as I do not recognize that error message.
      -- Liam

  6. Hello Liam,

    I am trying to reproduce this with my own Newick file, but it gets rejected every time, and R does not show where the problem is. The file is pretty big, so I don't know where to start.
    Is there a way of easily checking your tree file? Sorry, I am very new to this.
    I hope you can help me.

    Sincerely
    Roos

    1. Hi Roos.

      Did you first try to read it in with read.tree from the ape package? read.newick is not intended to replace read.tree, it just may be able to read trees that are badly conformed in some way that cannot be read by read.tree.

      If, in fact, you need to use read.newick - you could check to see if the root edge has a branch length or node label. Search for ";" and remove everything between the last ")" and the ";". Let us know if that works.

      - Liam

    2. Dear Liam,

      Thank you for your reply.
      But with the read.tree function I still get an error:
      Error in if (sum(obj[[i]]$edge[, 1] == ROOT) == 1 && dim(obj[[i]]$edge)[1]
      : missing value where TRUE/FALSE needed
      Even when I delete the parts you recommended.
      I don't know what to do.
      -Roos

    3. Hi Roos.

      I don't recognize that error message. You are welcome to try emailing me your tree file. Alternatively, you might post your question to the R-sig-phylo email list.

      All the best, Liam

  7. I got this:

    Error in if (sum(obj[[i]]$edge[, 1] == ROOT) == 1 && dim(obj[[i]]$edge)[1] > :
    missing value where TRUE/FALSE needed

    I really tried, but it did not work. May I send you my tree?

    Thank you!

    1. Yes. Send me your file & R code and I will try to reproduce your error.

      All the best, Liam

  8. I am trying to add the following species phylogram, which I can open in the Dendroscope tree viewer without any problem, as the column dendrogram of a heatmap I am trying to build using heatmap.2.

    But using read.tree does not work on this input
    And using read.newick, I can read it in, but when I collapse single nodes, the tree plot looks nothing like the original I see with Dendroscope.

    Is there a problem with my tree? Or in its format? Since it is a species phylogram, there are no branch lengths.

    Could you help please?

    ((((((((((((((Mt3.5v5, Mt4.0v1), Car), (((Pvu186, Pvu218), (Gma109, Gma189)), Cca))), (((Ppe139, Mdo196), Fve226), Csa122)), ((((((((Ath167, Aly107), (Cru83)), (Bra197, Tha173)), Cpa113), (Gra221, Tca233)), (Csi154, (Ccl165, Ccl182))), ((Mes147, Rco119),(Lus200, (Ptr156, Ptr210)))), Egr201)), Vvi145), ((Stu206, Sly225), Mgu140)), Aco195), (((Sbi79, Zma181),(Sit164, Pvi202)), (Osa193, Bdi192))), Smo91), Ppa152), (((Cre169, Vca199), Csu227), ((Mpu228, Mpu229), Olu231))));

  9. Dear Liam,

    I tried the functions read.tree and read.newick to open a phylogenetic tree of birds. With read.tree I got the typical error (Error in if (sum(obj[[i]]$edge[, 1] == ROOT) == 1 && dim(obj[[i]]$edge)[1] > : missing value where TRUE/FALSE needed), but when I used read.newick it took really long. I let it run for 24h and I still didn't get the tree.

    Do you know any other function that will work with a complex tree?

    Thanks a lot!
    Irene

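
Several of the comments above ask how to check a tree file before handing it to a reader. The base-R sketch below gathers the checks suggested in the replies: whether the text looks like Nexus rather than plain Newick, whether the parentheses balance, and whether anything sits between the last ")" and the ";" (a root label or root edge length). The helper names check_newick and strip_root_suffix are hypothetical, not part of ape or phytools, and the sketch assumes a single tree with no quoted labels or bracketed comments.

```r
# check_newick / strip_root_suffix: hypothetical helpers, base R only.
# Assumes one tree per input and no quoted labels or [comments].
check_newick <- function(txt) {
  txt <- trimws(paste(txt, collapse = ""))  # join lines from readLines()
  chars <- strsplit(txt, "")[[1]]
  has_suffix <- grepl("\\)[^)]*;$", txt)
  list(
    looks_like_nexus    = grepl("^#NEXUS", txt, ignore.case = TRUE),
    n_open              = sum(chars == "("),
    n_close             = sum(chars == ")"),
    ends_with_semicolon = grepl(";$", txt),
    # text between the last ")" and the final ";", if any
    root_suffix = if (has_suffix) sub("^.*\\)([^)]*);$", "\\1", txt)
                  else NA_character_
  )
}

# remove everything between the last ")" and the ";"
strip_root_suffix <- function(txt) sub("\\)[^)]*;\\s*$", ");", txt)

res <- check_newick("((A,B),C)root:1.0;")
res$n_open == res$n_close                # TRUE
res$root_suffix                          # "root:1.0"
strip_root_suffix("((A,B),C)root:1.0;")  # "((A,B),C);"
```

For a file on disk, something like check_newick(readLines("mytree.tre")) would apply the same checks, where the file name is only a placeholder. None of this replaces read.tree or read.newick; it only narrows down where a malformed string might be going wrong.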