Thursday, January 26, 2012

Function to get descendant node numbers

So, I'm working on a different project that required me to write a function to pull out the descendant node (& tip) numbers from any internal node on the tree. Even though I feel like I've done this before, I struggled coming up with something that made sense. This is what I ended up with:

getDescendants<-function(tree,node,curr=NULL){
  if(is.null(curr)) curr<-vector()
  daughters<-tree$edge[which(tree$edge[,1]==node),2]
  curr<-c(curr,daughters)
  w<-which(daughters>=length(tree$tip))
  if(length(w)>0) for(i in 1:length(w))
    curr<-getDescendants(tree,daughters[w[i]],curr)
  return(curr)
}


Here's a quick example of how it works:

> tree<-pbtree(n=10)
> tree$tip.label<-1:10
> # this lines up tip numbers & labels for simplicity
> par(mar=c(1,1,1,1)) # set margins
> plot(tree,font=1); nodelabels(bg="white")



Now we can pull the numbers for the descendant nodes & tips for any internal node. For instance:

> getDescendants(tree,node=17)
[1] 18 6 7 8
> getDescendants(tree,node=12)
[1] 16 15 4 5 2 3
> getDescendants(tree,node=19)
[1] 9 10


Anyway, seems to work, but if anyone is aware of an existing function for this, please pass it along.

17 comments:

  1. How about Descendants in phangorn? (might be interesting to compare the run times...)

    ReplyDelete
  2. Right on.

    getDescendants(...) and Descendants(...,type="all") are exactly the same. They also have virtually identical run times (which is interesting as the two functions are coded very differently.

    Would have used this if I'd known it was available!

    ReplyDelete
  3. I recently discovered a problem which was that different programs (and R functions) will number the internal nodes in different ways, which is a problem if you are trying to, say, run an analysis over 1000 Bayesian trees of possibly different topologies and extract statistics etc. for common clades in these analyses.

    Basically, there is postorder node numbering, there is the way an R phylo object does it, and there is index number counting from 1 of the way an R phylo object does it. I wrote a function to sort it out.

    (I did this several months ago and have since updated packages yadda yadda so I haven't double-checked lately.)

    Use as you like!

    Cheers,
    Nick
    matzkeATberkeley.edu

    postorder_nodes_phylo4_return_table <- function(tr4)
    {
    require(phylobase)

    # Find root row
    rootnode = nodeId(tr4, type="root")

    tipnames = descendants(tr4, rootnode, type="tips")

    tr5 = reorder(tr4, "postorder")
    tmp_edge = attr(tr5, "edge")
    tmp_edge2 = tmp_edge[tmp_edge[,2] > length(tipnames), ]
    tmp_edge2[,2]

    nodenums = tmp_edge2[,2]
    internal_nodenums = tmp_edge2[,2]-length(tipnames)
    postorder = 1:length(nodenums)

    postorder_table = cbind(nodenums, internal_nodenums, postorder)
    postorder_table = adf2(postorder_table)

    return(postorder_table)
    }




    # Test script

    master_tree = read.tree(file="", text="((((Aotearoa_magna_hw0050:57.55665505,Zearchaea_sp_hw0051:57.55665505):17.1867611,Mesarchaea_bell_hw0040_20A:74.74341615):27.71428065,(Chilarchaea_quellon_hw0029:51.72488185,Mecysmauchenius_segmentatus_hw0098:51.72488185):50.73281496):167.2431139,((Huttonia_sp_hwHutt41:177.3053354,Palpimanus_sp_hw0082:177.3053354):88.1331157,(Stenochilus_sp_hw0081:239.5191569,(Patarchaea_muralis:46.25785221,((((Myrmecarchaea_sp:52.52267389,Baltarchaea_conica:52.53323845):12.82754307,Archaea_paradoxa:65.31673034):32.81723581,Afrarchaea_grimaldii:53.14529283):38.20828846,(((Austrarchaea_nodosa_hw0074:49.40542201,Austrarchaea_davisae_hwAu066:49.40542201):26.12254178,Austrarchaea_mainae_hw0075:75.5279638):90.82062321,((Eriauchenius_workmani_hw0006:71.64058191,Eriauchenius_bourgini_hw0061:71.64058191):69.96785536,((Afrarchaea_pilgrimsrest_hw59Af3:49.13614886,Afrarchaea_woodae_hw57Af1:49.13614886):73.44416036,(NewGenus_legendrei_hw0014:82.4140833,(NewGenus_lavatenda_hw0003AB:57.30071564,NewGenus_jeanneli_hw0057:57.30071564):25.11336767):40.1662259):19.02812805):24.74014973):16.51404697):30.99101197):25.66551091):25.91929425):4.262359573);")

    master_tree4 = as(master_tree, "phylo4")

    # Specify the postorder node numbers for the
    # final collection object
    postorder_table = postorder_nodes_phylo4_return_table(master_tree4)
    postorder_table

    match_internal_nodes = match(internal_nodes, postorder_table$nodenums)
    node_histories_order_in_postorder = postorder_table$postorder[match_internal_nodes]


    # Different kinds of node numbering
    plot(master_tree)
    axisPhylo()
    nodelabels(postorder_table$nodenums[order(postorder_table$nodenums)])
    title("R default node labels")

    plot(master_tree)
    axisPhylo()
    nodelabels(postorder_table$internal_nodenums[order(postorder_table$nodenums)])
    title("R default node indices")

    plot(master_tree)
    axisPhylo()
    nodelabels(postorder_table$postorder[order(postorder_table$nodenums)])
    title("postorder node labels")

    ReplyDelete
  4. Nick. Thanks for posting. I haven't had a chance to look at this yet, but you are definitely correct that (even for the same topology & branch lengths) there is no requirement that corresponding nodes align.

    ReplyDelete
  5. Actually I should say that you need a different function for comparing different trees and identifying corresponding clades. The above just shows the node labels on a given tree under three different labeling schemes. It should just cut-paste to R. Cheers!

    ReplyDelete
  6. When one wants to deal with the same 'node' on different trees, one define nodes as being the same based on their tip descendants. prop.part() in ape becomes indispensible for this, as one can use the output to find the nodes which match on multiple trees. Obviously, we may want to define nodes somewhat differently in other cases. (For example, consider two phylogenies which do not include the same plant species but still each have a node which includes all angiosperms.)

    ReplyDelete
  7. Uh-huh, good point. That's why topological similarity is best defined by edges, i.e., splits (e.g., R-F distance).

    There is still the practical issue that (by the ape standard for a phylo object) there is no particular required numbering scheme by which topologically and rotationally identical trees have the same node numbers.

    For instance:

    tr1<-read.tree(text="((A,B),(C,D));")
    plot(tr1); nodelabels()
    tr2<-read.tree(text="((C,D),(A,B));")
    plot(tr2); nodelabels()

    and, in addition:

    tr1<-pbtree(n=10)
    plot(tr1); nodelabels()
    tr2<-read.tree(text=write.tree(tr1))
    plot(tr2); nodelabels()

    (So far as I know, pbtree produces a tree that conforms to the ape standard.)

    ReplyDelete
  8. hey,

    the function get.desc.of.node() (internal function in the auteur package) is not doing the same thing as well?

    Joan.

    ReplyDelete
  9. Hi Joan.

    No, it looks like that function just returns the two descendant nodes from a given node. This is the same as:

    desc.nodes<-tree$edge[which(tree$edge[,1]==node),2]

    - Liam

    ReplyDelete
  10. Hi Liam,

    why it works when you introduce the tree directly into a variable?

    > tr <- read.tree(text ="(((t1:1.142640805,((t3:0.3942279855,t4:0.3942279855):0.5913585637,(t9:0.0590829974,t10:0.0590829974):0.9265035519):0.1570542561):0.01949323667,(t7:0.1455523583,t8:0.1455523583):1.016581684):0.6239323756,((t5:0.3366558802,t6:0.3366558802):0.06012997056,t2:0.3967858508):1.389280567);")

    > get.descendants.of.node(12,tr)
    [1] 13 1 14 15 2 3 16 4 5 17 6 7

    sorry if it's very obvious... (Im newborn in R)

    thanks again!


    Joan.

    ReplyDelete
  11. Hi Joan.

    No, you're right. Evidently, there are two functions in auteur: get.desc.of.node and get.descendants.of.node. The former retrieves only the immediate descendants of a given node.

    I don't know how you discovered this as it is not described in the auteur documentation!

    Evidently getDescendants(tree,node), Descendants(tree,node,type="all"), and get.descendants.of.node(node,tree) are all functionally identical, although they are each coded quite differently! Good discovery.

    - Liam

    ReplyDelete
  12. oh! i can see I had misspelled it in my first post!

    it is used in one of the examples of the manual.

    cheers,


    Joan

    ReplyDelete
  13. Hello Liam,

    I am trying to get the descendant edges for each edge of a phylogeny. Have you managed to do that too?

    Here I was able to get the number of descendant tips for each edge (please, see if you agree with my solution).


    eds<-tree$edge
    nodeds<-eds[,1][order(eds[,2])]
    numtips<-numeric(length(nodeds))
    for(i in 1:length(nodeds)){
    phy<-extract.clade(tree,nodeds[i])
    numtips[i]<-length(phy$tip.label)
    }
    edge.num<-eds[,2][order(eds[,2])]
    desc.tips<-cbind(edge.num,numtips)


    However, it would be also useful to know which are the descendant edges for each edge. Do you have any idea?


    All the best,
    José Hidasi Neto

    ReplyDelete
    Replies
    1. This comment has been removed by the author.

      Delete
    2. Ops, it seems your function already answers my question. Also, my code seems to be wrong, as it gets the number of tips for the node before each edge.

      Delete
  14. Hi - shouldn't the getDescendants() function have which(daughters>length(tree$tip)), not which(daughters>=length(tree$tip)) (i.e. use strictly greater than, not ">="?

    ReplyDelete
    Replies
    1. Yup. That's a bug. Thanks for catching that. I will fix it.

      Delete

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.