Phylogenetic Tools for Comparative Biology: Function to get descendant node numbers

Thursday, January 26, 2012

Function to get descendant node numbers

So, I'm working on a different project that required me to write a function to pull out the descendant node (& tip) numbers from any internal node on the tree. Even though I feel like I've done this before, I struggled coming up with something that made sense. This is what I ended up with:

getDescendants<-function(tree,node,curr=NULL){
  if(is.null(curr)) curr<-vector()
  daughters<-tree$edge[which(tree$edge[,1]==node),2]
  curr<-c(curr,daughters)
  w<-which(daughters>=length(tree$tip))
  if(length(w)>0) for(i in 1:length(w))
    curr<-getDescendants(tree,daughters[w[i]],curr)
  return(curr)
}

Here's a quick example of how it works:

> tree<-pbtree(n=10)
> tree$tip.label<-1:10
> # this lines up tip numbers & labels for simplicity
> par(mar=c(1,1,1,1)) # set margins
> plot(tree,font=1); nodelabels(bg="white")

Now we can pull the numbers for the descendant nodes & tips for any internal node. For instance:

> getDescendants(tree,node=17)
[1] 18 6 7 8
> getDescendants(tree,node=12)
[1] 16 15 4 5 2 3
> getDescendants(tree,node=19)
[1] 9 10

Anyway, seems to work, but if anyone is aware of an existing function for this, please pass it along.

17 comments:

dwbapstJanuary 26, 2012 at 9:48 PM
How about Descendants in phangorn? (might be interesting to compare the run times...)
ReplyDelete
Replies
Liam RevellJanuary 26, 2012 at 11:05 PM
Right on.

getDescendants(...) and Descendants(...,type="all") are exactly the same. They also have virtually identical run times (which is interesting as the two functions are coded very differently.

Would have used this if I'd known it was available!
ReplyDelete
Replies
NickMJanuary 27, 2012 at 12:47 AM
I recently discovered a problem which was that different programs (and R functions) will number the internal nodes in different ways, which is a problem if you are trying to, say, run an analysis over 1000 Bayesian trees of possibly different topologies and extract statistics etc. for common clades in these analyses.

Basically, there is postorder node numbering, there is the way an R phylo object does it, and there is index number counting from 1 of the way an R phylo object does it. I wrote a function to sort it out.

(I did this several months ago and have since updated packages yadda yadda so I haven't double-checked lately.)

Use as you like!

Cheers,
Nick
matzkeATberkeley.edu

postorder_nodes_phylo4_return_table <- function(tr4)
{
require(phylobase)

# Find root row
rootnode = nodeId(tr4, type="root")

tipnames = descendants(tr4, rootnode, type="tips")

tr5 = reorder(tr4, "postorder")
tmp_edge = attr(tr5, "edge")
tmp_edge2 = tmp_edge[tmp_edge[,2] > length(tipnames), ]
tmp_edge2[,2]

nodenums = tmp_edge2[,2]
internal_nodenums = tmp_edge2[,2]-length(tipnames)
postorder = 1:length(nodenums)

postorder_table = cbind(nodenums, internal_nodenums, postorder)
postorder_table = adf2(postorder_table)

return(postorder_table)
}

# Test script

master_tree = read.tree(file="", text="((((Aotearoa_magna_hw0050:57.55665505,Zearchaea_sp_hw0051:57.55665505):17.1867611,Mesarchaea_bell_hw0040_20A:74.74341615):27.71428065,(Chilarchaea_quellon_hw0029:51.72488185,Mecysmauchenius_segmentatus_hw0098:51.72488185):50.73281496):167.2431139,((Huttonia_sp_hwHutt41:177.3053354,Palpimanus_sp_hw0082:177.3053354):88.1331157,(Stenochilus_sp_hw0081:239.5191569,(Patarchaea_muralis:46.25785221,((((Myrmecarchaea_sp:52.52267389,Baltarchaea_conica:52.53323845):12.82754307,Archaea_paradoxa:65.31673034):32.81723581,Afrarchaea_grimaldii:53.14529283):38.20828846,(((Austrarchaea_nodosa_hw0074:49.40542201,Austrarchaea_davisae_hwAu066:49.40542201):26.12254178,Austrarchaea_mainae_hw0075:75.5279638):90.82062321,((Eriauchenius_workmani_hw0006:71.64058191,Eriauchenius_bourgini_hw0061:71.64058191):69.96785536,((Afrarchaea_pilgrimsrest_hw59Af3:49.13614886,Afrarchaea_woodae_hw57Af1:49.13614886):73.44416036,(NewGenus_legendrei_hw0014:82.4140833,(NewGenus_lavatenda_hw0003AB:57.30071564,NewGenus_jeanneli_hw0057:57.30071564):25.11336767):40.1662259):19.02812805):24.74014973):16.51404697):30.99101197):25.66551091):25.91929425):4.262359573);")

master_tree4 = as(master_tree, "phylo4")

# Specify the postorder node numbers for the
# final collection object
postorder_table = postorder_nodes_phylo4_return_table(master_tree4)
postorder_table

match_internal_nodes = match(internal_nodes, postorder_table$nodenums)
node_histories_order_in_postorder = postorder_table$postorder[match_internal_nodes]

# Different kinds of node numbering
plot(master_tree)
axisPhylo()
nodelabels(postorder_table$nodenums[order(postorder_table$nodenums)])
title("R default node labels")

plot(master_tree)
axisPhylo()
nodelabels(postorder_table$internal_nodenums[order(postorder_table$nodenums)])
title("R default node indices")

plot(master_tree)
axisPhylo()
nodelabels(postorder_table$postorder[order(postorder_table$nodenums)])
title("postorder node labels")
ReplyDelete
Replies
Liam RevellJanuary 27, 2012 at 4:14 PM
Nick. Thanks for posting. I haven't had a chance to look at this yet, but you are definitely correct that (even for the same topology & branch lengths) there is no requirement that corresponding nodes align.
ReplyDelete
Replies
NickMJanuary 27, 2012 at 8:40 PM
Actually I should say that you need a different function for comparing different trees and identifying corresponding clades. The above just shows the node labels on a given tree under three different labeling schemes. It should just cut-paste to R. Cheers!
ReplyDelete
Replies
dwbapstJanuary 29, 2012 at 12:39 PM
When one wants to deal with the same 'node' on different trees, one define nodes as being the same based on their tip descendants. prop.part() in ape becomes indispensible for this, as one can use the output to find the nodes which match on multiple trees. Obviously, we may want to define nodes somewhat differently in other cases. (For example, consider two phylogenies which do not include the same plant species but still each have a node which includes all angiosperms.)
ReplyDelete
Replies
Liam RevellJanuary 29, 2012 at 3:46 PM
Uh-huh, good point. That's why topological similarity is best defined by edges, i.e., splits (e.g., R-F distance).

There is still the practical issue that (by the ape standard for a phylo object) there is no particular required numbering scheme by which topologically and rotationally identical trees have the same node numbers.

For instance:

tr1<-read.tree(text="((A,B),(C,D));")
plot(tr1); nodelabels()
tr2<-read.tree(text="((C,D),(A,B));")
plot(tr2); nodelabels()

and, in addition:

tr1<-pbtree(n=10)
plot(tr1); nodelabels()
tr2<-read.tree(text=write.tree(tr1))
plot(tr2); nodelabels()

(So far as I know, pbtree produces a tree that conforms to the ape standard.)
ReplyDelete
Replies
AnonymousJanuary 31, 2012 at 4:16 AM
hey,

the function get.desc.of.node() (internal function in the auteur package) is not doing the same thing as well?

Joan.
ReplyDelete
Replies
Liam RevellJanuary 31, 2012 at 9:30 AM
Hi Joan.

No, it looks like that function just returns the two descendant nodes from a given node. This is the same as:

desc.nodes<-tree$edge[which(tree$edge[,1]==node),2]

- Liam
ReplyDelete
Replies
AnonymousJanuary 31, 2012 at 7:15 PM
Hi Liam,

why it works when you introduce the tree directly into a variable?

> tr <- read.tree(text ="(((t1:1.142640805,((t3:0.3942279855,t4:0.3942279855):0.5913585637,(t9:0.0590829974,t10:0.0590829974):0.9265035519):0.1570542561):0.01949323667,(t7:0.1455523583,t8:0.1455523583):1.016581684):0.6239323756,((t5:0.3366558802,t6:0.3366558802):0.06012997056,t2:0.3967858508):1.389280567);")

> get.descendants.of.node(12,tr)
[1] 13 1 14 15 2 3 16 4 5 17 6 7

sorry if it's very obvious... (Im newborn in R)

thanks again!

Joan.
ReplyDelete
Replies
Liam RevellJanuary 31, 2012 at 8:13 PM
Hi Joan.

No, you're right. Evidently, there are two functions in auteur: get.desc.of.node and get.descendants.of.node. The former retrieves only the immediate descendants of a given node.

I don't know how you discovered this as it is not described in the auteur documentation!

Evidently getDescendants(tree,node), Descendants(tree,node,type="all"), and get.descendants.of.node(node,tree) are all functionally identical, although they are each coded quite differently! Good discovery.

- Liam
ReplyDelete
Replies
AnonymousFebruary 1, 2012 at 4:58 AM
oh! i can see I had misspelled it in my first post!

it is used in one of the examples of the manual.

cheers,

Joan
ReplyDelete
Replies
José Hidasi NetoJuly 26, 2013 at 3:09 PM
Hello Liam,

I am trying to get the descendant edges for each edge of a phylogeny. Have you managed to do that too?

Here I was able to get the number of descendant tips for each edge (please, see if you agree with my solution).

eds<-tree$edge
nodeds<-eds[,1][order(eds[,2])]
numtips<-numeric(length(nodeds))
for(i in 1:length(nodeds)){
phy<-extract.clade(tree,nodeds[i])
numtips[i]<-length(phy$tip.label)
}
edge.num<-eds[,2][order(eds[,2])]
desc.tips<-cbind(edge.num,numtips)

However, it would be also useful to know which are the descendant edges for each edge. Do you have any idea?

All the best,
José Hidasi Neto
ReplyDelete
Replies
UnknownJune 10, 2014 at 5:29 AM
Hi - shouldn't the getDescendants() function have which(daughters>length(tree$tip)), not which(daughters>=length(tree$tip)) (i.e. use strictly greater than, not ">="?
ReplyDelete
Replies

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.

Pages

Thursday, January 26, 2012

Function to get descendant node numbers

17 comments: