
The first thing I realized was that it would not be practical to use the same "work-around" algorithm that I used for v0.1 of read.simmap() and described here. I was going to have to figure out how to read phylogenies into memory in R without being able to take advantage of the {ape} function read.tree()!
To convey what this problem involves, it is first important to understand how a phylogenetic tree is stored in memory in R by {ape}. read.tree() stores the phylogeny as list of class "phylo" with 3 basic elements, as follows:
$Nnode : this is an integer containing the number of internal nodes in the tree;
$tip.label : this is a vector containing the labels for all the tips in the tree; and finally
$edge : this is a matrix of dimensions ($Nnode+length(tip.label)-1 x 2) which contains the node numbers for all the internal parent (left column) and daughter (right column) nodes in the tree.
By convention in {ape}, terminal nodes are numbered 1:n for n species, where the numbers correspond to the order of the tip labels in $tip.label; and internal nodes are labeled (n+1):($Nnode+length(tip.label)) starting with the root node of the tree.
[Note that this is really different from how phylogeny storage is typically programmed in C; where we would use a network of linked custom data types called "structures. An example of this can be seen on my website, here.]
To give you an idea of what this would look like then, the following tree (see figure) in Newick format:
>tree<-read.tree(text="(((Human,Chimp),Gorilla),Monkey);")
would have the following elements:
> tree$Nnode
[1] 3
> tree$tip.label
[1] "Human" "Chimp" "Gorilla" "Monkey"
> tree$edge
[,1] [,2]
[1,] 5 6
[2,] 6 7
[3,] 7 1
[4,] 7 2
[5,] 6 3
[6,] 5 4
More on this very soon. . . . .
No comments:
Post a Comment
Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.