Wednesday, February 6, 2013

Reordering the columns of $mapped.edge for a set of stochastically mapped trees

I recently got the following comment to a recent post on the blog:

I am trying to play with your make.simmap function, but I found that in all the simulations, colnames(tree$mapped.edge) show a different order. This is problematic when I want to combine the results for models fitted over the multiples trees as done for instance with OUwie package.

It is true that the column order of $mapped.edge may differ in different mapped stochastic character histories. The matrix $mapped.edge is a matrix containing (in each row) the total time spent in each state (in columns) for a stochastic-map style tree stored by read.simmap or created by (for instance) make.simmap or sim.history. (The specific sequence and time spent in each state along each branch is stored in a separate component of the simmap object, a list called $maps).

The matrix $mapped.edge exists primarily to serve downstream functions, such as brownie.lite or evol.vcv for which the only attribute that's important about the mapping is the time spent in each state on each edge, regardless of the ordering.

The fact that the column orderings can be different in a set of mappings is not an oversight. I at some point made the arbitrary decision that the column order of $mapped.edge should be the order in which the discrete trait appears in the tree - from the root forwards through my upward tree traversal algorithm. Thus the first column of $mapped.edge will always be the discrete state mapped to the root of the tree in question - but the second column will be determined based on both the height and position in the tree, relative to my traversal path from the root to the tips.

I decided to use this ordering instead of, say, the alphabetical or numerical order of the mapped discrete character states because we are not assuming any natural ordering for the trait.

I feel nearly certain that in a prior post I have addressed how to reorganize the output of, say, brownie.lite to average the estimated rates from different mappings across mapping in which the columns of $mapped.edge (and thus the output order of the rates) differ among trees. However, an alternative is to reorder the columns of each matrix $mapped edge in each tree object prior to analysis. Here is an illustration of the "problem" and the code to fix it:

> # load phytools
> require(phytools)
> # ok lets simulate some data to illustrate the problem
> Q<-matrix(c(-2,1,1,1,-2,1,1,1,-2),3,3)
> rownames(Q)<-colnames(Q)<-c("A","B","C")
> tree<-pbtree(n=100,scale=1)
> # our discrete character
> X<-sim.history(tree,Q)$states
> # our stochastically mapped trees
> mtrees<-make.simmap(tree,X,nsim=100)
> cols<-c("red","blue","green"); names(cols)<-c("A","B","C")
> layout(matrix(1:100,10,10))
> plotSimmap(mtrees,cols,ftype="off",pts=FALSE)
Waiting to confirm page change...

We can see from this plot that at, although the majority of trees appear to have state "B" (blue) as the root state, at least trees 1, 3, and 4 (among others) have different root node states and thus will have different orderings of the matrix $mapped.edge. Let's confirm this:
> mtrees[[1]]$mapped.edge
        state
                   C            A            B
 101,102 0.101357902 5.474078e-05 0.0000000000
 102,103 0.000000000 2.644562e-01 0.0000000000
 103,104 0.007755547 2.602131e-01 0.0000000000
 104,1   0.340476206 2.568633e-02 ...
 ...
> mtrees[[3]]$mapped.edge
        state
                    A            B            C
 101,102 0.1014126423 0.0000000000 0.0000000000
 102,103 0.2644561833 0.0000000000 0.0000000000
 103,104 0.1063332481 0.1616353949 0.0000000000
 104,1   0.0000000000 0.2404987635 ...
 ....
> mtrees[[4]]$mapped.edge
        state
                    B           A            C
 101,102 0.1014126423 0.000000000 0.0000000000
 102,103 0.2644561833 0.000000000 0.0000000000
 103,104 0.1886379373 0.079330706 0.0000000000
 104,1   0.0000000000 0.102319207 ...
 ....

Ok - now let's try sorting the columns of $mapped.edge. Note that no downstream functions (in phytools, at least) that use this style of tree assume any specific ordering for the columns of $mapped.edge, so this change should not affect the function of the object.

Here's our code (modify as appropriate):
# pick an ordering
# (this could also be the ordering of e.g. the first tree)
ordering<-c("A","B","C")
# function to reorder
foo<-function(tree,ordering){
  tree$mapped.edge<-tree$mapped.edge[,ordering]
  tree
}
# apply to all trees
mtrees<-lapply(mtrees,foo,ordering=ordering)
class(mtrees)<-"multiPhylo"

Now let's check our three trees from before:
> mtrees[[1]]$mapped.edge
        state
                    A            B           C
 101,102 5.474078e-05 0.0000000000 0.101357902
 102,103 2.644562e-01 0.0000000000 0.000000000
 103,104 2.602131e-01 0.0000000000 0.007755547
 104,1   2.568633e-02 0.0000000000 ...
 ....
> mtrees[[3]]$mapped.edge
        state
                    A            B            C
 101,102 0.1014126423 0.0000000000 0.0000000000
 102,103 0.2644561833 0.0000000000 0.0000000000
 103,104 0.1063332481 0.1616353949 0.0000000000
 104,1   0.0000000000 0.2404987635 ...
 ....
> mtrees[[4]]$mapped.edge
        state
                   A            B            C
 101,102 0.000000000 0.1014126423 0.0000000000
 102,103 0.000000000 0.2644561833 0.0000000000
 103,104 0.079330706 0.1886379373 0.0000000000
 104,1   0.102319207 0.0000000000 ...
 ....

Good, it works. (Note that we could have also just as easily done this with a simple for loop.)

No comments:

Post a Comment

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.