Monday, May 13, 2013

New version of rerootingMethod & estimating ancestral states with an ordered model

I just posted a new version of rerootingMethod. This function computes the empirical Bayes posterior probabilities (marginal ancestral state reconstructions) for a discrete character using the re-rooting method of Yang (e.g., Yang 2006). The main functional change in this version is that now the function can take a model index matrix (i.e., a matrix with the same dimensions as Q with integers to specify the different elements to be estimated), instead of just the string "ER" or "SYM". This should allow users to specify any arbitrary symmetric model - not just the full symmetric or equal-rates model.

Our index matrix contains an integer for each rate to be estimated - but values set to zero indicate that the corresponding transition rate should be zero. This caused a little bit of a problem for rerootingMethod because it had the code:

Q<-matrix(YY$rates[YY$index.matrix],ncol(XX),ncol(XX),
 dimnames=list(colnames(XX),colnames(XX)))

This code first creates a vector with the elements of the index matrix; and then coerces that vector into a matrix with dimensions specified by our number of states for the character. The problem is that index matrix elements that are zero will cause the length of the vector to be wrong, so that it cannot be coerced into a matrix with the right dimensions (we're kind of lucky that this is the case - or we might have missed the error). To solve this, I just did:

Q<-matrix(c(0,YY$rates)[YY$index.matrix+1],ncol(XX),
 ncol(XX),dimnames=list(colnames(XX),colnames(XX)))

One of the most common symmetric models other than the equal rates model that we might be interested in is probably the ordered model. This is a model in which we, say, allow transition ABCD, but no other types of transitions are allowed. How can we fit this model? Here's a quick demo in which I simulate under the model & then fit it & estimate ancestral states:

> require(phytools)
Loading required package: phytools
> packageVersion("phytools")
[1] ‘0.2.63’
> tree<-pbtree(n=100,scale=2)
> Q<-matrix(c(-1,1,0,0,1,-2,1,0,0,1,-2,1,0,0,1,-1),4,4)
> colnames(Q)<-rownames(Q)<-LETTERS[1:4]
> Q
   A  B  C  D
A -1  1  0  0
B  1 -2  1  0
C  0  1 -2  1
D  0  0  1 -1
> x<-sim.history(tree,Q)$states
> mm<-Q; diag(mm)<-0
> mm
  A B C D
A 0 1 0 0
B 1 0 1 0
C 0 1 0 1
D 0 0 1 0
> aa<-rerootingMethod(tree,x,model=mm)
> plot(tree,no.margin=TRUE,edge.width=2, show.tip.label=FALSE)
> nodelabels(pie=aa$marginal.anc,piecol=palette()[1:4], cex=0.5)
> tiplabels(pie=to.matrix(x,colnames(aa$marginal.anc)), piecol=palette()[1:4],cex=0.3)

Code for the new version of rerootingMethod is here; and the latest phytools build can be downloaded here.

Friday, May 10, 2013

New version of contMap

I just posted a new version of contMap (described here and in my recent MEE article). contMap uses a color scale to map the observed and reconstructed values of a continuous character onto the branches of a phylogeny.

This update just brings contMap into alignment with densityMap in that it (invisibly) returns an object of class "contMap" so that adjusting the plotting parameters can be accomplished without recomputing all the ancestral states.

As I demonstrated on Travis Ingram's recommendation, this - in addition to speeding up re-plotting if we need to change the parameters of our plot - allows us to easily change the palette of colors used to map our trait on the tree. For instance, we can easily change to grayscale. Here I show how we can flip low to high in our color map.

> require(phytools)
Loading required package: phytools
> packageVersion("phytools")
[1] ‘0.2.61’
> tree<-pbtree(n=30)
> x<-fastBM(tree)
> maps<-contMap(tree,x)

Alright, now let's flip the colors:

> maps$cols[]<-maps$cols[length(maps$cols):1]
> plot(maps)

Pretty cool.

Code is here; new phytools build (0.2-61), here.

paintSubTree doesn't actually "paint" the tree (i.e., with colors)

I've been emailing back & forth with a phytools user to try & explain what is done with paintSubTree, plotSimmap, and phylomorphospace with a mapped regime - but I thought some visuals would help & that some other readers (occasional or otherwise) of this blog might find the clarification useful, so I decided to write this quick post.

The first point is that paintSubTree does not paint the tree with colors. The term paint refers to regime painting in Butler & King (2004); i.e., the process of assigning different branches or subtrees to different a priori specified categories or regimes. The states that we choose to assign to these paintings has nothing at all to do with the colors used to visualize the regimes in (for instance) plotSimmap or phylomorphospace, which (if unassigned by the user) are drawn in sequence from palette().

This is what I mean:

> tree<-pbtree(n=30)
> plotTree(tree,node.numbers=TRUE)

Now let's say we want to paint the subtrees arising from nodes 57, 50, 33, and 42 with different regimes. It doesn't matter why we want to do this - maybe we just want to easily visualize the trees with different subtrees in different colors; nor does it matter that the subtree arising out of node 42 is nested within the subtree arising out of node 33. We can do:

> tree<-paintSubTree(tree,node=57,state="1",anc="0")
> tree<-paintSubTree(tree,node=50,state="2")
> tree<-paintSubTree(tree,node=33,state="3")
> tree<-paintSubTree(tree,node=42,state="4")
Now let's plot the tree with default colors:
> plotSimmap(tree,pts=FALSE)
no colors provided. using the following legend:
      0        1        2        3        4
"black"    "red" "green3"  "blue"  "cyan"

If we want to use different colors, we can just do:

> cols<-c("black","red","blue","green","purple")
> names(cols)<-0:4
> plotSimmap(tree,cols,pts=FALSE)

If our sole purpose in painting on regimes was to visualize the tree in this way, we could have used the desired colors as regimes. I.e.,

> tree<-paintSubTree(tree,node=31,state="black")
> tree<-paintSubTree(tree,node=57,state="red")
> tree<-paintSubTree(tree,node=50,state="blue")
> tree<-paintSubTree(tree,node=33,state="green")
> tree<-paintSubTree(tree,node=42,state="purple")
> cols<-colnames(tree$mapped.edge)
> names(cols)<-cols
> plotSimmap(tree,cols,pts=FALSE)
to exactly the same effect.

Visualizing different regimes in a phylomorphospace works according to the same principle. Here's a quick demo of that using the same tree:

> X<-fastBM(tree,nsim=2)
> phylomorphospace(tree,X,colors=cols,node.by.map=TRUE, xlab="x",ylab="y")

That's really all there is to it.

Thursday, May 9, 2013

Plotting densityMap using grayscale

Travis Ingram commented that it would be nice to be able to plot density maps using the function densityMap in grayscale. He gave a line of code that could be modified internally to do this; however (as of today!) this is not necessary as we can instead modify our object of class "densityMap" (now returned invisibly by the function) and replot using plot.densityMap. This is what that would look like:

> require(phytools)
Loading required package: phytools
> packageVersion("phytools")
[1] ‘0.2.60’
> # this is just to simulate some data
> tree<-pbtree(n=80,scale=1)
> Q<-matrix(c(-1,1,1,-1),2,2)
> rownames(Q)<-colnames(Q)<-c(0,1)
> tree<-sim.history(tree,Q)
> x<-tree$states
> mtrees<-make.simmap(tree,x,nsim=100)
make.simmap is sampling character histories conditioned on the transition matrix
Q =
        0        1
0 -1.25418  1.25418
1  1.25418 -1.25418
(estimated using likelihood);
and (mean) root node prior probabilities
pi =
  0  1
0.5 0.5
Done.
> # we don't care about this plot
> maps<-densityMap(mtrees,res=500)
sorry - this might take a while; please be patient
> # now let's change our colormap using Travis's code
> maps$cols[]<-grey(seq(1,0,length.out=length(maps$cols)))
> plot(maps,fsize=c(0.6,1),outline=TRUE,lwd=5)

(Click for higher res version.) Cool.

Thanks to Travis for the great suggestion & code.

New version of densityMap

The full title of this post should read New much faster version of densityMap that returns the plotted map invisibly combined with generic plotting method for special mapping object class, or something like that - but that seemed like a tongue twister. densityMap is a function to visualize the posterior sample of maps from a stochastic mapping analysis and is described in an article that I have in press at Methods in Ecology & Evolution.

I just posted a new version of densityMap (code here) that simultaneously addresses a couple of significant issues with this (otherwise pretty cool, in my opinion) function.

Firstly, prior versions of the function are way too slow. The function is still slow - just no longer way too slow. This speed-up was achieved entirely using the trick of removing the class attribute of our "multiPhylo" object which - for some reason that is not entirely clear to me - dramatically speeds up handling of the object both by apply family functions, and even in for loops.

Secondly, what makes densityMap especially annoying to work with - given that it is slow - is that to alter any of the plotting options, you need to re-compute the aggregate mappings. In the new version of densityMap (and new minor phytools build, phytools 0.2-60), densityMap plots the mapped tree as before, but also returns a special object of class "densityMap" invisibly. This can then be plotted using a call of the generic plot (to which I have added the phytools method plot.densityMap).

Here's a demo:

> require(phytools)
Loading required package: phytools
> packageVersion("phytools")
[1] ‘0.2.60’
> # simulate tree & data
> tree<-pbtree(n=70,scale=1)
> Q<-matrix(c(-1,1,1,-1),2,2)
> rownames(Q)<-colnames(Q)<-c(0,1)
> tree<-sim.history(tree,Q)
> x<-tree$states
> # generate stochastic maps
> mtrees<-make.simmap(tree,x,nsim=100)
make.simmap is sampling character histories conditioned on the transition matrix
Q =
          0          1
0 -0.8857354  0.8857354
1  0.8857354 -0.8857354
(estimated using likelihood);
and (mean) root node prior probabilities
pi =
  0  1
0.5 0.5
Done.
> # generate density map
> # this would have been much slower before
> system.time(map<-densityMap(mtrees))
sorry - this might take a while; please be patient
  user  system elapsed
  7.85    0.12    7.97

Now, in earlier versions of densityMap if we wanted to adjust the way this plot looked (say - so that the labels don't overlap!) - we would have to recompute the aggregate mapping (which, remember, took us 8s here - but much longer in earlier versions or for bigger trees). However, here we have created the object maps, which we can then pass to plot.densityMap with our revised plotting options:

> plot(map,lwd=5,fsize=c(0.7,1),legend=0.4)
for example.

Cool.

Wednesday, May 8, 2013

Bug fix & update to phylomorphospace

Yesterday a phytools user identified a bug in the node-coloring of phylomorphospace. I thought that this was likely introduced when I recently did a major rewrite of the function (described here) and this seems to be correct. I have fixed this, and also added a new feature to the function that (for trees with a mapped discrete character) will automatically color nodes using the color of the mapped discrete character. The code of the updated function is here; and I have also posted a new phytools build (phytools 0.2-59).

Here's a demo of the fixed node coloring:

> require(phytools)
Loading required package: phytools
> packageVersion("phytools")
[1] ‘0.2.59’
> # first let's simulate a tree & data
> tree<-pbtree(n=30,scale=1)
> XX<-fastBM(tree,nsim=2)
> plotTree(tree,node.numbers=T)
> # now let's say we want to plot nodes
> # descended from "49" red:
> cols<-rep("black",length(tree$tip.label)+tree$Nnode)
> names(cols)<-1:length(cols)
> cols[getDescendants(tree,49)]<-"red"
> # and everything from "40" blue:
> cols[getDescendants(tree,40)]<-"blue"
> # finally, these can even be nested
> cols[getDescendants(tree,44)]<-"yellow"
> # and plot
> phylomorphospace(tree,XX,xlab="X1",ylab="X2", control=list(col.node=cols))
Cool. (This is basically the demo that I gave in an earlier post.)

Now, when I was doing this I realized that it might be cool to be able to color the nodes - as well as the edges - according to a mapped discrete character. To do this, we need to be able to compute the colors of all internal nodes on the tree from their states. Here is the code that I used to do that:

zz<-c(getStates(tree,type ="tips"), getStates(tree))
names(zz)[1:length(tree$tip.label)]<-
  sapply(names(zz)[1:length(tree$tip.label)],
  function(x,y) which(y==x),y=tree$tip.label)
con$col.node<-setNames(colors[zz],names(zz))
The first line just computes the states at all tip & internal nodes using getStates; the second is just a complicated way of translating tip labels in names(zz) into node numbers, which is what is need by phylomorphospace; finally, line three translates the node states to colors.

Here is a demo:

> # transition matrix
> Q<-matrix(c(-2,2,2,-2),2,2)
> colnames(Q)<-rownames(Q)<-letters[1:2]
> # simulate stochastic history
> tree<-sim.history(tree,Q)
> phylomorphospace(tree,XX,xlab="X1",ylab="X2", colors=setNames(c("blue","red"),letters[1:2]), node.by.map=TRUE)

That's it.

Much faster versions of countSimmap, describe.simmap, and getStates

Yesterday, Klaus Schliep pointed out that a trick to speed up lapply's handling of objects of class "multiPhylo" (i.e., just a simple list of objects of class "phylo") was just to first remove the class attribute "multiPhylo". Why this would have any effect at all is somewhat of a mystery to me. is.list(trees) evaluates TRUE regardless of whether or not the class attribute has been removed, so it doesn't seem that lapply would have to coerce our object to a list in either case. Nonetheless - not only does this work, it works tremendously! I have now included this simple trick in countSimmap, describe.simmap, and getStates, all of which use lapply or sapply if the argument tree is an object of class "multiPhylo".

Here's a demo of just how much of an improvement in speed results from this trick:

> require(phytools)
Loading required package: phytools
> packageVersion("phytools")
[1] ‘0.2.56’
>
> # simulate data
> tree<-pbtree(n=100,scale=1)
> Q<-matrix(c(-1,1,1,-1),2,2)
> colnames(Q)<-rownames(Q)<-letters[1:2]
> tree<-sim.history(tree,Q)
> x<-tree$states
>
> # stochastic mapping
> mtrees<-make.simmap(tree,x,nsim=1000)
make.simmap is sampling character histories conditioned on the transition matrix
Q =
          a          b
a -0.8686634  0.8686634
b  0.8686634 -0.8686634
(estimated using likelihood);
and (mean) root node prior probabilities
pi =
  a  b
0.5 0.5
Done.
>
> # ok, now let's time describe.simmap for
> # various subsets of our mapped trees
> system.time(X100<-describe.simmap(mtrees[1:100], message=FALSE))
  user  system elapsed
  2.69    0.02    2.70
> system.time(X200<-describe.simmap(mtrees[1:200], message=FALSE))
  user  system elapsed
  12.71    0.02  12.75
> system.time(X400<-describe.simmap(mtrees[1:400], message=FALSE))
  user  system elapsed
  74.24    0.64  75.57

Woah. I'm not even going to try the full set of 1,000 trees.

OK, now let's compare to describe.simmap with nothing more than the trick suggested by Klaus (i.e., unclass-ing the "multiPhylo" object for every use of lapply or sapply):

> require(phytools)
Loading required package: phytools
> packageVersion("phytools")
[1] ‘0.2.58’
> system.time(Y100<-describe.simmap(mtrees[1:100], message=FALSE))
  user  system elapsed
  0.45    0.00    0.45
> system.time(Y200<-describe.simmap(mtrees[1:200], message=FALSE))
  user  system elapsed
  0.83    0.00    0.82
> system.time(Y400<-describe.simmap(mtrees[1:400], message=FALSE))
  user  system elapsed
  1.78    0.00    1.78
Holy cow! What a huge improvement. We can even now run it on the full set of 1,000 mapped trees:
> par(cex=0.8) # make our tip labels a little smaller
> system.time(YY<-describe.simmap(mtrees,plot=TRUE))
1000 trees with a mapped discrete character with states:
a, b

trees have 29.633 changes between states on average

changes are of the following types:
        a,b    b,a
x->y 10.692 18.941

mean total time spent in each state is:
              a          b    total
raw  12.9314025 18.3673136 31.29872
prop  0.4131608  0.5868392  1.00000

  user  system elapsed
  4.65    0.06    4.71

Wow.