Wednesday, January 25, 2017

plotTree.barplot with more user options, including stacked bars for multiple input traits

In response to a user comment I have added a feature to plotTree.barplot to permit multiple traits or values to be plotted for each species - either using stacked bars, or by plotting the bars side-by-side.

The update can be seen here. The problem turned out to be a little bit mroe complicated then it seemed it first, mainly because of a quirk in barplot that means that the object returned by boxplot (which I then use to correctly space the tips of the plotted tree) differs depending on whether the input data come in the form of a vector of single values or a matrix, &, if the latter, whether the argument beside=TRUE is selected or not.

Here's a quick & dirty demo. These data are (simulated to represent) relative frequencies - said dietary fraction of three different food types - so stacking of our bars is appropriate:

library(phytools)
packageVersion("phytools")
## [1] '0.5.70'
tree
## 
## Phylogenetic tree with 26 tips and 25 internal nodes.
## 
## Tip labels:
##  A, B, C, D, E, F, ...
## 
## Rooted; includes branch lengths.
X
##        plant  vertebrate invertebrate
## A 0.31086049 0.212175943   0.47696356
## B 0.15404215 0.440976975   0.40498087
## C 0.26982034 0.450834511   0.27934515
## D 0.12260206 0.358475976   0.51892197
## E 0.29979068 0.129743168   0.57046615
## F 0.20163877 0.440001107   0.35836012
## G 0.05465514 0.410528162   0.53481670
## H 0.47392116 0.121520364   0.40455848
## I 0.44344594 0.101251541   0.45530252
## J 0.24839375 0.268691107   0.48291514
## K 0.12309160 0.101699889   0.77520851
## L 0.24010676 0.065417274   0.69447597
## M 0.23424489 0.226431850   0.53932326
## N 0.40030662 0.057542453   0.54215092
## O 0.15332681 0.023255378   0.82341781
## P 0.44389924 0.491496743   0.06460401
## Q 0.27010640 0.292077796   0.43781581
## R 0.21643656 0.208244073   0.57531937
## S 0.19650159 0.317376110   0.48612230
## T 0.01261411 0.427646672   0.55973922
## U 0.32707264 0.164682573   0.50824478
## V 0.28931830 0.126987763   0.58369394
## W 0.08407750 0.252999348   0.66292315
## X 0.28631262 0.004105909   0.70958147
## Y 0.38118921 0.206806916   0.41200387
## Z 0.07941718 0.074216139   0.84636668
plotTree.barplot(tree,X)

plot of chunk unnamed-chunk-1

Or, we can plot the bars side-by-side:

plotTree.barplot(tree,X,args.barplot=list(beside=TRUE,xlim=c(0,1),
    legend.text=TRUE,space=c(0,1.2),args.legend=list(x=1,y=17)))

plot of chunk unnamed-chunk-2

Note that beside=FALSE is not a good option if our data have both positive & negative values as this can produce a weird result. For instance:

Y
##         [,1]        [,2]        [,3]
## A  0.7618018  2.59661203 -0.23590050
## B  0.8548063  1.40920654  0.73019040
## C -1.1970456  1.09259205  0.56974523
## D -0.4355176  2.31904920 -1.18347277
## E  2.2923226  2.35986558 -3.22201293
## F  2.5004466  3.98754763 -2.68168275
## G  2.7103706  1.94742983 -1.81626643
## H  1.4677625  1.90001387 -2.78720751
## I  1.5596683  4.57540030 -0.17596327
## J  1.5259543  5.22197986 -0.50073032
## K  1.4834068  4.99761663 -0.35070911
## L  1.3819468  4.84211948 -0.49581275
## M  1.4874512  4.99309837 -0.06583491
## N  0.3560632  3.50340705 -0.95027035
## O -0.2002478  3.80948866 -0.60808238
## P -0.9702761  3.93130784 -3.55745209
## Q -0.2638056  2.81862255 -2.47669725
## R  0.6922559  3.11378711 -3.53855007
## S -0.5630365  3.80538588 -2.87224419
## T  0.7185199  3.63159681 -0.96902150
## U  0.2464106  3.45631197 -0.05955653
## V -0.9173568  5.08132488 -3.08922774
## W -0.5006388  4.79906054 -3.26551534
## X  0.6836688  4.47601771 -3.28019095
## Y -0.5106440 -1.65927057 -0.89645983
## Z  0.3405579 -0.02014156 -0.76899774
plotTree.barplot(tree,Y)

plot of chunk unnamed-chunk-3

If one is determined to use a barplot for these kind of data, I would recommend beside=TRUE which produces a more sensible result:

plotTree.barplot(tree,Y,args.barplot=list(beside=TRUE,space=c(0,1.2)))

plot of chunk unnamed-chunk-4

That's it.

Tuesday, January 10, 2017

Overlaying a contMap style continuous character map on a dotTree plot

A phytools user recently posted the following question:

“Is there a blog post / function capable of combining the functionality of contmap and dottree? I'm looking to plot a continuous character with a color ramp on the tree with a discrete character dot next to the tips of the tree. If I there is a post on this I have missed it in my search and would appreciate being pointed in the right direction if it is out there.”

The answer is that this is pretty straightforward to do - though it does take a little investigating of the internals of each function.

Here my approach will be to compute a "contMap" object & then overlay it on top of a plotted dotTree.

We can start by imagining the following data:

library(phytools)
tree
## 
## Phylogenetic tree with 26 tips and 25 internal nodes.
## 
## Tip labels:
##  A, B, C, D, E, F, ...
## 
## Rooted; includes branch lengths.
x
## A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 
## c c c b a c c c c c c b a a b b b b b c c c b a a a 
## Levels: a b c
y
##       A       B       C       D       E       F       G       H       I 
## -0.7592 -0.9114 -1.1249  0.0006 -0.8630  2.5855  2.8233  2.4789  2.3854 
##       J       K       L       M       N       O       P       Q       R 
## -2.5600 -2.3400 -2.3566 -1.8551 -1.6360 -0.4659 -3.1787 -2.8704 -0.0257 
##       S       T       U       V       W       X       Y       Z 
##  0.3514  0.0436  0.9419 -0.2918 -0.6332 -0.5828 -0.3969 -2.1122

Next, we compute our "contMap" object, but we don't plot it:

obj<-contMap(tree,y,plot=FALSE)
obj
## Object of class "contMap" containing:
## 
## (1) A phylogenetic tree with 26 tips and 25 internal nodes.
## 
## (2) A mapped continuous trait on the range (-3.1787, 2.8233).

Finally, we plot our discrete character using dotTree after which we overlay our contMap style plot. This is the tricky part because I had to look up & duplicate the internal plotting parameters of dotTree to ensure that the two trees where directly overlain:

cols<-setNames(c("black","red","blue"),c("a","b","c"))
dotTree(tree,x,colors=cols,lwd=7)
par(fg="transparent")
plot(obj$tree,add=TRUE,lwd=5,colors=obj$cols,
    ylim=c(-1/25*Ntip(tree),Ntip(tree)),offset=1.7)
par(fg="black")
add.color.bar(0.3*max(nodeHeights(tree)),obj$cols,title="trait value",
    lims=obj$lims,digits=3,prompt=FALSE,x=0.3*max(nodeHeights(tree)),
    y=0.4*(1+par()$usr[3]),lwd=4,fsize=1,subtitle="")

plot of chunk unnamed-chunk-3

That's it!

Plotting terminal edges of the tree different colors depending (or not) on a discrete character using phytools

I recently received the following request:

“I would love to know if there is any way of extracting and colouring only a certain set of the terminal edges of the phylogeny, based on the tip labels. For example, using edge.color within plot.phylo based on a vector of tips. Do you know of any code that could be used to do this.”

This is pretty straightforward to do using phytools. Note that it can also be done using plot.phylo in the ape package, but I'm going to focus on the phytools way for obvious reasons.

I'll image that I'm painting the terminal edges leading to each tip with different colors based on the state of a discrete trait - but we could modify our technique for any arbitrary grouping of tips.

Here's the tree & data:

library(phytools)
tree
## 
## Phylogenetic tree with 26 tips and 25 internal nodes.
## 
## Tip labels:
##  A, B, C, D, E, F, ...
## 
## Rooted; includes branch lengths.
x
## A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 
## c a c a a c c a a a b c c a c b a a b c c b c a a b 
## Levels: a b c

Now, let's say we want to color the edges blue & red respectively if the corresponding edge is in state b or c, respectively, and leave it black if the tip is in state a.

## identify the tips in states b or c
b<-names(x)[x=="b"]
b
## [1] "K" "P" "S" "V" "Z"
c<-names(x)[x=="c"]
c
##  [1] "A" "C" "F" "G" "L" "M" "O" "T" "U" "W"
## paint the edges
tt<-paintBranches(tree,edge=sapply(b,match,tree$tip.label),
    state="b",anc.state="a")
tt<-paintBranches(tt,edge=sapply(c,match,tree$tip.label),
    state="c")

Now we can plot our tree:

tt
## 
## Phylogenetic tree with 26 tips and 25 internal nodes.
## 
## Tip labels:
##  A, B, C, D, E, F, ...
## 
## The tree includes a mapped, 3-state discrete character with states:
##  a, b, c
## 
## Rooted; includes branch lengths.
cols<-setNames(c("black","blue","red"),c("a","b","c"))
plot(tt,colors=cols,lwd=4,split.vertical=TRUE,ftype="i")

plot of chunk unnamed-chunk-3

Let's check this against a plot generated using dotTree:

dotTree(tree,x,colors=cols)

plot of chunk unnamed-chunk-4

That's about it.

Tree & data for this demo were simulated as follows:

tree<-pbtree(n=26,tip.label=LETTERS)
Q<-matrix(c(-2,1,1,1,-2,1,1,1,-2),3,3)
rownames(Q)<-colnames(Q)<-letters[1:3]
x<-as.factor(sim.history(tree,Q)$states)

Monday, January 9, 2017

Extracting reconstructed trait values along edges at different time-slices of a phylogenetic tree

A phytools user contacted me the other day about getting reconstructed internal values instead of at nodes, at different time slices of the tree.

His idea was to extract these from a "contMap" object, but since contMap is primarily for visualization (and finally discretizes time), a more sensible approach is to simply re-root the tree at each of the desired time slices and compute the values for those slices.

Note that now our values will correspond to points along edges, rather than to number internal nodes.

Here, I'll demo this using a tree with 26 taxa & total depth of 100, sampling edge states at depths 25, 50, & 75 - but roughly the same procedure should work on any tree.

First, here are our data:

library(phytools)
tree
## 
## Phylogenetic tree with 26 tips and 25 internal nodes.
## 
## Tip labels:
##  A, B, C, D, E, F, ...
## 
## Rooted; includes branch lengths.
x
##           A           B           C           D           E           F 
##  -5.8903745  -5.7282584   0.5445433 -13.8386694 -14.0666611  -5.6192333 
##           G           H           I           J           K           L 
##   0.2928983  -8.9493757  -9.2341163 -13.2477565 -18.3027044 -20.3522182 
##           M           N           O           P           Q           R 
## -22.4985786 -25.7573926  -6.8546698  -9.4871527 -17.8719246  13.8470125 
##           S           T           U           V           W           X 
##  17.0261569  11.0099036   8.9560054  19.7438105  10.0868538   8.2936089 
##           Y           Z 
##   3.1156954   3.5875638
plotTree(tree,mar=c(4.1,1.1,1.1,1.1))
axis(1)
abline(v=c(25,50,75),lty="dashed")
nodelabels()

plot of chunk unnamed-chunk-1

Note that for each time slice, obviously we are going to get different numbers of states (3, 9, & 13 in this case).

Now, let's do it for time slice t = 25:

H<-nodeHeights(tree)
t<-25
ii<-intersect(which(H[,1]<=t),which(H[,2]>t))
node<-tree$edge[ii,2]
position<-t-H[ii,1]
rerooted<-mapply(reroot,node=node,position=position,
    MoreArgs=list(tree=tree),SIMPLIFY=FALSE)
foo<-function(ind,tree) paste(tree$edge[ind,],collapse=",")
a25<-setNames(sapply(rerooted,function(t,x) fastAnc(t,x)[1],x=x),
    sapply(ii,foo,tree=tree))
a25
##     28,29     28,34     27,44 
## -4.797286 -7.056366  4.821428

As you can see, I've set the names of a25 to match the starting & ending node indices as plotted above.

We can of course repeat the same procedure for each of our time slices:

## time 50
t<-50
ii<-intersect(which(H[,1]<=t),which(H[,2]>t))
node<-tree$edge[ii,2]
position<-t-H[ii,1]
rerooted<-mapply(reroot,node=node,position=position,
    MoreArgs=list(tree=tree),SIMPLIFY=FALSE)
foo<-function(ind,tree) paste(tree$edge[ind,],collapse=",")
a50<-setNames(sapply(rerooted,function(t,x) fastAnc(t,x)[1],x=x),
    sapply(ii,foo,tree=tree))
a50
##      30,31      30,32       29,6      34,35      38,39      38,42 
##  -5.685643  -5.788953  -5.558655  -9.855100 -12.692816 -11.479507 
##      44,45      46,47      46,51 
##   8.854554   7.781511   7.332690
## time 75
t<-75
ii<-intersect(which(H[,1]<=t),which(H[,2]>t))
node<-tree$edge[ii,2]
position<-t-H[ii,1]
rerooted<-mapply(reroot,node=node,position=position,
    MoreArgs=list(tree=tree),SIMPLIFY=FALSE)
foo<-function(ind,tree) paste(tree$edge[ind,],collapse=",")
a75<-setNames(sapply(rerooted,function(t,x) fastAnc(t,x)[1],x=x),
    sapply(ii,foo,tree=tree))
a75
##      30,31      30,32       29,6       37,7       37,8       36,9 
##  -5.748358  -6.702843  -5.588944  -6.015129  -8.228368  -8.797681 
##      35,10      38,39      42,43      42,17      44,45      47,48 
## -10.128888 -18.896693 -11.157098 -13.541521  12.638761   9.604015 
##      47,49      46,51 
##  12.091764   5.072981

If we overlay these points on a 'traitgram' style visualization, we should see that they fall directly on the lines of our plot.

phenogram(tree,x,spread.cost=c(1,0))
abline(v=c(25,50,75),lty="dashed")
points(rep(25,length(a25)),a25,pch=21,bg="grey",cex=1.2)
points(rep(50,length(a50)),a50,pch=21,bg="grey",cex=1.2)
points(rep(75,length(a75)),a75,pch=21,bg="grey",cex=1.2)

plot of chunk unnamed-chunk-4

You get the idea. Cool!