Tuesday, January 10, 2017

Overlaying a contMap style continuous character map on a dotTree plot

A phytools user recently posted the following question:

“Is there a blog post / function capable of combining the functionality of contmap and dottree? I'm looking to plot a continuous character with a color ramp on the tree with a discrete character dot next to the tips of the tree. If I there is a post on this I have missed it in my search and would appreciate being pointed in the right direction if it is out there.”

The answer is that this is pretty straightforward to do - though it does take a little investigating of the internals of each function.

Here my approach will be to compute a "contMap" object & then overlay it on top of a plotted dotTree.

We can start by imagining the following data:

library(phytools)
tree
## 
## Phylogenetic tree with 26 tips and 25 internal nodes.
## 
## Tip labels:
##  A, B, C, D, E, F, ...
## 
## Rooted; includes branch lengths.
x
## A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 
## c c c b a c c c c c c b a a b b b b b c c c b a a a 
## Levels: a b c
y
##       A       B       C       D       E       F       G       H       I 
## -0.7592 -0.9114 -1.1249  0.0006 -0.8630  2.5855  2.8233  2.4789  2.3854 
##       J       K       L       M       N       O       P       Q       R 
## -2.5600 -2.3400 -2.3566 -1.8551 -1.6360 -0.4659 -3.1787 -2.8704 -0.0257 
##       S       T       U       V       W       X       Y       Z 
##  0.3514  0.0436  0.9419 -0.2918 -0.6332 -0.5828 -0.3969 -2.1122

Next, we compute our "contMap" object, but we don't plot it:

obj<-contMap(tree,y,plot=FALSE)
obj
## Object of class "contMap" containing:
## 
## (1) A phylogenetic tree with 26 tips and 25 internal nodes.
## 
## (2) A mapped continuous trait on the range (-3.1787, 2.8233).

Finally, we plot our discrete character using dotTree after which we overlay our contMap style plot. This is the tricky part because I had to look up & duplicate the internal plotting parameters of dotTree to ensure that the two trees where directly overlain:

cols<-setNames(c("black","red","blue"),c("a","b","c"))
dotTree(tree,x,colors=cols,lwd=7)
par(fg="transparent")
plot(obj$tree,add=TRUE,lwd=5,colors=obj$cols,
    ylim=c(-1/25*Ntip(tree),Ntip(tree)),offset=1.7)
par(fg="black")
add.color.bar(0.3*max(nodeHeights(tree)),obj$cols,title="trait value",
    lims=obj$lims,digits=3,prompt=FALSE,x=0.3*max(nodeHeights(tree)),
    y=0.4*(1+par()$usr[3]),lwd=4,fsize=1,subtitle="")

plot of chunk unnamed-chunk-3

That's it!

Plotting terminal edges of the tree different colors depending (or not) on a discrete character using phytools

I recently received the following request:

“I would love to know if there is any way of extracting and colouring only a certain set of the terminal edges of the phylogeny, based on the tip labels. For example, using edge.color within plot.phylo based on a vector of tips. Do you know of any code that could be used to do this.”

This is pretty straightforward to do using phytools. Note that it can also be done using plot.phylo in the ape package, but I'm going to focus on the phytools way for obvious reasons.

I'll image that I'm painting the terminal edges leading to each tip with different colors based on the state of a discrete trait - but we could modify our technique for any arbitrary grouping of tips.

Here's the tree & data:

library(phytools)
tree
## 
## Phylogenetic tree with 26 tips and 25 internal nodes.
## 
## Tip labels:
##  A, B, C, D, E, F, ...
## 
## Rooted; includes branch lengths.
x
## A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 
## c a c a a c c a a a b c c a c b a a b c c b c a a b 
## Levels: a b c

Now, let's say we want to color the edges blue & red respectively if the corresponding edge is in state b or c, respectively, and leave it black if the tip is in state a.

## identify the tips in states b or c
b<-names(x)[x=="b"]
b
## [1] "K" "P" "S" "V" "Z"
c<-names(x)[x=="c"]
c
##  [1] "A" "C" "F" "G" "L" "M" "O" "T" "U" "W"
## paint the edges
tt<-paintBranches(tree,edge=sapply(b,match,tree$tip.label),
    state="b",anc.state="a")
tt<-paintBranches(tt,edge=sapply(c,match,tree$tip.label),
    state="c")

Now we can plot our tree:

tt
## 
## Phylogenetic tree with 26 tips and 25 internal nodes.
## 
## Tip labels:
##  A, B, C, D, E, F, ...
## 
## The tree includes a mapped, 3-state discrete character with states:
##  a, b, c
## 
## Rooted; includes branch lengths.
cols<-setNames(c("black","blue","red"),c("a","b","c"))
plot(tt,colors=cols,lwd=4,split.vertical=TRUE,ftype="i")

plot of chunk unnamed-chunk-3

Let's check this against a plot generated using dotTree:

dotTree(tree,x,colors=cols)

plot of chunk unnamed-chunk-4

That's about it.

Tree & data for this demo were simulated as follows:

tree<-pbtree(n=26,tip.label=LETTERS)
Q<-matrix(c(-2,1,1,1,-2,1,1,1,-2),3,3)
rownames(Q)<-colnames(Q)<-letters[1:3]
x<-as.factor(sim.history(tree,Q)$states)

Monday, January 9, 2017

Extracting reconstructed trait values along edges at different time-slices of a phylogenetic tree

A phytools user contacted me the other day about getting reconstructed internal values instead of at nodes, at different time slices of the tree.

His idea was to extract these from a "contMap" object, but since contMap is primarily for visualization (and finally discretizes time), a more sensible approach is to simply re-root the tree at each of the desired time slices and compute the values for those slices.

Note that now our values will correspond to points along edges, rather than to number internal nodes.

Here, I'll demo this using a tree with 26 taxa & total depth of 100, sampling edge states at depths 25, 50, & 75 - but roughly the same procedure should work on any tree.

First, here are our data:

library(phytools)
tree
## 
## Phylogenetic tree with 26 tips and 25 internal nodes.
## 
## Tip labels:
##  A, B, C, D, E, F, ...
## 
## Rooted; includes branch lengths.
x
##           A           B           C           D           E           F 
##  -5.8903745  -5.7282584   0.5445433 -13.8386694 -14.0666611  -5.6192333 
##           G           H           I           J           K           L 
##   0.2928983  -8.9493757  -9.2341163 -13.2477565 -18.3027044 -20.3522182 
##           M           N           O           P           Q           R 
## -22.4985786 -25.7573926  -6.8546698  -9.4871527 -17.8719246  13.8470125 
##           S           T           U           V           W           X 
##  17.0261569  11.0099036   8.9560054  19.7438105  10.0868538   8.2936089 
##           Y           Z 
##   3.1156954   3.5875638
plotTree(tree,mar=c(4.1,1.1,1.1,1.1))
axis(1)
abline(v=c(25,50,75),lty="dashed")
nodelabels()

plot of chunk unnamed-chunk-1

Note that for each time slice, obviously we are going to get different numbers of states (3, 9, & 13 in this case).

Now, let's do it for time slice t = 25:

H<-nodeHeights(tree)
t<-25
ii<-intersect(which(H[,1]<=t),which(H[,2]>t))
node<-tree$edge[ii,2]
position<-t-H[ii,1]
rerooted<-mapply(reroot,node=node,position=position,
    MoreArgs=list(tree=tree),SIMPLIFY=FALSE)
foo<-function(ind,tree) paste(tree$edge[ind,],collapse=",")
a25<-setNames(sapply(rerooted,function(t,x) fastAnc(t,x)[1],x=x),
    sapply(ii,foo,tree=tree))
a25
##     28,29     28,34     27,44 
## -4.797286 -7.056366  4.821428

As you can see, I've set the names of a25 to match the starting & ending node indices as plotted above.

We can of course repeat the same procedure for each of our time slices:

## time 50
t<-50
ii<-intersect(which(H[,1]<=t),which(H[,2]>t))
node<-tree$edge[ii,2]
position<-t-H[ii,1]
rerooted<-mapply(reroot,node=node,position=position,
    MoreArgs=list(tree=tree),SIMPLIFY=FALSE)
foo<-function(ind,tree) paste(tree$edge[ind,],collapse=",")
a50<-setNames(sapply(rerooted,function(t,x) fastAnc(t,x)[1],x=x),
    sapply(ii,foo,tree=tree))
a50
##      30,31      30,32       29,6      34,35      38,39      38,42 
##  -5.685643  -5.788953  -5.558655  -9.855100 -12.692816 -11.479507 
##      44,45      46,47      46,51 
##   8.854554   7.781511   7.332690
## time 75
t<-75
ii<-intersect(which(H[,1]<=t),which(H[,2]>t))
node<-tree$edge[ii,2]
position<-t-H[ii,1]
rerooted<-mapply(reroot,node=node,position=position,
    MoreArgs=list(tree=tree),SIMPLIFY=FALSE)
foo<-function(ind,tree) paste(tree$edge[ind,],collapse=",")
a75<-setNames(sapply(rerooted,function(t,x) fastAnc(t,x)[1],x=x),
    sapply(ii,foo,tree=tree))
a75
##      30,31      30,32       29,6       37,7       37,8       36,9 
##  -5.748358  -6.702843  -5.588944  -6.015129  -8.228368  -8.797681 
##      35,10      38,39      42,43      42,17      44,45      47,48 
## -10.128888 -18.896693 -11.157098 -13.541521  12.638761   9.604015 
##      47,49      46,51 
##  12.091764   5.072981

If we overlay these points on a 'traitgram' style visualization, we should see that they fall directly on the lines of our plot.

phenogram(tree,x,spread.cost=c(1,0))
abline(v=c(25,50,75),lty="dashed")
points(rep(25,length(a25)),a25,pch=21,bg="grey",cex=1.2)
points(rep(50,length(a50)),a50,pch=21,bg="grey",cex=1.2)
points(rep(75,length(a75)),a75,pch=21,bg="grey",cex=1.2)

plot of chunk unnamed-chunk-4

You get the idea. Cool!

Thursday, December 29, 2016

Methods & object class for phyl.RMA (phylogenetic RMA regression)

I just added S3 print, residuals, and coef methods for the phylogenetic reduced major axis regression function, phyl.RMA. The updates can be seen here.

The following is a quick demo using simulated data:

library(phytools)
packageVersion("phytools")
## [1] '0.5.68'
tree
## 
## Phylogenetic tree with 26 tips and 25 internal nodes.
## 
## Tip labels:
##  A, B, C, D, E, F, ...
## 
## Rooted; includes branch lengths.
x
##           A           B           C           D           E           F 
##  5.02625173  3.84069323  1.40907359  1.62584177 -0.38820852 -0.03660051 
##           G           H           I           J           K           L 
## -1.15405529 -1.05555766 -1.10500369 -1.82982215 -0.58220778 -0.62621299 
##           M           N           O           P           Q           R 
##  2.58984857  1.65864863  1.38732527  1.10492094  0.89527230 -1.40154533 
##           S           T           U           V           W           X 
##  1.23500803 -0.24029572 -0.26507837 -0.77611134 -0.88118355 -2.59262521 
##           Y           Z 
## -0.11687224 -1.68772212
y
##           A           B           C           D           E           F 
##  5.03364484  4.45525372  2.33034028  1.70260435  1.86376515  0.63005315 
##           G           H           I           J           K           L 
## -1.27847189 -0.51986100  0.42627759 -0.59603022 -1.02732412 -1.06377260 
##           M           N           O           P           Q           R 
##  0.11285994  0.08066473 -0.02928941 -0.14296226 -0.38034287 -2.11329071 
##           S           T           U           V           W           X 
## -0.12823996  1.88409506  2.23634682 -0.82779628 -0.49460318 -2.37103424 
##           Y           Z 
## -2.07654579 -0.58623675
phylomorphospace(tree,cbind(x,y),node.size=c(0,0))
points(cbind(x,y),cex=1.2,pch=21,bg="grey")

plot of chunk unnamed-chunk-1

Now let's run the RMA regression:

obj<-phyl.RMA(x,y,tree)
## print
obj
## 
## Coefficients:
## (Intercept)           x 
##  -0.0424625   1.0647447 
## 
## VCV matrix:
##          x        y
## x 1.504905 1.119709
## y 1.119709 1.706083
## 
## Model for the covariance structure of the error is "BM"
## 
## Estimates (or set values):
##    lambda    log(L) 
##   1.00000 -69.49404 
## 
## Hypothesis test based on Clarke (1980; Biometrika):
##        r2         T        df         P 
##  0.488316  0.429650 21.290154  0.671767 
## 
## Note that the null hypothesis test is h0 = 1
coef(obj)
## (Intercept)           x 
##  -0.0424625   1.0647447
residuals(obj)
##            A            B            C            D            E 
## -0.275567308  0.408358649  0.872499213  0.013960519  2.319570602 
##            F            G            H            I            J 
##  0.711485847 -0.007235186  0.646500875  1.645286860  1.394725642 
##            K            L            M            N            O 
## -0.364958989 -0.354553159 -2.602204971 -1.642910026 -1.463974066 
##            P            Q            R            S            T 
## -1.276958414 -1.291116753 -0.578540317 -1.400745657  2.182411146 
##            U            V            W            X            Y 
##  2.561050105  0.041026621  0.486094798  0.431912090 -1.909644200 
##            Z 
##  1.253218854

I also wrote a simple plot method that throws the RMA line on top of a plotted phylomorphospace. Details can be seen here.

This is what it looks like:

plot(obj)

plot of chunk unnamed-chunk-3

We can also flip x or y just to see what it looks like to plot a negative slope. Note that we can only test a null hypothesis that has the same sign as our fitted RMA line. Here (for fun), I'll test the null hypothesis h0 = -2/3.

neg.x<--x
obj<-phyl.RMA(neg.x,y,tree,h0=-2/3)
obj
## 
## Coefficients:
## (Intercept)           x 
##  -0.0424625  -1.0647447 
## 
## VCV matrix:
##           x         y
## x  1.504905 -1.119709
## y -1.119709  1.706083
## 
## Model for the covariance structure of the error is "BM"
## 
## Estimates (or set values):
##    lambda    log(L) 
##   1.00000 -69.49404 
## 
## Hypothesis test based on Clarke (1980; Biometrika):
##        r2         T        df         P 
##  0.488316  3.206537 21.290154  0.004187 
## 
## Note that the null hypothesis test is h0 = -0.666666666666667
plot(obj)

plot of chunk unnamed-chunk-4

That's the idea anyway.

Data for this demo were simulated as follows:

library(phytools)
tree<-pbtree(n=26,tip.label=LETTERS)
xy<-fastBM(tree)
x<-xy+fastBM(tree,sig2=0.4)
y<-xy+fastBM(tree,sig2=0.4)