Tuesday, March 31, 2015

Intensive short course on phylogenetic comparative methods in R

We are pleased to announce a new graduate-level intensive short course on the use of R for phylogenetic comparative analysis. The course will be four days in length and will take place at the Hotel Ilha Flata in Ilhabela, Sao Paulo State, Brazil, from the 2nd to the 5th of July, 2015. This course is funded by the National Science Foundation. The course is free of cost; and accommodation at the course venue, as well as breakfast & lunch on all course days, is included for all accepted students. There will be a small number of travel stipends available for qualified students and post-docs. Applicants are welcome from any country, but are especially encouraged from the Latin American region.

Topics covered will include: an introduction to the R environment and programming language, tree manipulation, independent contrasts and phylogenetic generalized least squares, ancestral state reconstruction, models of character evolution, diversification analysis, and community phylogenetic analysis. Course instructors will include Dr. Liam Revell (University of Massachusetts Boston), Dr. Luke Harmon (University of Idaho), and Dr. Mike Alfaro (University of California, Los Angeles). Instruction in the course will be primarily in English, thus all students must have a basic working knowledge of scientific English.

To apply for the course, please submit your CV along with a short (maximum 1 page) description of your research interests, background, and reasons for taking the course. Admission is competitive, and preference will go towards students with background in phylogenetics and a compelling motivation for taking the course. Applications should be submitted by email to ilhabela.phylogenetics.course@gmail.com by May 1st, 2015. Questions can be directed to liam.revell@umb.edu (or posted in the comments section, below).

Converting a phylogeny with node labels to a taxonomy

An R-sig-phylo query asked:

“I wondered if anyone had coded a method for converting a phylogenetic tree with polytomies and node labels into a taxonomy (in some form of data table).”

So far as I know, this has not been done - but nor is it very hard, at least in the highly hypothetical circumstances in which every taxonomic level (i.e., Order, Family, Genus, etc.) that is desired in our table is labeled using node labels and every path from the root to any tip has the same number of labels.

Here's a quick demo, using a balanced tree - although it could equally well be a polytomous tree, or a stochastic tree, so long as the aforementioned rule holds.

library(phytools)
## first here is our tree
tree<-stree(n=64,type="balanced")
tree$node.label<-rep("",tree$Nnode)
tree$node.label[1]<-"Order_1"
tree$node.label[66-Ntip(tree)]<-"Family_1"
tree$node.label[97-Ntip(tree)]<-"Family_2"
tree$node.label[68-Ntip(tree)]<-"Genus_1"
tree$node.label[75-Ntip(tree)]<-"Genus_2"
tree$node.label[83-Ntip(tree)]<-"Genus_3"
tree$node.label[90-Ntip(tree)]<-"Genus_4"
tree$node.label[99-Ntip(tree)]<-"Genus_5"
tree$node.label[106-Ntip(tree)]<-"Genus_6"
tree$node.label[114-Ntip(tree)]<-"Genus_7"
tree$node.label[121-Ntip(tree)]<-"Genus_8"
plotTree(tree,fsize=0.65,ftype="i",lwd=1,xlim=c(-0.06,1.1))
nodelabels(node=which(tree$node.label!="")+Ntip(tree),
    text=tree$node.label[which(tree$node.label!="")],cex=0.7)

plot of chunk unnamed-chunk-1

Now here is the code to pull out the taxonomy:

getAncestors<-phytools:::getAncestors
foo<-function(tip,tree){
    node<-which(tree$tip.label==tip)
    a<-tree$node.label[getAncestors(tree,node,"all")-Ntip(tree)]
    a<-a[length(a):1]
    c(a[a!=""],tip)
}
T<-t(sapply(tree$tip.label,foo,tree=tree))
T
##     [,1]      [,2]       [,3]      [,4] 
## t1  "Order_1" "Family_1" "Genus_1" "t1" 
## t2  "Order_1" "Family_1" "Genus_1" "t2" 
## t3  "Order_1" "Family_1" "Genus_1" "t3" 
## t4  "Order_1" "Family_1" "Genus_1" "t4" 
## t5  "Order_1" "Family_1" "Genus_1" "t5" 
## t6  "Order_1" "Family_1" "Genus_1" "t6" 
## t7  "Order_1" "Family_1" "Genus_1" "t7" 
## t8  "Order_1" "Family_1" "Genus_1" "t8" 
## t9  "Order_1" "Family_1" "Genus_2" "t9" 
## t10 "Order_1" "Family_1" "Genus_2" "t10"
## t11 "Order_1" "Family_1" "Genus_2" "t11"
## t12 "Order_1" "Family_1" "Genus_2" "t12"
## t13 "Order_1" "Family_1" "Genus_2" "t13"
## t14 "Order_1" "Family_1" "Genus_2" "t14"
## t15 "Order_1" "Family_1" "Genus_2" "t15"
## t16 "Order_1" "Family_1" "Genus_2" "t16"
## t17 "Order_1" "Family_1" "Genus_3" "t17"
## t18 "Order_1" "Family_1" "Genus_3" "t18"
## t19 "Order_1" "Family_1" "Genus_3" "t19"
## t20 "Order_1" "Family_1" "Genus_3" "t20"
## t21 "Order_1" "Family_1" "Genus_3" "t21"
## t22 "Order_1" "Family_1" "Genus_3" "t22"
## t23 "Order_1" "Family_1" "Genus_3" "t23"
## t24 "Order_1" "Family_1" "Genus_3" "t24"
## t25 "Order_1" "Family_1" "Genus_4" "t25"
## t26 "Order_1" "Family_1" "Genus_4" "t26"
## t27 "Order_1" "Family_1" "Genus_4" "t27"
## t28 "Order_1" "Family_1" "Genus_4" "t28"
## t29 "Order_1" "Family_1" "Genus_4" "t29"
## t30 "Order_1" "Family_1" "Genus_4" "t30"
## t31 "Order_1" "Family_1" "Genus_4" "t31"
## t32 "Order_1" "Family_1" "Genus_4" "t32"
## t33 "Order_1" "Family_2" "Genus_5" "t33"
## t34 "Order_1" "Family_2" "Genus_5" "t34"
## t35 "Order_1" "Family_2" "Genus_5" "t35"
## t36 "Order_1" "Family_2" "Genus_5" "t36"
## t37 "Order_1" "Family_2" "Genus_5" "t37"
## t38 "Order_1" "Family_2" "Genus_5" "t38"
## t39 "Order_1" "Family_2" "Genus_5" "t39"
## t40 "Order_1" "Family_2" "Genus_5" "t40"
## t41 "Order_1" "Family_2" "Genus_6" "t41"
## t42 "Order_1" "Family_2" "Genus_6" "t42"
## t43 "Order_1" "Family_2" "Genus_6" "t43"
## t44 "Order_1" "Family_2" "Genus_6" "t44"
## t45 "Order_1" "Family_2" "Genus_6" "t45"
## t46 "Order_1" "Family_2" "Genus_6" "t46"
## t47 "Order_1" "Family_2" "Genus_6" "t47"
## t48 "Order_1" "Family_2" "Genus_6" "t48"
## t49 "Order_1" "Family_2" "Genus_7" "t49"
## t50 "Order_1" "Family_2" "Genus_7" "t50"
## t51 "Order_1" "Family_2" "Genus_7" "t51"
## t52 "Order_1" "Family_2" "Genus_7" "t52"
## t53 "Order_1" "Family_2" "Genus_7" "t53"
## t54 "Order_1" "Family_2" "Genus_7" "t54"
## t55 "Order_1" "Family_2" "Genus_7" "t55"
## t56 "Order_1" "Family_2" "Genus_7" "t56"
## t57 "Order_1" "Family_2" "Genus_8" "t57"
## t58 "Order_1" "Family_2" "Genus_8" "t58"
## t59 "Order_1" "Family_2" "Genus_8" "t59"
## t60 "Order_1" "Family_2" "Genus_8" "t60"
## t61 "Order_1" "Family_2" "Genus_8" "t61"
## t62 "Order_1" "Family_2" "Genus_8" "t62"
## t63 "Order_1" "Family_2" "Genus_8" "t63"
## t64 "Order_1" "Family_2" "Genus_8" "t64"

In this case, that's all there is to it. If the number of labels is different in different paths then we will end up with something much messier. For instance:

tree$node.label[122-Ntip(tree)]<-"Subgenus_1"
tree$node.label[125-Ntip(tree)]<-"Subgenus_2"
plotTree(tree,fsize=0.65,ftype="i",lwd=1,xlim=c(-0.06,1.1))
nodelabels(node=which(tree$node.label!="")+Ntip(tree),
    text=tree$node.label[which(tree$node.label!="")],cex=0.7)

plot of chunk unnamed-chunk-3

T<-sapply(tree$tip.label,foo,tree=tree)
T
## $t1
## [1] "Order_1"  "Family_1" "Genus_1"  "t1"      
## 
## $t2
## [1] "Order_1"  "Family_1" "Genus_1"  "t2"      
## 
## $t3
## [1] "Order_1"  "Family_1" "Genus_1"  "t3"      
## 
## $t4
## [1] "Order_1"  "Family_1" "Genus_1"  "t4"      
## 
## $t5
## [1] "Order_1"  "Family_1" "Genus_1"  "t5"      
## 
## $t6
## [1] "Order_1"  "Family_1" "Genus_1"  "t6"      
## 
## $t7
## [1] "Order_1"  "Family_1" "Genus_1"  "t7"      
## 
## $t8
## [1] "Order_1"  "Family_1" "Genus_1"  "t8"      
## 
## $t9
## [1] "Order_1"  "Family_1" "Genus_2"  "t9"      
## 
## $t10
## [1] "Order_1"  "Family_1" "Genus_2"  "t10"     
## 
## $t11
## [1] "Order_1"  "Family_1" "Genus_2"  "t11"     
## 
## $t12
## [1] "Order_1"  "Family_1" "Genus_2"  "t12"     
## 
## $t13
## [1] "Order_1"  "Family_1" "Genus_2"  "t13"     
## 
## $t14
## [1] "Order_1"  "Family_1" "Genus_2"  "t14"     
## 
## $t15
## [1] "Order_1"  "Family_1" "Genus_2"  "t15"     
## 
## $t16
## [1] "Order_1"  "Family_1" "Genus_2"  "t16"     
## 
## $t17
## [1] "Order_1"  "Family_1" "Genus_3"  "t17"     
## 
## $t18
## [1] "Order_1"  "Family_1" "Genus_3"  "t18"     
## 
## $t19
## [1] "Order_1"  "Family_1" "Genus_3"  "t19"     
## 
## $t20
## [1] "Order_1"  "Family_1" "Genus_3"  "t20"     
## 
## $t21
## [1] "Order_1"  "Family_1" "Genus_3"  "t21"     
## 
## $t22
## [1] "Order_1"  "Family_1" "Genus_3"  "t22"     
## 
## $t23
## [1] "Order_1"  "Family_1" "Genus_3"  "t23"     
## 
## $t24
## [1] "Order_1"  "Family_1" "Genus_3"  "t24"     
## 
## $t25
## [1] "Order_1"  "Family_1" "Genus_4"  "t25"     
## 
## $t26
## [1] "Order_1"  "Family_1" "Genus_4"  "t26"     
## 
## $t27
## [1] "Order_1"  "Family_1" "Genus_4"  "t27"     
## 
## $t28
## [1] "Order_1"  "Family_1" "Genus_4"  "t28"     
## 
## $t29
## [1] "Order_1"  "Family_1" "Genus_4"  "t29"     
## 
## $t30
## [1] "Order_1"  "Family_1" "Genus_4"  "t30"     
## 
## $t31
## [1] "Order_1"  "Family_1" "Genus_4"  "t31"     
## 
## $t32
## [1] "Order_1"  "Family_1" "Genus_4"  "t32"     
## 
## $t33
## [1] "Order_1"  "Family_2" "Genus_5"  "t33"     
## 
## $t34
## [1] "Order_1"  "Family_2" "Genus_5"  "t34"     
## 
## $t35
## [1] "Order_1"  "Family_2" "Genus_5"  "t35"     
## 
## $t36
## [1] "Order_1"  "Family_2" "Genus_5"  "t36"     
## 
## $t37
## [1] "Order_1"  "Family_2" "Genus_5"  "t37"     
## 
## $t38
## [1] "Order_1"  "Family_2" "Genus_5"  "t38"     
## 
## $t39
## [1] "Order_1"  "Family_2" "Genus_5"  "t39"     
## 
## $t40
## [1] "Order_1"  "Family_2" "Genus_5"  "t40"     
## 
## $t41
## [1] "Order_1"  "Family_2" "Genus_6"  "t41"     
## 
## $t42
## [1] "Order_1"  "Family_2" "Genus_6"  "t42"     
## 
## $t43
## [1] "Order_1"  "Family_2" "Genus_6"  "t43"     
## 
## $t44
## [1] "Order_1"  "Family_2" "Genus_6"  "t44"     
## 
## $t45
## [1] "Order_1"  "Family_2" "Genus_6"  "t45"     
## 
## $t46
## [1] "Order_1"  "Family_2" "Genus_6"  "t46"     
## 
## $t47
## [1] "Order_1"  "Family_2" "Genus_6"  "t47"     
## 
## $t48
## [1] "Order_1"  "Family_2" "Genus_6"  "t48"     
## 
## $t49
## [1] "Order_1"  "Family_2" "Genus_7"  "t49"     
## 
## $t50
## [1] "Order_1"  "Family_2" "Genus_7"  "t50"     
## 
## $t51
## [1] "Order_1"  "Family_2" "Genus_7"  "t51"     
## 
## $t52
## [1] "Order_1"  "Family_2" "Genus_7"  "t52"     
## 
## $t53
## [1] "Order_1"  "Family_2" "Genus_7"  "t53"     
## 
## $t54
## [1] "Order_1"  "Family_2" "Genus_7"  "t54"     
## 
## $t55
## [1] "Order_1"  "Family_2" "Genus_7"  "t55"     
## 
## $t56
## [1] "Order_1"  "Family_2" "Genus_7"  "t56"     
## 
## $t57
## [1] "Order_1"    "Family_2"   "Genus_8"    "Subgenus_1" "t57"       
## 
## $t58
## [1] "Order_1"    "Family_2"   "Genus_8"    "Subgenus_1" "t58"       
## 
## $t59
## [1] "Order_1"    "Family_2"   "Genus_8"    "Subgenus_1" "t59"       
## 
## $t60
## [1] "Order_1"    "Family_2"   "Genus_8"    "Subgenus_1" "t60"       
## 
## $t61
## [1] "Order_1"    "Family_2"   "Genus_8"    "Subgenus_2" "t61"       
## 
## $t62
## [1] "Order_1"    "Family_2"   "Genus_8"    "Subgenus_2" "t62"       
## 
## $t63
## [1] "Order_1"    "Family_2"   "Genus_8"    "Subgenus_2" "t63"       
## 
## $t64
## [1] "Order_1"    "Family_2"   "Genus_8"    "Subgenus_2" "t64"

Perhaps we could still post-process this - but it would obviously be more difficult. Note that there is no difficult if genera are at different temporal depths in the tree - so longer as there is the same number of named levels between the root & any tip.

That's all.

Monday, March 30, 2015

Phylogenetic PCA 'biplot' with choices argument

A phytools user recently asked the following:

“I am trying to plot the results of a phylogenetic PCA generated with your package Phytools. I would especially like to plot the third and fourth components using the "choices” option of the biplot() function in R. My feeling is that this option is not implemented while using your phyl.pca function since I get the message "choices" is not a graphical parameter. Am I right? Is there a way I can plot other components than the default ones?“

In fact, this is correct. Here is the function code for biplot.phyl.pca (originally suggested to me by Joan Maspons):

library(phytools)
biplot.phyl.pca
## function (x, ...) 
## {
##     biplot(x$S, x$L, ...)
## }
## <environment: namespace:phytools>
tree<-pbtree(n=26,tip.label=LETTERS)
X<-fastBM(tree,nsim=4)
obj<-phyl.pca(tree,X)
obj
## Phylogenetic pca
## Starndard deviations:
##       PC1       PC2       PC3       PC4 
## 1.0959134 0.9057384 0.8286637 0.6021671 
## Loads:
##              PC1        PC2        PC3          PC4
## [1,]  0.45746491  0.3875983  0.2376398 -0.764212515
## [2,] -0.05002709  0.3353159 -0.9366821 -0.087676734
## [3,] -0.68744007 -0.6538311 -0.1233330 -0.291066912
## [4,] -0.81927353  0.5421294  0.1867339  0.004131128
biplot(obj)

plot of chunk unnamed-chunk-1

biplot(obj,choices=c(3,4)) ## doesn't work
## Warning in plot.window(...): "choices" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "choices" is not a graphical parameter
## Warning in axis(side = side, at = at, labels = labels, ...): "choices" is
## not a graphical parameter
## Warning in axis(side = side, at = at, labels = labels, ...): "choices" is
## not a graphical parameter
## Warning in box(...): "choices" is not a graphical parameter
## Warning in title(...): "choices" is not a graphical parameter
## Warning in text.default(x, xlabs, cex = cex[1L], col = col[1L], ...):
## "choices" is not a graphical parameter
## Warning in plot.window(...): "choices" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "choices" is not a graphical parameter
## Warning in title(...): "choices" is not a graphical parameter
## Warning in axis(3, col = col[2L], ...): "choices" is not a graphical
## parameter
## Warning in axis(4, col = col[2L], ...): "choices" is not a graphical
## parameter
## Warning in text.default(y, labels = ylabs, cex = cex[2L], col = col[2L], :
## "choices" is not a graphical parameter

plot of chunk unnamed-chunk-1

It is non-trivial, but fairly straightforward, to modify this to permit use of the argument choices to select PC axes to be plotted. For this I made use of the do.call function:

biplot.phyl.pca<-function(x,...){
    to.do<-list(...)
    if(hasArg(choices)){ 
        choices<-list(...)$choices
        to.do$choices<-NULL
    } else choices<-c(1,2)
    to.do$x<-x$S[,choices]
    to.do$y<-x$L[,choices]
    do.call(biplot,to.do)
}

Let's try it:

biplot(obj) ## standard biplot

plot of chunk unnamed-chunk-3

biplot(obj,choices=c(3,4))

plot of chunk unnamed-chunk-3

That's it. Assuming I don't find (and no one points out) an error in this, I will add this to the next version of phytools.

Friday, March 27, 2015

Speed-up for collapseTree, and dev.hold to prevent plot animation 'blinking'

In the past 24 hrs or so I have done a few things to speed up the new phytools function collapseTree. This function gives an animated, interactive tree plotter through which users can collapse & expand subtrees of the phylogeny to & from their ancestral nodes. Code of this new version is here.

Since the animation requires that the tree be replotted many times in succession, most of the speed-ups I mentioned involves vectorizing various actions, as well as avoiding, where ever possible, the need to recalculate quantities. Specifically, I vectorized the drawing of edges & arcs on the tree (using segments & draw.arc, respectively), and I allow internally used plotting function, fan to accepted precalculated values for the "pruningwise" reordered "phylo" object and for the node heights of the tree, since these must be recomputed every time the same tree is plotted.

I also created a new, internally used function called circles to replace nodelabels from the ape package for the simple task of plotting circles at the nodes & tips of the phylogeny.

Finally, particularly when the tree is very large, I found that the animation would sometimes “blink” when running. This is because there is a discernible gap in time between different parts of the tree and node labels being plotted. To eliminate that phenomenon, I now use the function dev.hold to hold the elements being sent to the plotting device, and dev.flush to flush them.

This actually may turn out to be a handy feature to add to other plotting functions in the phytools package. For instance, compare (on your own machine, of course):

library(phytools)
tree<-pbtree(n=100)
x<-fastBM(tree)
contMap(tree,x,ftype="off") ## appears gradually

plot of chunk unnamed-chunk-1

dev.hold()
## [1] 0
contMap(tree,x,ftype="off")

plot of chunk unnamed-chunk-2

dev.flush() ## appears all at once
## [1] 0

Finally, here is another video showing the use of collapseTree:

packageVersion("phytools")
## [1] '0.4.51'
tree<-pbtree(n=150)
## don't run
# pruned<-collapseTree(tree)

Cool - no blinking!