Wednesday, March 29, 2017

Deadline approaching for Córdoba, Argentina PCM workshop


Applications for our short course on phylogenetic comparative methods in R, to be held in Córdoba, Argentina, are due in three days, on April 1, 2017. The full announcement of the course follows, with the Spanish-language version below the English one.

Intensive short course on macroevolution and phylogenetic comparative methods in R

We are pleased to announce a new graduate-level intensive short course on the use of R for phylogenetic comparative analysis and its application in macroevolutionary studies. The course will be four days long and free of charge, and will take place at the Universidad Nacional de Córdoba in Córdoba, Argentina, from the 1st to the 4th of August, 2017. This course is partially funded by the National Science Foundation, with additional support from the University of Massachusetts Boston and the Universidad Nacional de Córdoba. A number of full stipends are available to cover the cost of travel and lodging for qualified students and post-docs. Applicants are welcome from any country; however, we expect that most admitted students will come from South America and other parts of Latin America. Accepted students from further afield may be offered only partial funding for their travel expenses. In addition, Argentine students coming from cities accessible to Córdoba by bus (distances of less than 700 km within Argentina) may receive only lodging and be asked to cover their own travel expenses should they wish to participate in the course. Postgraduate students attending other universities in Argentina can register to receive credit for this 32-hour course at no additional cost.

Topics covered will include: an introduction to the R scientific computing environment, tree manipulation, independent contrasts and phylogenetic generalized least squares, ancestral state reconstruction, models of character evolution, diversification analysis, and visualization methods for phylogenies and comparative data. Course instructors will include Dr. Liam Revell (University of Massachusetts Boston), Dr. Luke Harmon (University of Idaho), and Dr. Mike Alfaro (University of California, Los Angeles), along with the possibility of other co-instructors yet to be announced. Course co-organizers are Dr. Santiago Benitez Vieyra (Universidad Nacional de Córdoba) and Dr. Marina Strelin (Universidad Nacional del Comahue).

Instruction in the course will be primarily in English; however, several of the instructors, coordinators, and TAs of the course are competent or fluent in both Spanish and English. Discussion, exercises, and activities will be conducted in both languages.

To apply for the course, please submit your CV along with a short (1 page) description of your research interests, background, and reasons for taking the course. Admission is competitive, and preference will be given to students with a background in phylogenetics and a compelling motivation for taking the course. In your application, please indicate your preferred departure airport, if applicable. Applications should be submitted by email to cordoba@phytools.org by April 1st, 2017. Applications may be written in English or Spanish; however, all students must have a basic working knowledge of scientific English. Questions can be directed to Drs. Liam Revell (liam.revell@umb.edu) or Marina Strelin (marina.strelin85@gmail.com).


Course on macroevolution and the use of phylogenetic comparative methods in R (Spanish-language announcement)

We are pleased to announce a new intensive course, in a workshop format, aimed at graduate/postgraduate students, on the use of phylogenetic comparative methods in R. These methods have many and diverse applications in macroevolutionary studies. The course will be free of charge, will last four days, and will be taught at the Universidad Nacional de Córdoba, Argentina, from August 1 to 4, 2017. This course will be partially funded by the National Science Foundation (United States), with additional support from the University of Massachusetts Boston and the Universidad Nacional de Córdoba. The funding would cover the cost of airfare and lodging for students accepted into the course, although full coverage may be subject to change depending on the geographic location of the selected applicants.

The course is intended for advanced students, doctoral students in the biological sciences or related fields, and researchers and professionals interested in the subject. We will accept applications from any country; however, we anticipate that applicants from South America and other Latin American countries will make up the majority of the students admitted to the program. Selected students from more distant countries may receive only partial support to cover their travel expenses. In addition, students from Argentine cities accessible to Córdoba by bus (distances of less than 700 km) may receive financial support for lodging only. For Argentine students, the course will be accredited by the Doctorado en Ciencias Biológicas (doctoral program in biological sciences) of the Universidad Nacional de Córdoba and will count as 32 hours.

Topics to be covered in the course include: an introduction to the R computing environment, manipulation of phylogenetic trees, generalized least squares in a phylogenetic context, ancestral state reconstruction, evolutionary models of traits, phylogenetic diversification analysis, and visualization of phylogenies and comparative data, among others. The course will be taught by instructors Dr. Liam Revell (University of Massachusetts Boston), Dr. Luke Harmon (University of Idaho), and Dr. Mike Alfaro (University of California, Los Angeles), with the possible participation of additional instructors. Dr. Santiago Benitez-Vieyra (Universidad Nacional de Córdoba - CONICET) and Dr. Marina Strelin (Universidad Nacional del Comahue - CONICET) will be the coordinators of this course.

The course will be taught mainly in English; however, some of the instructors, coordinators, and teaching assistants of the course speak fluent Spanish. Course discussions, exercises, and activities will be conducted in Spanish and English.

Those interested in applying for admission should send their curriculum vitae and a short (1 page) description of their scientific interests, experience, and reasons for wanting to take the course. The admission process will be competitive, and preference will be given to students with knowledge of phylogenetics who are carrying out research related to the topics of the course. All students are expected to have a basic level of scientific English. Applications should indicate the preferred departure airport (if applicable). Applications may be written in English or Spanish and must be sent by email to cordoba@phytools.org before April 1, 2017. Additional questions may be directed to Dr. Liam Revell (liam.revell@umb.edu) or Dr. Marina Strelin (marina.strelin85@gmail.com).

Tuesday, March 28, 2017

New phytools version (>=0.6-00) to be submitted to CRAN

I'm in the process of submitting a new version of phytools to CRAN. This update will have package version number >=0.6-00. The last CRAN phytools version is 0.5-64, released near the beginning of December of last year, and the present version has many updates relative to it.

phytools is a fairly widely used R phylogenetics package that presently shows nearly 1,100 citations on Google Scholar (Mar. 28, 2017).

While it awaits approval, note that it is always possible to obtain the latest non-CRAN phytools version from GitHub using devtools::install_github.

Here is an abbreviated list of some of the updates from the previous CRAN version of the package:

  1. A new option to permit custom color palettes in plotBranchbyTrait.
  2. A fix to phylomorphospace3d (noted in the comments) to permit a static 3D phylomorphospace plot to be created without tip labels.
  3. A simple but handy S3 as.multiPhylo method for objects of class "phylo".
  4. S3 print, coef, residuals, and plot methods for the "phyl.RMA" object class produced by phylogenetic reduced major axis (RMA) regression, e.g.:
library(phytools)
## x & y: trait vectors with names matching tree$tip.label; tree: a "phylo" object
obj<-phyl.RMA(x,y,tree)
obj
## 
## Coefficients:
## (Intercept)           x 
##  -0.2217216   1.0878042 
## 
## VCV matrix:
##          x        y
## x 1.605286 1.394110
## y 1.394110 1.899564
## 
## Model for the covariance structure of the error is "BM"
## 
## Estimates (or set values):
##    lambda    log(L) 
##   1.00000 -60.71938 
## 
## Hypothesis test based on Clarke (1980; Biometrika):
##        r2         T        df         P 
##  0.637365  0.684671 20.199986  0.501332 
## 
## Note that the null hypothesis test is h0 = 1
plot(obj)


  5. A bunch of new options for plotTree.barplot, including stacked bars.
  6. A bug fix for write.simmap with "multiSimmap" objects.
  7. A completely new function, ratebytree, to compare the rate of evolution of a continuous trait between 2 or more phylogenies (1, 2, 3, 4). E.g.:
## trees: the set of phylogenies; X: a list of named trait vectors, one per tree
ylim<-range(X)
par(mfrow=c(1,3))
phenogram(trees[[1]],X[[1]],ylim=ylim,spread.cost=c(1,0))
phenogram(trees[[2]],X[[2]],ylim=ylim,spread.cost=c(1,0))
phenogram(trees[[3]],X[[3]],ylim=ylim,spread.cost=c(1,0))


ratebytree(trees,X)
## ML common-rate model:
##            s^2     a[1]     a[2]     a[3]        k     logL
## value   1.2487   0.6393   0.5213  -0.7999        4 -71.0761
## 
## ML multi-rate model:
##         s^2[1]   s^2[2]   s^2[3]     a[1]     a[2]     a[3]        k     logL
## value    0.425    1.747   1.1623   0.6393   0.5213  -0.7999        6 -68.3276
## 
## Likelihood ratio: 5.497 
## P-value (based on X^2): 0.064 
## 
## R thinks it has found the ML solution.

  8. A new option to plot upward- or downward-facing trees in phytools::plotSimmap (as well as in a number of the functions that use plotSimmap internally).
  9. The option to plot upward- & downward-facing "contMap" & "densityMap" objects.
  10. Some neat functionality for interactive node labeling (more here).
  11. Another new function, errorbar.contMap, to add colorful error bars to internal nodes of a plotted "contMap" object (1, 2, 3), e.g.:
obj<-contMap(tree,y,lims=c(-3.75,4.5),plot=FALSE)
plot(obj,xlim=c(-0.05,2),legend=1)
errorbar.contMap(obj,scale.by.ci=TRUE)


  12. Another totally new function, plotTree.errorbars, to plot a tree with semi-transparent error bars on the ages of internal nodes (1, 2).
  13. A new function, pgls.SEy, to conduct phylogenetic generalized least squares regression (or any linear modeling) while taking into account known errors in y (1, 2).
  14. An important bug fix for a non-flat prior probability distribution on the global root state in the function make.simmap for stochastic character mapping.
  15. A simple OU simulation function, based on a difference-equation approximation, to simulate Ornstein-Uhlenbeck evolution on the tree with multiple optima (more here).
  16. A small update to plotTree.singletons to assign "last_plot.phylo" in the plotting environment used by ape, allowing this function to work with others designed to annotate plotted trees, such as ape::nodelabels.
  17. A simple fix to the S3 plot method for "phyl.pca" class objects from phylogenetic principal components analysis.
  18. A cool new plotting function, arc.cladelabels, to add clade labels to a plotted fan-style circular tree.
  19. Today I added a new function, force.ultrametric, that forces (by multiple methods) an ultrametric tree that fails ape::is.ultrametric (due to numerical precision issues) to be precisely ultrametric.
  20. Finally, I pushed a whole bunch of updates to the help pages of the package. It’s not glamorous, but it was overdue!
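The numbers printed in the phyl.RMA & ratebytree examples above hang together in ways that are easy to check by hand. This base-R sketch assumes that the phylogenetic RMA slope and r2 follow the standard RMA formulas applied to the printed evolutionary VCV matrix, and that k in the ratebytree output counts estimated parameters, so the likelihood-ratio test has 6 - 4 = 2 degrees of freedom. All inputs are copied from the printed output; nothing is recomputed from data.

```r
## Worked checks using numbers copied from the printed output above.

## phyl.RMA: evolutionary variances & covariance from the "VCV matrix"
Cxx <- 1.605286; Cyy <- 1.899564; Cxy <- 1.394110
slope <- sign(Cxy) * sqrt(Cyy / Cxx)   # RMA slope: sign(cov) * sd(y) / sd(x)
r2 <- Cxy^2 / (Cxx * Cyy)              # squared evolutionary correlation
round(c(slope = slope, r2 = r2), 4)    # 1.0878 & 0.6374, as printed (rounded)

## ratebytree: common-rate (k = 4) vs. multi-rate (k = 6) model
logL.common <- -71.0761
logL.multi <- -68.3276
LR <- 2 * (logL.multi - logL.common)             # likelihood ratio, 5.497
P <- pchisq(LR, df = 6 - 4, lower.tail = FALSE)  # chi-squared P, about 0.064
```

Both agree with the printed summaries to the precision shown.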

I'll be working on getting this package version up on CRAN today, and will post an update when it is accepted.

Updating help pages of phytools

It's not glamorous, but I just spent the past couple of hours doing significant updates to the Rd (help) pages of phytools.

To check out these updates you can look here.

More soon.

force.ultrametric method for ultrametric phylogenies that fail ape::is.ultrametric due to numerical precision

Sometimes in R a phylogenetic tree will fail the check is.ultrametric even though we know our tree is ultrametric. Why?

Generally speaking, this is because a tree in R is actually never* precisely ultrametric, owing to the limited precision of the floating-point arithmetic that all computers use. (*The exception is when our tree has integer branch lengths, in which case it could be exactly ultrametric.)
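A quick base-R illustration of the underlying issue, which is not specific to phylogenies: decimal edge lengths such as 0.1 have no exact binary representation, so sums of them along different root-to-tip paths can disagree in the last few bits, whereas small integers add exactly. Comparing within a tolerance, as base R's all.equal does (its default tolerance is also sqrt(.Machine$double.eps)), papers over the difference.

```r
## Floating-point addition of decimals is not exact
0.1 + 0.2 == 0.3                     # FALSE
sprintf("%.17g", 0.1 + 0.2)          # "0.30000000000000004"
## Comparing within a tolerance succeeds
isTRUE(all.equal(0.1 + 0.2, 0.3))    # TRUE
## Integer-valued doubles (up to 2^53) add exactly
1 + 2 == 3                           # TRUE
```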

Consequently, is.ultrametric has an argument tol, with default value .Machine$double.eps^0.5, which sets the tolerance level: basically, how much deviation from precise ultrametricity (is this a word?) should be permitted.

Unfortunately, trees that should be ultrametric but have been written to file & then read back into R will sometimes fail this test with tol set to its default value. In addition, because the default value of tol is a function of the machine's numerical precision, it can theoretically vary between computers & R versions!
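To see why writing to file matters, here is a minimal base-R sketch. The toy tree, the helper tip_heights, and the simple absolute-spread check are all hypothetical, and the check is not ape's actual is.ultrametric criterion. Rounding the edge lengths of an exactly ultrametric tree to 7 significant digits (roughly what writing it with 7-digit precision does) pushes its tip heights about 1e-07 apart, several times the default tolerance of about 1.49e-08.

```r
## Toy tree (((A,B),C),D) in ape-style edge-matrix form (hypothetical example):
## tips are nodes 1-4, the root is node 5, and edges are listed parent-first.
edge <- rbind(c(5, 6), c(6, 7), c(7, 1), c(7, 2), c(6, 3), c(5, 4))
len <- c(1/3, 1/3, 1/3, 1/3, 2/3, 1)   # every tip sits at height 1

## Root-to-tip heights, accumulated along the parent-first edge order
tip_heights <- function(edge, len, ntip) {
    depth <- numeric(max(edge))
    for (i in seq_len(nrow(edge)))
        depth[edge[i, 2]] <- depth[edge[i, 1]] + len[i]
    depth[seq_len(ntip)]
}

tol <- .Machine$double.eps^0.5        # about 1.49e-08 with IEEE doubles
spread <- function(h) diff(range(h))  # simplified, absolute check

h.exact <- tip_heights(edge, len, 4)
h.rounded <- tip_heights(edge, signif(len, 7), 4)  # 7 significant digits
spread(h.exact) <= tol     # TRUE
spread(h.rounded) <= tol   # FALSE: tip heights now differ by about 1e-07
```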

Here is a quick example of how a tree that seems to be ultrametric can fail a check of is.ultrametric:

library(phytools)
set.seed(1)
tree<-pbtree(n=26,tip.label=LETTERS)
text<-write.tree(tree,digits=7)
text
## [1] "(A:3.759964,(B:3.382373,((((C:0.327977,D:0.327977):0.8764073,(((E:0.1249564,F:0.1249564):0.6915993,(G:0.707227,H:0.707227):0.1093287):0.2328474,(I:0.9354844,(J:0.3184016,K:0.3184016):0.6170828):0.1139187):0.1549812):0.3387382,(L:0.6247575,M:0.6247575):0.918365):1.172667,(((N:0.6491293,(O:0.08390608,P:0.08390608):0.5652232):0.4727952,((Q:0.172787,R:0.172787):0.1103943,S:0.2831814):0.8387431):0.8701228,((T:0.3105662,U:0.3105662):1.438824,(((V:0.4557458,W:0.4557458):0.02627092,X:0.4820167):0.7931008,(Y:0.2386761,Z:0.2386761):1.036441):0.4742723):0.2426576):0.7237421):0.6665833):0.3775909);"
tree<-read.tree(text=text)
plotTree(tree)


is.ultrametric(tree)
## [1] FALSE

I don't know about you, but I have certainly seen a lot of 'ultrametric' trees written to file with 7-digit precision!

Here is a very simple function that will force our “nearly ultrametric” tree to be precisely ultrametric. It basically wraps two different methods. The first is the function nnls.tree in the package phangorn, written by Klaus Schliep in my lab. For a given tree topology, this function finds the set of edge lengths whose implied distances have the minimum sum of squared differences from the input distances, in this case the patristic distances of our phylogeny; with rooted=TRUE, the fitted tree is constrained to be ultrametric.
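To make the least-squares idea concrete, here is a rough base-R sketch. It is not phangorn's nnls.tree algorithm, and the helper ls_ultrametric and the toy tree are hypothetical. If the topology is fixed and the tree must be ultrametric, every pairwise distance equals twice the height of the two tips' most recent common ancestor, so the least-squares height of each internal node is simply half the mean distance over the tip pairs it joins. The sketch skips the non-negativity constraint, which is harmless for trees that are already nearly ultrametric.

```r
## Hypothetical sketch: least-squares ultrametric edge lengths for a fixed
## topology, parameterized by node heights (NOT phangorn's nnls.tree).
## Toy tree (((A,B),C),D): tips 1-4, root 5, edges listed parent-first;
## lengths rounded to 7 significant digits, so it is slightly off.
edge <- rbind(c(5, 6), c(6, 7), c(7, 1), c(7, 2), c(6, 3), c(5, 4))
len <- signif(c(1/3, 1/3, 1/3, 1/3, 2/3, 1), 7)
ntip <- 4

ls_ultrametric <- function(edge, len, ntip) {
    nnode <- max(edge)
    parent <- numeric(nnode)
    parent[edge[, 2]] <- edge[, 1]               # the root keeps parent 0
    depth <- numeric(nnode)
    for (i in seq_len(nrow(edge)))               # distance from the root
        depth[edge[i, 2]] <- depth[edge[i, 1]] + len[i]
    to_root <- function(v) {                     # v, its parent, ..., root
        p <- v
        while (parent[v] > 0) { v <- parent[v]; p <- c(p, v) }
        p
    }
    paths <- lapply(seq_len(ntip), to_root)
    total <- numeric(nnode); count <- numeric(nnode)
    for (i in 1:(ntip - 1)) for (j in (i + 1):ntip) {
        m <- paths[[i]][paths[[i]] %in% paths[[j]]][1]   # MRCA of tips i & j
        d <- depth[i] + depth[j] - 2 * depth[m]          # patristic distance
        total[m] <- total[m] + d
        count[m] <- count[m] + 1
    }
    ## LS height of each internal node: half the mean distance between the
    ## tip pairs it joins; tips sit at height 0
    H <- ifelse(count > 0, total / pmax(count, 1) / 2, 0)
    H[edge[, 1]] - H[edge[, 2]]                  # new edge lengths
}

new.len <- ls_ultrametric(edge, len, ntip)
## Root-to-tip heights of the adjusted tree agree up to rounding error
depth <- numeric(max(edge))
for (i in seq_len(nrow(edge))) depth[edge[i, 2]] <- depth[edge[i, 1]] + new.len[i]
diff(range(depth[1:ntip]))
max(abs(new.len - len))    # the adjustments are tiny
```

Because every tip height telescopes to the root's fitted height, the result is ultrametric by construction, and the fudge is shared among internal and terminal edges.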

The second method is very simple. It computes, for each tip, the amount of edge length that would have to be added to its terminal edge to render the tree ultrametric, and adds that amount. This seems nice, but it concentrates all of the fudge we are applying to our tree on the external edges. I believe that this method is also implemented in a popular package called BioGeoBEARS by Nick Matzke.
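A minimal base-R sketch of the "extend" idea on a toy tree (hypothetical example; not BioGeoBEARS code): lengthen each terminal edge by the difference between the tallest tip's height and that tip's own height.

```r
## Toy tree (((A,B),C),D): tips 1-4, root 5, edges listed parent-first;
## lengths rounded to 7 significant digits, so it is slightly non-ultrametric.
edge <- rbind(c(5, 6), c(6, 7), c(7, 1), c(7, 2), c(6, 3), c(5, 4))
len <- signif(c(1/3, 1/3, 1/3, 1/3, 2/3, 1), 7)
ntip <- 4

tip_heights <- function(edge, len, ntip) {
    depth <- numeric(max(edge))
    for (i in seq_len(nrow(edge)))       # parent-first edge order
        depth[edge[i, 2]] <- depth[edge[i, 1]] + len[i]
    depth[seq_len(ntip)]
}

h <- tip_heights(edge, len, ntip)
d <- max(h) - h                          # how far each tip falls short
ii <- match(seq_len(ntip), edge[, 2])    # row of each tip's terminal edge
len.ext <- len
len.ext[ii] <- len.ext[ii] + d

diff(range(tip_heights(edge, len.ext, ntip)))  # essentially 0
which(len.ext != len)                          # only terminal edges change
```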

Here is that function:

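force.ultrametric<-function(tree,method=c("nnls","extend")){
    ## uses ape (vcv, Ntip, cophenetic) &, for method="nnls", phangorn (nnls.tree)
    method<-method[1]
    ## least-squares ultrametric fit to the tree's own patristic distances
    if(method=="nnls") tree<-nnls.tree(cophenetic(tree),tree,
        rooted=TRUE,trace=0)
    else if(method=="extend"){
        h<-diag(vcv(tree)) ## root-to-tip height of each tip
        d<-max(h)-h ## how far each tip falls short of the tallest tip
        ## row of the edge matrix leading to each tip
        ii<-sapply(1:Ntip(tree),function(x,y) which(y==x),
            y=tree$edge[,2])
        tree$edge.length[ii]<-tree$edge.length[ii]+d
    } else 
        cat("method not recognized: returning input tree\n\n")
    tree
}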

Now we can try it:

library(phangorn)
ult.nnls<-force.ultrametric(tree) ## default method
is.ultrametric(ult.nnls)
## [1] TRUE
ult.extend<-force.ultrametric(tree,method="extend")
is.ultrametric(ult.extend)
## [1] TRUE

Now let's compare the edge lengths of each of these two trees, as a function of edge height, to the edge lengths of our input tree:

ult.nnls<-reorder(ult.nnls)
h.nnls<-rowMeans(nodeHeights(ult.nnls))
plot(h.nnls,tree$edge.length-ult.nnls$edge.length,pch=21,
    ylim=c(-1e-6,1e-6),bg="grey",cex=1.5,xlab="edge height",
    ylab="difference between input & output edge lengths",
    main="force.ultrametric(...,method=\"nnls\")")


sum((tree$edge.length-ult.nnls$edge.length)^2)
## [1] 3.673085e-13
h.extend<-rowMeans(nodeHeights(ult.extend))
sum((tree$edge.length-ult.extend$edge.length)^2)
## [1] 5.2996e-12
plot(h.extend,tree$edge.length-ult.extend$edge.length,pch=21,
    ylim=c(-1e-6,1e-6),bg="grey",cex=1.5,xlab="edge height",
    ylab="difference between input & output edge lengths",
    main="force.ultrametric(...,method=\"extend\")")


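I'd say that the "nnls" method looks better: its sum of squared differences from the input edge lengths is more than an order of magnitude smaller (3.7e-13 vs. 5.3e-12), and the adjustment is not piled entirely onto the terminal edges.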

Note that neither of these is a formal statistical method for estimating an ultrametric tree from a tree in which branch lengths are proportional to (for instance) the amount of molecular sequence evolution; there are a number of different approaches for doing that. Rather, these methods are designed merely to 're-adjust' the edge lengths of a tree that has lost numerical precision from being written to file & thus fails R checks such as is.ultrametric.