Saturday, November 30, 2013

New project - Rphylip: an R interface for PHYLIP

Just towards the end of this week I started working on a new project, Rphylip, which is designed to provide an R interface for the longstanding, popular phylogenetics package PHYLIP. PHYLIP does a really remarkable number of different analysis - from a wide range of phylogeny inference methods, to a variety of comparative methods (including several important new approaches that are implemented nowhere else), to tree drawing and other things. A complete list of the programs in the PHYLIP package is here.

My goal is to create functions that interface with the programs of PHYLIP, but allow the user to operate completely within R. That is, all inputs for the PHYLIP programs are created by R; and all outputs are parsed back into R or printed to the R console. Finally, Rphylip will even clean up the input & output files it has created for the PHYLIP programs.

So far, I have created test functions to interface with only a few of the programs of PHYLIP: Rdnaml (for dnaml); Rcontml (for contml; and Rcontrast (for contrast).

To use Rphylip one has to, of course, first install PHYLIP on your computer. Having done this, it is straightforward to run any of the programs of PHYLIP through R. Below is a simple demo using Rdnaml under the default conditions (which are slightly different then the defaults for dnaml). Some output excluded:

> require(Rphylip)
Loading required package: Rphylip
Loading required package: ape
> X<-read.dna("primates.dna")
> X
12 DNA sequences in binary format stored in a matrix.

All sequences of same length: 232
Labels: Lemur Tarsier Sq.Monkey J.Macaque R.Macaque E.Macaque ...

Base composition:
    a     c     g     t
0.364 0.414 0.041 0.181

> tree<-Rdnaml(X,path="C:/Users/Liam/phylip/exe")
Warning:
  One or more of "infile", "outfile", "outtree"
  was found in your current working directory and may be overwritten

Press ENTER to continue or q to quit:

...

Nucleic acid sequence Maximum Likelihood method, version 3.695

...

               +--------------3
               | 
               |          +--------8
               |          | 
               |          |          +-----11
  +------------8    +-----2       +--1 
  |            |    |     |    +--3  +--12
  |            |    |     |    |  | 
  |            |    |     +----5  +----10 
  |            |    |          | 
  |            +----4          +--------9 
  |                 | 
  |                 |              +4    
  |                 |           +-10 
  |                 |        +--6  +--5  
  |                 |        |  | 
  |                 +--------7  +--6 
  |                          | 
  |                          +------7 
  | 
  9-------------------2        
  | 
  +------------------1

remember: this is an unrooted tree!
...

Translation table
-----------------
        1       Lemur
        2       Tarsier
        3       Sq.Monkey
        4       J.Macaque
        5       R.Macaque
        6       E.Macaque
        7       B.Macaque
        8       Gibbon
        9       Orangutan
        10      Gorilla
        11      Chimp
        12      Human

> tree$logLik
[1] -2200.766
> require(phangorn)
Loading required package: phangorn
Loading required package: rgl
> plot(midpoint(tree),edge.width=2,no.margin=TRUE)

The source code for the functions I've written so far in Rphylip are here and I posted a little package build with some minimal documentation for the three interface functions here: http://www.phytools.org/Rphylip/. Feedback welcome!

3 comments:

  1. Rthanks, Liam. I know that this is a lot of work -- having interface features to handle the options of PHYLIP.

    In the next (4.0) release of PHYLIP we will have Java GUI interfaces for all the programs, but they will continue also to work standalone with the same character-oriented menu and the same options letters. So hopefully they will continue to work with your interface. However all this will be delayed, as my last programmer leaves Friday, and unless our grant situation improves (knock wood) I will be completing the release myself, slowly.

    ReplyDelete
  2. Hi Liam,

    I'm trying to do Rconstrast with a multivariate dataset, and I would like to turn on Option C to return the actual contrasts. Is there a way to access the letter options as arguments when running the functions in Rphylip?

    ReplyDelete

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.