Sunday, December 15, 2013

Rphylip: protpars & protdist

I keep plugging away at the Rphylip project. I have now written R interfaces covering 15* of the 35* programs in the PHYLIP package (*including threshml, which is technically not yet in PHYLIP), as well as a number of helper functions. The latest two added to that list are protpars (for MP tree inference from protein sequences) and protdist (for evolutionary distance estimation from protein sequences). The following is a quick demo of the latter:

> require(Rphylip)
Loading required package: Rphylip
Loading required package: ape
> packageVersion("Rphylip")
[1] ‘0.1.9’
> data(chloroplast)
> chloroplast
19 protein sequences in character format stored in a matrix.

All sequences of same length: 5144

Labels: Trico Nostoc Syn6301 Prochl Syn8102 Thermo ...

Amino acid composition:
    A     C     D     E     F     G     H     I     K     L
0.087 0.007 0.040 0.047 0.051 0.091 0.029 0.074 0.040 0.102
    M     N     P     Q     R     S     T     V     W     Y
0.025 0.036 0.048 0.040 0.052 0.054 0.052 0.073 0.022 0.032

> D<-Rprotdist(chloroplast,model="PAM")

...

Protein distance algorithm, version 3.695

Settings for this run:
  P  Use JTT, PMB, PAM, Kimura, categories model?  Dayhoff PAM matrix
  G  Gamma distribution of rates among positions?  No
  C           One category of substitution rates?  Yes
  W                    Use weights for positions?  No
  M                   Analyze multiple data sets?  No
  I                  Input sequences interleaved?  Yes
  0                 Terminal type (IBM PC, ANSI)?  IBM PC
  1            Print out the data at start of run  No
  2          Print indications of progress of run  Yes

Are these settings correct? (type Y or the letter for one to change)

Computing distances:
  1           
  2            .
  3            ..
  4            ...
  5            ....
  6            .....
  7            ......
  8            .......
  9            ........
  10           .........
  11           ..........
  12           ...........
  13           ............
  14           .............
  15           ..............
  16           ...............
  17           ................
  18           .................
  19           ..................

Output written to file "outfile"

Done.

Press enter to quit.

> tree<-Rneighbor(D)

...

Neighbor-Joining/UPGMA method version 3.695


Neighbor-joining method

Negative branch lengths allowed


  +----7        
  ! 
  !                     +-------9        
  !                   +-8
  !                   ! ! +---12       
  !                 +-9 +-5
  !                 ! !   +--10       
  !                 ! !
  !                 ! +--------11       
  !                 ! 
  !              +-10   +-----13       
  !              !  ! +-6
  !              !  ! ! ! +-------14       
  !              !  ! ! +-4
  !              !  +-7   +------19       
  !           +-11    !
  !           !  !    !  +--17       
  !           !  !    +--2
  !        +-13  !       +----15       
  !        !  !  ! 
  !        !  !  +-----18       
  !        !  ! 
  !     +-14  +----------8        
  !     !  ! 
  !     !  !     +-----4        
  !     !  !  +--1
  !  +-15  +-12  +---5        
  !  !  !     ! 
  !  !  !     +--3        
 17-16  ! 
  !  !  +----6        
  !  ! 
  !  !  +-16       
  !  +--3
  !     +-2        
  ! 
  +----1        


remember: this is an unrooted tree!

...

Translation table
-----------------
        1       Trico
        2       Nostoc
        3       Syn6301
        4       Prochl
        5       Syn8102
        6       Thermo
        7       Syn6803
        8       Gloeo
        9       Odont
        10      Porph
        11      Cyanid
        12      Gracil
        13      Nephros
        14      Chlamy
        15      Arabid
        16      Anabae
        17      March
        18      Cyanoph
        19      Chlorel

> plot(tree,type="unrooted",no.margin=TRUE,edge.width=2)

Unfortunately - as I don't know where these data came from, I can't tell whether this is a good tree, a bad tree, or an average tree; however I believe that it is quite similar to the tree obtained using Rproml.

Rphylip is available here and on github.

5 comments:

  1. Hello! Fist of all, congratulations for the amazing package! I have two questions:

    does rprotodist only accept the phyDat format?
    do you know of any tutorial for getting a phyDat format from a alignment format? i.e. those comming from msaMuscle ?

    Thank you very much!

    ReplyDelete
  2. I think I found the solution here:

    https://rdrr.io/bioc/msa/man/msaConvert.html

    from msa itself.

    ReplyDelete
  3. Strangly, I got the error:

    sh: 1: anaconda3/bin/protdist/protdist: not found

    even that the path is correct

    ReplyDelete
  4. should be using:

    anaconda3/bin/


    thanks!

    ReplyDelete

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.