Sunday, December 15, 2013

Rphylip: protpars & protdist

I keep plugging away at the Rphylip project. I have now written R interfaces covering 15* of the 35* programs in the PHYLIP package (*including threshml, which is technically not yet in PHYLIP), as well as a number of helper functions. The latest two added to that list are protpars (for MP tree inference from protein sequences) and protdist (for evolutionary distance estimation from protein sequences). The following is a quick demo of the latter:

> require(Rphylip)
Loading required package: Rphylip
Loading required package: ape
> packageVersion("Rphylip")
[1] ‘0.1.9’
> data(chloroplast)
> chloroplast
19 protein sequences in character format stored in a matrix.

All sequences of same length: 5144

Labels: Trico Nostoc Syn6301 Prochl Syn8102 Thermo ...

Amino acid composition:
    A     C     D     E     F     G     H     I     K     L
0.087 0.007 0.040 0.047 0.051 0.091 0.029 0.074 0.040 0.102
    M     N     P     Q     R     S     T     V     W     Y
0.025 0.036 0.048 0.040 0.052 0.054 0.052 0.073 0.022 0.032

> D<-Rprotdist(chloroplast,model="PAM")

...

Protein distance algorithm, version 3.695

Settings for this run:
  P  Use JTT, PMB, PAM, Kimura, categories model?  Dayhoff PAM matrix
  G  Gamma distribution of rates among positions?  No
  C           One category of substitution rates?  Yes
  W                    Use weights for positions?  No
  M                   Analyze multiple data sets?  No
  I                  Input sequences interleaved?  Yes
  0                 Terminal type (IBM PC, ANSI)?  IBM PC
  1            Print out the data at start of run  No
  2          Print indications of progress of run  Yes

Are these settings correct? (type Y or the letter for one to change)

Computing distances:
  1           
  2            .
  3            ..
  4            ...
  5            ....
  6            .....
  7            ......
  8            .......
  9            ........
  10           .........
  11           ..........
  12           ...........
  13           ............
  14           .............
  15           ..............
  16           ...............
  17           ................
  18           .................
  19           ..................

Output written to file "outfile"

Done.

Press enter to quit.

> tree<-Rneighbor(D)

...

Neighbor-Joining/UPGMA method version 3.695


Neighbor-joining method

Negative branch lengths allowed


  +----7        
  ! 
  !                     +-------9        
  !                   +-8
  !                   ! ! +---12       
  !                 +-9 +-5
  !                 ! !   +--10       
  !                 ! !
  !                 ! +--------11       
  !                 ! 
  !              +-10   +-----13       
  !              !  ! +-6
  !              !  ! ! ! +-------14       
  !              !  ! ! +-4
  !              !  +-7   +------19       
  !           +-11    !
  !           !  !    !  +--17       
  !           !  !    +--2
  !        +-13  !       +----15       
  !        !  !  ! 
  !        !  !  +-----18       
  !        !  ! 
  !     +-14  +----------8        
  !     !  ! 
  !     !  !     +-----4        
  !     !  !  +--1
  !  +-15  +-12  +---5        
  !  !  !     ! 
  !  !  !     +--3        
 17-16  ! 
  !  !  +----6        
  !  ! 
  !  !  +-16       
  !  +--3
  !     +-2        
  ! 
  +----1        


remember: this is an unrooted tree!

...

Translation table
-----------------
        1       Trico
        2       Nostoc
        3       Syn6301
        4       Prochl
        5       Syn8102
        6       Thermo
        7       Syn6803
        8       Gloeo
        9       Odont
        10      Porph
        11      Cyanid
        12      Gracil
        13      Nephros
        14      Chlamy
        15      Arabid
        16      Anabae
        17      March
        18      Cyanoph
        19      Chlorel

> plot(tree,type="unrooted",no.margin=TRUE,edge.width=2)

Unfortunately - as I don't know where these data came from, I can't tell whether this is a good tree, a bad tree, or an average tree; however I believe that it is quite similar to the tree obtained using Rproml.

Rphylip is available here and on github.

No comments:

Post a Comment