I keep plugging away at the Rphylip project. I have now written R interfaces covering 15* of the 35* programs in the PHYLIP package (*including threshml, which is technically not yet in PHYLIP), as well as a number of helper functions. The latest two added to that list are protpars (for MP tree inference from protein sequences) and protdist (for evolutionary distance estimation from protein sequences). The following is a quick demo of the latter:
Loading required package: Rphylip
Loading required package: ape
> packageVersion("Rphylip")
[1] ‘0.1.9’
> data(chloroplast)
> chloroplast
19 protein sequences in character format stored in a matrix.
All sequences of same length: 5144
Labels: Trico Nostoc Syn6301 Prochl Syn8102 Thermo ...
Amino acid composition:
A C D E F G H I K L
0.087 0.007 0.040 0.047 0.051 0.091 0.029 0.074 0.040 0.102
M N P Q R S T V W Y
0.025 0.036 0.048 0.040 0.052 0.054 0.052 0.073 0.022 0.032
> D<-Rprotdist(chloroplast,model="PAM")
...
Protein distance algorithm, version 3.695
Settings for this run:
P Use JTT, PMB, PAM, Kimura, categories model? Dayhoff PAM matrix
G Gamma distribution of rates among positions? No
C One category of substitution rates? Yes
W Use weights for positions? No
M Analyze multiple data sets? No
I Input sequences interleaved? Yes
0 Terminal type (IBM PC, ANSI)? IBM PC
1 Print out the data at start of run No
2 Print indications of progress of run Yes
Are these settings correct? (type Y or the letter for one to change)
Computing distances:
1
2 .
3 ..
4 ...
5 ....
6 .....
7 ......
8 .......
9 ........
10 .........
11 ..........
12 ...........
13 ............
14 .............
15 ..............
16 ...............
17 ................
18 .................
19 ..................
Output written to file "outfile"
Done.
Press enter to quit.
> tree<-Rneighbor(D)
...
Neighbor-Joining/UPGMA method version 3.695
Neighbor-joining method
Negative branch lengths allowed
+----7
!
! +-------9
! +-8
! ! ! +---12
! +-9 +-5
! ! ! +--10
! ! !
! ! +--------11
! !
! +-10 +-----13
! ! ! +-6
! ! ! ! ! +-------14
! ! ! ! +-4
! ! +-7 +------19
! +-11 !
! ! ! ! +--17
! ! ! +--2
! +-13 ! +----15
! ! ! !
! ! ! +-----18
! ! !
! +-14 +----------8
! ! !
! ! ! +-----4
! ! ! +--1
! +-15 +-12 +---5
! ! ! !
! ! ! +--3
17-16 !
! ! +----6
! !
! ! +-16
! +--3
! +-2
!
+----1
remember: this is an unrooted tree!
...
Translation table
-----------------
1 Trico
2 Nostoc
3 Syn6301
4 Prochl
5 Syn8102
6 Thermo
7 Syn6803
8 Gloeo
9 Odont
10 Porph
11 Cyanid
12 Gracil
13 Nephros
14 Chlamy
15 Arabid
16 Anabae
17 March
18 Cyanoph
19 Chlorel
> plot(tree,type="unrooted",no.margin=TRUE,edge.width=2)
Unfortunately - as I don't know where these data came from, I can't tell whether this is a good tree, a bad tree, or an average tree; however I believe that it is quite similar to the tree obtained using Rproml.
Hello! Fist of all, congratulations for the amazing package! I have two questions:
ReplyDeletedoes rprotodist only accept the phyDat format?
do you know of any tutorial for getting a phyDat format from a alignment format? i.e. those comming from msaMuscle ?
Thank you very much!
I think I found the solution here:
ReplyDeletehttps://rdrr.io/bioc/msa/man/msaConvert.html
from msa itself.
Strangly, I got the error:
ReplyDeletesh: 1: anaconda3/bin/protdist/protdist: not found
even that the path is correct
should be using:
ReplyDeleteanaconda3/bin/
thanks!
Looks like you got it figured out!
ReplyDelete