Tuesday, February 18, 2014

New Rphylip functions: Rgendist & Rdolpenny

Rphylip, an R interface to the PHYLIP software package by Joe Felsenstein (1989, 2013), reached an entirely meaningless milestone today: it now contains R interfaces for exactly* 2/3 of the programs in the PHYLIP package. (*By some accounting. I have included the program THRESHML, which is not yet part of PHYLIP, and the count also includes a couple of programs for which we will probably not write interfaces.) That means that Rphylip now has 24 different interface functions, as well as a number of helper functions (including some that add new functionality, such as opt.Rdnaml). The two latest additions to the family are Rdolpenny (an interface for DOLPENNY) and Rgendist (an interface for GENDIST).

Here's a quick demo of the latter, Rgendist, which is for the calculation of genetic distances from gene frequency data. The data used here (in X) are from the test data in the GENDIST documentation:

> X
           locus 1 locus 2 locus 3 locus 4 locus 5 locus 6
European    0.2868  0.5684  0.4422  0.4286  0.3828  0.7285
African     0.1356  0.4840  0.0602  0.0397  0.5977  0.9675
Chinese     0.1628  0.5958  0.7298  1.0000  0.3811  0.7986
American    0.0144  0.6990  0.3280  0.7421  0.6606  0.8603
Australian  0.1211  0.2274  0.5821  1.0000  0.2018  0.9000
           locus 7 locus 8 locus 9 locus 10
European    0.6386  0.0205  0.8055   0.5043
African     0.9511  0.0600  0.7582   0.6207
Chinese     0.7782  0.0726  0.7482   0.7334
American    0.7924  0.0000  0.8086   0.8636
Australian  0.9837  0.0396  0.9097   0.2976
> Dnei<-Rgendist(X)

....

Genetic Distance Matrix program, version 3.695

....

Distances calculated for species
    1           
    2            .
    3            ..
    4            ...
    5            ....

Distances written to file "outfile"

Done.

> Dnei
           European  African  Chinese American
African    0.078002                          
Chinese    0.080749 0.234698                 
American   0.066805 0.104975 0.053879        
Australian 0.103014 0.227281 0.063275 0.134756

> # Cavalli-Sforza (1967) distances
> Dcavalli<-Rgendist(X,method="Cavalli-Sforza")

....

Genetic Distance Matrix program, version 3.695

....

Distances calculated for species
    1           
    2            .
    3            ..
    4            ...
    5            ....

Distances written to file "outfile"

Done.

> Dcavalli
           European  African  Chinese American
African    0.181749                          
Chinese    0.181987 0.480537                 
American   0.129497 0.231519 0.147522        
Australian 0.260814 0.480491 0.123618 0.283144
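
As a sanity check on the first matrix (Dnei), Nei's distance for a single pair of populations can be worked out by hand in base R. This is only a sketch: nei_distance is a hypothetical helper (not part of Rphylip), and it assumes that every locus has exactly two alleles, so that the frequency of the second allele is one minus the value stored in X.

```r
# Frequencies of the first allele at each of the 10 loci for two populations,
# copied from the matrix X printed above.
X <- rbind(
  European = c(0.2868, 0.5684, 0.4422, 0.4286, 0.3828,
               0.7285, 0.6386, 0.0205, 0.8055, 0.5043),
  African  = c(0.1356, 0.4840, 0.0602, 0.0397, 0.5977,
               0.9675, 0.9511, 0.0600, 0.7582, 0.6207)
)

# Hypothetical helper, not an Rphylip function.
# Assumption: two alleles per locus, the second at frequency 1 - p.
nei_distance <- function(p1, p2) {
  a <- cbind(p1, 1 - p1)  # both allele frequencies, population 1
  b <- cbind(p2, 1 - p2)  # both allele frequencies, population 2
  # Nei's distance: -log of the normalized identity summed over loci and alleles
  -log(sum(a * b) / sqrt(sum(a^2) * sum(b^2)))
}

d <- nei_distance(X["European", ], X["African", ])
round(d, 4)
```

Under that two-allele assumption, the value should agree closely with the European/African entry of Dnei (0.078002).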

An input matrix, as shown above, is only one of two ways that the data can be sent to Rgendist. The user can also supply a list of matrices in which each matrix contains the frequencies of each allele at a single locus (and thus the length of the list is equal to the number of loci in the analysis). See the function documentation for more information.
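
For illustration, here is one way such a list might be built in base R from a matrix like X. The layout (populations in rows, alleles in columns) and the two-allele assumption are mine, so check them against the function documentation before relying on this. The final call to Rgendist is left commented out because I have not verified it against this exact structure.

```r
# First three loci for two populations only, to keep the sketch short.
X <- rbind(
  European = c(0.2868, 0.5684, 0.4422),
  African  = c(0.1356, 0.4840, 0.0602)
)
colnames(X) <- paste("locus", 1:3)

# One matrix per locus: rows are populations, columns are alleles.
# Assumption: two alleles per locus, the second at frequency 1 - p.
loci <- lapply(seq_len(ncol(X)), function(j) {
  cbind(allele1 = X[, j], allele2 = 1 - X[, j])
})
names(loci) <- colnames(X)

# Each row of each matrix should sum to 1.
sapply(loci, rowSums)

# Dlist <- Rgendist(loci)  # assumed usage; see the Rgendist documentation
```

The length of loci equals the number of loci, as described above.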

All the latest work on Rphylip (including package builds) can be obtained from GitHub.

That's all for now.
