Thursday, December 5, 2013

More on Rphylip: Interfaces for threshml & treedist

I've been gradually plugging away at Rphylip (my R wrapper for PHYLIP), which now includes working interfaces for contml, contrast, dnaml, dnamlk, dnapars, neighbor, threshml, and treedist. Obviously, I have a long way to go; however some of the new additions cover functionality that can't be found in any other software. For example, Rcontrast (interface for contrast) does regular phylogenetic contrasts - but also allows for within-species sampling (Felsenstein 2008). Rthreshml (interface for threshml, which is not actually yet in PHYLIP but will be in the next release) implements the threshold model which can be used for estimating the correlation between discrete and continuous characters, among other things (Felsenstein 2012).

Here's a quick demo of Rthreshml:

> library(Rphylip)
Loading required package: ape
> require(phytools)
Loading required package: phytools
Loading required package: maps
Loading required package: rgl
> tree<-pbtree(n=100)
> V<-matrix(c(1,0,0.8,0,
0,2,0,1.2,
0.8,0,1,0,
0,1.2,0,1),4,4)
> V
     [,1] [,2] [,3] [,4]
[1,]  1.0  0.0  0.8  0.0
[2,]  0.0  2.0  0.0  1.2
[3,]  0.8  0.0  1.0  0.0
[4,]  0.0  1.2  0.0  1.0
> X<-sim.corrs(tree,V)
> th<-setNames(c(0,Inf),LETTERS[1:2])
> X<-data.frame(X[,1],X[,2],sapply(X[,3],threshState,th), sapply(X[,4],threshState,th))
> names(X)<-paste("v",1:ncol(X),sep="")
> X
             v1          v2 v3 v4
t15   2.8841985  4.98211106  B  B
t26   2.1335369  7.73244895  A  B
t27   0.9770008  6.58944668  A  B
t44   3.5651266  4.94113242  B  B
t85   1.6813201  3.73859795  A  B
t86   1.3738674  3.36676417  A  B
t69   1.1805336  2.17284155  A  B
t48   2.2652202  3.93811812  B  B
....

> fit<-Rthreshml(tree,X,proposal=0.1)

....

Threshold character Maximum Likelihood method version 3.7a

....

Markov chain Monte Carlo (MCMC) estimation of covariances:

                                 Acceptance   Norm of change
Chains (20)                         rate        in transform
------                          ----------    --------------

Burn-in: 1000 updates
Chain 1: 100000 updates  ........  0.8623         0.339614
Chain 2: 100000 updates  ........  0.8651         0.215640
Chain 3: 100000 updates  ........  0.8674         0.160055
Chain 4: 100000 updates  ....

....

Covariance matrix of continuous characters
and liabilities of discrete characters
(the continuous characters are first)

             1          2          3          4   
  1       0.94584    0.09582    0.73779   -0.15732
  2       0.09582    2.07466   -0.17442    1.13575
  3       0.73779   -0.17442    1.00000   -0.31895
  4      -0.15732    1.13575   -0.31895    1.00000

....

> fit
$Covariance_matrix
         v1       v2       v3       v4
v1  0.94584  0.09582  0.73779 -0.15732
v2  0.09582  2.07466 -0.17442  1.13575
v3  0.73779 -0.17442  1.00000 -0.31895
v4 -0.15732  1.13575 -0.31895  1.00000

$Transform_indepvar_liab
         v1       v2       v3      v4
v1  0.97254  0.00000  0.00000 0.00000
v2  0.09853  1.43699  0.00000 0.00000
v3  0.75862 -0.17339  0.62804 0.00000
v4 -0.16177  0.80146 -0.09118 0.56849

$Var_change
      v1       v2       v3       v4
2.859484 1.708908 0.238306 0.213801

$Transform_liab_cont
         v1       v2       v3      v4
v1 -0.08096  0.81423 -0.20088 0.53863
v2 -0.69130 -0.26046 -0.67287 0.03887
v3 -0.08890 -0.47040  0.32064 0.81732
v4  0.71249 -0.21889 -0.63568 0.20090

We can see that the estimated covariance matrix is actually very close to our generating matrix. Cool.

Unlike the threshml standalone, we can provide our discrete & continuous characters in any order, and our discrete character can be coded anyway we want - so long as it only has two states. Rthreshml creates the input file, reads back in the output, and sorts the estimated covariance matrix back into the order of the columns of X.

8 comments:

  1. This comment has been removed by the author.

    ReplyDelete
  2. Thanks for making this useful package!

    Daniel

    ReplyDelete
    Replies
    1. Hi Daniel.

      Did you get your previous issue (deleted comment, above) resolved. Turns out that there is a bug in Rthreshml for all continuous or all discrete characters. It comes when the output from THRESHML is read back into R. This is now fixed & the updated Rphylip can be downloaded & installed from source here. Let me know if that works.

      All the best, Liam

      Delete
    2. BTW, this was a very helpful report. Thank you.

      Delete
    3. Hi Liam,
      Yes, I figured out my problem. I had given the argument types="discrete" instead of types=c("discrete","discrete"). After I fixed that Rthreshml would read in the output from threshml. However, I did notice that the covariance matrix attribute would be just NAs instead of the values printed from thresml. That is fixed in the version you just posted. Thanks!

      Daniel

      Delete
    4. Yes! That is what I discovered too due to your bug report. I think it would be reasonable to allow users to specify types="discrete" or types="continuous" if all characters are of the same type. I will add that to the next version. Thanks! Liam

      Delete
  3. This comment has been removed by the author.

    ReplyDelete
  4. Hello! I'm analyzing my continuous-discrete data in Rthreshml. I've completed on analysis, but got an odd result: the coefficient obtained from the covariance matrix was -1.39201. I've never heard of these coefficients exceeding 1 or -1. I'm hoping that someone might be able to tell me what could be causing this. Other than increasing burnin and generations, I only changed the proposal size to 0.7 to get acceptance rates between 0.2-0.5.

    ReplyDelete