I've been gradually plugging away at Rphylip (my R wrapper for PHYLIP), which now includes working interfaces for contml, contrast, dnaml, dnamlk, dnapars, neighbor, threshml, and treedist. Obviously, I have a long way to go; however some of the new additions cover functionality that can't be found in any other software. For example, Rcontrast (interface for contrast) does regular phylogenetic contrasts - but also allows for within-species sampling (Felsenstein 2008). Rthreshml (interface for threshml, which is not actually yet in PHYLIP but will be in the next release) implements the threshold model which can be used for estimating the correlation between discrete and continuous characters, among other things (Felsenstein 2012).

Here's a quick demo of Rthreshml:

Loading required package: ape

> require(phytools)

Loading required package: phytools

Loading required package: maps

Loading required package: rgl

> tree<-pbtree(n=100)

> V<-matrix(c(1,0,0.8,0,

0,2,0,1.2,

0.8,0,1,0,

0,1.2,0,1),4,4)

> V

[,1] [,2] [,3] [,4]

[1,] 1.0 0.0 0.8 0.0

[2,] 0.0 2.0 0.0 1.2

[3,] 0.8 0.0 1.0 0.0

[4,] 0.0 1.2 0.0 1.0

> X<-sim.corrs(tree,V)

> th<-setNames(c(0,Inf),LETTERS[1:2])

> X<-data.frame(X[,1],X[,2],sapply(X[,3],threshState,th), sapply(X[,4],threshState,th))

> names(X)<-paste("v",1:ncol(X),sep="")

> X

v1 v2 v3 v4

t15 2.8841985 4.98211106 B B

t26 2.1335369 7.73244895 A B

t27 0.9770008 6.58944668 A B

t44 3.5651266 4.94113242 B B

t85 1.6813201 3.73859795 A B

t86 1.3738674 3.36676417 A B

t69 1.1805336 2.17284155 A B

t48 2.2652202 3.93811812 B B

....

> fit<-Rthreshml(tree,X,proposal=0.1)

....

Threshold character Maximum Likelihood method version 3.7a

....

Markov chain Monte Carlo (MCMC) estimation of covariances:

Acceptance Norm of change

Chains (20) rate in transform

------ ---------- --------------

Burn-in: 1000 updates

Chain 1: 100000 updates ........ 0.8623 0.339614

Chain 2: 100000 updates ........ 0.8651 0.215640

Chain 3: 100000 updates ........ 0.8674 0.160055

Chain 4: 100000 updates ....

....

Covariance matrix of continuous characters

and liabilities of discrete characters

(the continuous characters are first)

1 2 3 4

1 0.94584 0.09582 0.73779 -0.15732

2 0.09582 2.07466 -0.17442 1.13575

3 0.73779 -0.17442 1.00000 -0.31895

4 -0.15732 1.13575 -0.31895 1.00000

....

> fit

$Covariance_matrix

v1 v2 v3 v4

v1 0.94584 0.09582 0.73779 -0.15732

v2 0.09582 2.07466 -0.17442 1.13575

v3 0.73779 -0.17442 1.00000 -0.31895

v4 -0.15732 1.13575 -0.31895 1.00000

$Transform_indepvar_liab

v1 v2 v3 v4

v1 0.97254 0.00000 0.00000 0.00000

v2 0.09853 1.43699 0.00000 0.00000

v3 0.75862 -0.17339 0.62804 0.00000

v4 -0.16177 0.80146 -0.09118 0.56849

$Var_change

v1 v2 v3 v4

2.859484 1.708908 0.238306 0.213801

$Transform_liab_cont

v1 v2 v3 v4

v1 -0.08096 0.81423 -0.20088 0.53863

v2 -0.69130 -0.26046 -0.67287 0.03887

v3 -0.08890 -0.47040 0.32064 0.81732

v4 0.71249 -0.21889 -0.63568 0.20090

We can see that the estimated covariance matrix is actually very close to our generating matrix. Cool.

Unlike the threshml standalone, we can provide our discrete & continuous characters in any order, and our discrete character can be coded anyway we want - so long as it only has two states. Rthreshml creates the input file, reads back in the output, and sorts the estimated covariance matrix back into the order of the columns of X.

This comment has been removed by the author.

ReplyDeleteThanks for making this useful package!

ReplyDeleteDaniel

Hi Daniel.

DeleteDid you get your previous issue (deleted comment, above) resolved. Turns out that there is a bug in Rthreshml for all continuous or all discrete characters. It comes when the output from THRESHML is read back into R. This is now fixed & the updated Rphylip can be downloaded & installed from source here. Let me know if that works.

All the best, Liam

BTW, this was a very helpful report. Thank you.

DeleteHi Liam,

DeleteYes, I figured out my problem. I had given the argument types="discrete" instead of types=c("discrete","discrete"). After I fixed that Rthreshml would read in the output from threshml. However, I did notice that the covariance matrix attribute would be just NAs instead of the values printed from thresml. That is fixed in the version you just posted. Thanks!

Daniel

Yes! That is what I discovered too due to your bug report. I think it would be reasonable to allow users to specify types="discrete" or types="continuous" if all characters are of the same type. I will add that to the next version. Thanks! Liam

DeleteThis comment has been removed by the author.

ReplyDeleteHello! I'm analyzing my continuous-discrete data in Rthreshml. I've completed on analysis, but got an odd result: the coefficient obtained from the covariance matrix was -1.39201. I've never heard of these coefficients exceeding 1 or -1. I'm hoping that someone might be able to tell me what could be causing this. Other than increasing burnin and generations, I only changed the proposal size to 0.7 to get acceptance rates between 0.2-0.5.

ReplyDelete