I've been gradually plugging away at Rphylip (my R wrapper for PHYLIP), which now includes working interfaces for contml, contrast, dnaml, dnamlk, dnapars, neighbor, threshml, and treedist. Obviously, I have a long way to go; however some of the new additions cover functionality that can't be found in any other software. For example, Rcontrast (interface for contrast) does regular phylogenetic contrasts - but also allows for within-species sampling (Felsenstein 2008). Rthreshml (interface for threshml, which is not actually yet in PHYLIP but will be in the next release) implements the threshold model which can be used for estimating the correlation between discrete and continuous characters, among other things (Felsenstein 2012).
Here's a quick demo of Rthreshml:
Loading required package: ape
> require(phytools)
Loading required package: phytools
Loading required package: maps
Loading required package: rgl
> tree<-pbtree(n=100)
> V<-matrix(c(1,0,0.8,0,
0,2,0,1.2,
0.8,0,1,0,
0,1.2,0,1),4,4)
> V
[,1] [,2] [,3] [,4]
[1,] 1.0 0.0 0.8 0.0
[2,] 0.0 2.0 0.0 1.2
[3,] 0.8 0.0 1.0 0.0
[4,] 0.0 1.2 0.0 1.0
> X<-sim.corrs(tree,V)
> th<-setNames(c(0,Inf),LETTERS[1:2])
> X<-data.frame(X[,1],X[,2],sapply(X[,3],threshState,th), sapply(X[,4],threshState,th))
> names(X)<-paste("v",1:ncol(X),sep="")
> X
v1 v2 v3 v4
t15 2.8841985 4.98211106 B B
t26 2.1335369 7.73244895 A B
t27 0.9770008 6.58944668 A B
t44 3.5651266 4.94113242 B B
t85 1.6813201 3.73859795 A B
t86 1.3738674 3.36676417 A B
t69 1.1805336 2.17284155 A B
t48 2.2652202 3.93811812 B B
....
> fit<-Rthreshml(tree,X,proposal=0.1)
....
Threshold character Maximum Likelihood method version 3.7a
....
Markov chain Monte Carlo (MCMC) estimation of covariances:
Acceptance Norm of change
Chains (20) rate in transform
------ ---------- --------------
Burn-in: 1000 updates
Chain 1: 100000 updates ........ 0.8623 0.339614
Chain 2: 100000 updates ........ 0.8651 0.215640
Chain 3: 100000 updates ........ 0.8674 0.160055
Chain 4: 100000 updates ....
....
Covariance matrix of continuous characters
and liabilities of discrete characters
(the continuous characters are first)
1 2 3 4
1 0.94584 0.09582 0.73779 -0.15732
2 0.09582 2.07466 -0.17442 1.13575
3 0.73779 -0.17442 1.00000 -0.31895
4 -0.15732 1.13575 -0.31895 1.00000
....
> fit
$Covariance_matrix
v1 v2 v3 v4
v1 0.94584 0.09582 0.73779 -0.15732
v2 0.09582 2.07466 -0.17442 1.13575
v3 0.73779 -0.17442 1.00000 -0.31895
v4 -0.15732 1.13575 -0.31895 1.00000
$Transform_indepvar_liab
v1 v2 v3 v4
v1 0.97254 0.00000 0.00000 0.00000
v2 0.09853 1.43699 0.00000 0.00000
v3 0.75862 -0.17339 0.62804 0.00000
v4 -0.16177 0.80146 -0.09118 0.56849
$Var_change
v1 v2 v3 v4
2.859484 1.708908 0.238306 0.213801
$Transform_liab_cont
v1 v2 v3 v4
v1 -0.08096 0.81423 -0.20088 0.53863
v2 -0.69130 -0.26046 -0.67287 0.03887
v3 -0.08890 -0.47040 0.32064 0.81732
v4 0.71249 -0.21889 -0.63568 0.20090
We can see that the estimated covariance matrix is actually very close to our generating matrix. Cool.
Unlike the threshml standalone, we can provide our discrete & continuous characters in any order, and our discrete character can be coded anyway we want - so long as it only has two states. Rthreshml creates the input file, reads back in the output, and sorts the estimated covariance matrix back into the order of the columns of X.
This comment has been removed by the author.
ReplyDeleteThanks for making this useful package!
ReplyDeleteDaniel
Hi Daniel.
DeleteDid you get your previous issue (deleted comment, above) resolved. Turns out that there is a bug in Rthreshml for all continuous or all discrete characters. It comes when the output from THRESHML is read back into R. This is now fixed & the updated Rphylip can be downloaded & installed from source here. Let me know if that works.
All the best, Liam
BTW, this was a very helpful report. Thank you.
DeleteHi Liam,
DeleteYes, I figured out my problem. I had given the argument types="discrete" instead of types=c("discrete","discrete"). After I fixed that Rthreshml would read in the output from threshml. However, I did notice that the covariance matrix attribute would be just NAs instead of the values printed from thresml. That is fixed in the version you just posted. Thanks!
Daniel
Yes! That is what I discovered too due to your bug report. I think it would be reasonable to allow users to specify types="discrete" or types="continuous" if all characters are of the same type. I will add that to the next version. Thanks! Liam
DeleteThis comment has been removed by the author.
ReplyDeleteHello! I'm analyzing my continuous-discrete data in Rthreshml. I've completed on analysis, but got an odd result: the coefficient obtained from the covariance matrix was -1.39201. I've never heard of these coefficients exceeding 1 or -1. I'm hoping that someone might be able to tell me what could be causing this. Other than increasing burnin and generations, I only changed the proposal size to 0.7 to get acceptance rates between 0.2-0.5.
ReplyDelete