As I posted yesterday I've been
doing a little bit of work updating the phytools function threshBayes
.
threshBayes
uses Bayesian MCMC to fit a correlational threshold model to two traits: one continuous & one binary
trait; or two binary traits. In this way it is much more limited than Joe Felsenstein's
THRESHML program, which can be used to analyze an
arbitrary number of binary and continuous traits. (THRESHML can be called in R using
Rphylip, but it still needs to be installed locally.)
The key update, as I mentioned in my previous post, is tracking the acceptance rate for all the variables in the model, and then updating the proposal distributions progressively to try and hit a user-specified target rate that defaults to 0.234.
Given the very high parameter space of the model, the MCMC is still liable to be relatively inefficient. Nonetheless, this update seems to help a bit with convergence to the posterior distribution.
It is also relatively easy to show that threshBayes
actually works fairly well - that is, that when we have a
a pretty big tree (100 tips or more) we can usually recover the generating correlation coefficient of a simulation
for either two binary traits or one binary & one continuous trait most of the time.
Here's an example. Note that I'm going to simulate a generating correlation coefficient between the liabilities of my two binary traits of r = -0.8. Let's see if we can get that back!
tree<-pbtree(n=100)
tree
##
## Phylogenetic tree with 100 tips and 99 internal nodes.
##
## Tip labels:
## t73, t74, t95, t96, t40, t59, ...
##
## Rooted; includes branch lengths.
Y<-as.data.frame(sim.corrs(tree,
V<-matrix(c(
1.0,-0.8,
-0.8,1.0),
2,2)))
V
## [,1] [,2]
## [1,] 1.0 -0.8
## [2,] -0.8 1.0
Y[[1]]<-as.factor(threshState(Y[[1]],
setNames(c(0,Inf),c("a","b"))))
Y[[2]]<-as.factor(threshState(Y[[2]],
setNames(c(0,Inf),c("c","d"))))
Y
## V1 V2
## t73 a d
## t74 a d
## t95 b d
## t96 b d
## t40 a d
## t59 b c
## t60 b c
## t91 b c
## t92 b c
## t56 a d
## t57 a d
## t28 b c
## t31 b c
## t58 b c
## t87 b c
## t88 b c
## t43 a c
## t93 b c
## t94 b c
## t89 b c
## t90 b c
## t71 b c
## t72 a c
## t35 a d
## t44 a d
## t50 a d
## t52 a d
## t85 a d
## t86 a d
## t12 b c
## t45 b c
## t46 b c
## t11 a c
## t27 b c
## t54 a c
## t55 b c
## t62 b c
## t63 a c
## t37 b c
## t21 b c
## t79 b c
## t80 b c
## t19 b c
## t69 b c
## t70 b c
## t36 b c
## t9 a c
## t33 b c
## t34 b c
## t2 b d
## t38 b c
## t39 b c
## t67 b c
## t68 b c
## t53 b c
## t99 b c
## t100 b c
## t26 a d
## t81 a d
## t82 a d
## t5 b c
## t8 a d
## t97 a d
## t98 a d
## t29 b c
## t30 b c
## t23 b d
## t24 b d
## t25 a d
## t47 a d
## t75 a d
## t76 a d
## t6 b c
## t7 a d
## t13 a c
## t14 b c
## t64 a d
## t65 a c
## t66 a c
## t48 a c
## t4 a c
## t1 a c
## t3 a d
## t10 b c
## t32 b c
## t41 b c
## t42 b c
## t22 b c
## t77 a c
## t78 a c
## t61 a d
## t51 a c
## t20 a d
## t16 b c
## t83 a d
## t84 a d
## t49 a d
## t17 b d
## t18 a d
## t15 b c
Now, let's run our MCMC - for 10,000,000 generations (so I'm going to set plot=FALSE
):
mcmc<-threshBayes(tree,Y,ngen=10000000,plot=FALSE,
control=list(print.interval=100000))
## Starting MCMC....
## genearation: 100000; mean acceptance rate: 0.26
## genearation: 200000; mean acceptance rate: 0.31
## genearation: 300000; mean acceptance rate: 0.24
## genearation: 400000; mean acceptance rate: 0.29
## genearation: 500000; mean acceptance rate: 0.32
## genearation: 600000; mean acceptance rate: 0.24
## genearation: 700000; mean acceptance rate: 0.17
## genearation: 800000; mean acceptance rate: 0.23
## genearation: 900000; mean acceptance rate: 0.14
## genearation: 1000000; mean acceptance rate: 0.32
## genearation: 1100000; mean acceptance rate: 0.21
## genearation: 1200000; mean acceptance rate: 0.3
## genearation: 1300000; mean acceptance rate: 0.3
## genearation: 1400000; mean acceptance rate: 0.2
## genearation: 1500000; mean acceptance rate: 0.25
## genearation: 1600000; mean acceptance rate: 0.19
## genearation: 1700000; mean acceptance rate: 0.23
## genearation: 1800000; mean acceptance rate: 0.26
## genearation: 1900000; mean acceptance rate: 0.16
## genearation: 2000000; mean acceptance rate: 0.25
## genearation: 2100000; mean acceptance rate: 0.15
## genearation: 2200000; mean acceptance rate: 0.19
## genearation: 2300000; mean acceptance rate: 0.24
## genearation: 2400000; mean acceptance rate: 0.23
## genearation: 2500000; mean acceptance rate: 0.23
## genearation: 2600000; mean acceptance rate: 0.34
## genearation: 2700000; mean acceptance rate: 0.28
## genearation: 2800000; mean acceptance rate: 0.23
## genearation: 2900000; mean acceptance rate: 0.27
## genearation: 3000000; mean acceptance rate: 0.38
## genearation: 3100000; mean acceptance rate: 0.15
## genearation: 3200000; mean acceptance rate: 0.2
## genearation: 3300000; mean acceptance rate: 0.25
## genearation: 3400000; mean acceptance rate: 0.34
## genearation: 3500000; mean acceptance rate: 0.29
## genearation: 3600000; mean acceptance rate: 0.15
## genearation: 3700000; mean acceptance rate: 0.18
## genearation: 3800000; mean acceptance rate: 0.23
## genearation: 3900000; mean acceptance rate: 0.33
## genearation: 4000000; mean acceptance rate: 0.2
## genearation: 4100000; mean acceptance rate: 0.23
## genearation: 4200000; mean acceptance rate: 0.21
## genearation: 4300000; mean acceptance rate: 0.25
## genearation: 4400000; mean acceptance rate: 0.26
## genearation: 4500000; mean acceptance rate: 0.28
## genearation: 4600000; mean acceptance rate: 0.23
## genearation: 4700000; mean acceptance rate: 0.22
## genearation: 4800000; mean acceptance rate: 0.21
## genearation: 4900000; mean acceptance rate: 0.19
## genearation: 5000000; mean acceptance rate: 0.19
## genearation: 5100000; mean acceptance rate: 0.31
## genearation: 5200000; mean acceptance rate: 0.22
## genearation: 5300000; mean acceptance rate: 0.27
## genearation: 5400000; mean acceptance rate: 0.22
## genearation: 5500000; mean acceptance rate: 0.25
## genearation: 5600000; mean acceptance rate: 0.25
## genearation: 5700000; mean acceptance rate: 0.19
## genearation: 5800000; mean acceptance rate: 0.22
## genearation: 5900000; mean acceptance rate: 0.23
## genearation: 6000000; mean acceptance rate: 0.26
## genearation: 6100000; mean acceptance rate: 0.31
## genearation: 6200000; mean acceptance rate: 0.34
## genearation: 6300000; mean acceptance rate: 0.14
## genearation: 6400000; mean acceptance rate: 0.26
## genearation: 6500000; mean acceptance rate: 0.3
## genearation: 6600000; mean acceptance rate: 0.29
## genearation: 6700000; mean acceptance rate: 0.28
## genearation: 6800000; mean acceptance rate: 0.21
## genearation: 6900000; mean acceptance rate: 0.24
## genearation: 7000000; mean acceptance rate: 0.23
## genearation: 7100000; mean acceptance rate: 0.39
## genearation: 7200000; mean acceptance rate: 0.35
## genearation: 7300000; mean acceptance rate: 0.36
## genearation: 7400000; mean acceptance rate: 0.24
## genearation: 7500000; mean acceptance rate: 0.27
## genearation: 7600000; mean acceptance rate: 0.21
## genearation: 7700000; mean acceptance rate: 0.31
## genearation: 7800000; mean acceptance rate: 0.27
## genearation: 7900000; mean acceptance rate: 0.23
## genearation: 8000000; mean acceptance rate: 0.35
## genearation: 8100000; mean acceptance rate: 0.18
## genearation: 8200000; mean acceptance rate: 0.25
## genearation: 8300000; mean acceptance rate: 0.19
## genearation: 8400000; mean acceptance rate: 0.3
## genearation: 8500000; mean acceptance rate: 0.34
## genearation: 8600000; mean acceptance rate: 0.3
## genearation: 8700000; mean acceptance rate: 0.32
## genearation: 8800000; mean acceptance rate: 0.19
## genearation: 8900000; mean acceptance rate: 0.25
## genearation: 9000000; mean acceptance rate: 0.16
## genearation: 9100000; mean acceptance rate: 0.27
## genearation: 9200000; mean acceptance rate: 0.24
## genearation: 9300000; mean acceptance rate: 0.22
## genearation: 9400000; mean acceptance rate: 0.14
## genearation: 9500000; mean acceptance rate: 0.22
## genearation: 9600000; mean acceptance rate: 0.21
## genearation: 9700000; mean acceptance rate: 0.35
## genearation: 9800000; mean acceptance rate: 0.2
## genearation: 9900000; mean acceptance rate: 0.25
## genearation: 10000000; mean acceptance rate: 0.19
## Done MCMC.
mcmc
##
## Object of class "threshBayes" consisting of a matrix (L) of
## sampled liabilities for the tips of the tree & a second matrix
## (par) with the sample model parameters & correlation.
##
## Mean correlation (r) from the posterior sample is: -0.63687.
##
## Ordination of discrete traits:
##
## Trait 1: a <-> b
## Trait 2: c <-> d
plot(mcmc)
Lastly, let's visualize our posterior density for r. Remember, the simulated value was r = -0.8.
plot(density(mcmc))
Neat.
Hi Liam,
ReplyDeletethank you for the code and very helpful examples. The code itself seems to be not very complicated (for my data:w<-threshBayes(tree,data,types=c("disc","disc"),ngen=100000,control=list(sample=sample))), however, I think that I am making some mistake. All my results are 0, eg.
Starting MCMC....
genearation: 1000; mean acceptance rate: 0
genearation: 2000; mean acceptance rate: 0
genearation: 3000; mean acceptance rate: 0
And:
Mean correlation (r) from the posterior sample is: 0.
And
> effectiveSize(rA)
var1
0
> HPDinterval(rA)
lower upper
var1 0 0
attr(,"Probability")
[1] 0.95
Weird. I have seen that error when I fail to specify the 'types' of each character correctly (for instance, if one of the characters is a continuous character but I put types=c("disc","disc")), or, likewise, if I have missing data values in my input data.
DeleteThank you for your responce.
DeleteYes, it's weird. I have check the code with a different tree and different data (worked fine, with more "normal" values), but in the case of the dataset where I have 0, both the continous and discreat date give the same result. I don't have any missing values. The code also gives an error if there are any NAs.