Wednesday, April 8, 2020

Fix for certain user-specified models in mcmcMk

In our second day of full lockdown mode here I found a bit of time to work on mcmcMk some more.

In particular, I resolved a problem in which, basically due to the different ways that geiger::fitDiscrete and phytools::fitMk keep track of the parameters of the model internally, the function was mis-recording the sampled parameter values in the MCMC for method="fitDiscrete" with user-specified custom models, under certain conditions.

Although the fix is a bit hard to demonstrate without rolling back phytools, I'll once again show the function working properly - for an assymetric, ordered model in which c \(\to\) b \(\to\) a.

library(phytools)
packageVersion("phytools")
## [1] '0.7.25'

Here is our tree & data:

tree
## 
## Phylogenetic tree with 400 tips and 399 internal nodes.
## 
## Tip labels:
##  t60, t61, t128, t174, t333, t334, ...
## 
## Rooted; includes branch lengths.
head(x,100)
##  t60  t61 t128 t174 t333 t334 t216 t232 t264 t265 t308 t309 t321 t322 t139  t82 
##    a    a    a    a    a    a    a    a    a    a    a    a    a    a    a    a 
##  t83  t70 t184 t185 t164  t33 t118 t213 t214 t301 t302 t239 t131 t355 t356 t293 
##    a    a    a    a    a    a    a    a    a    a    a    a    a    a    a    a 
## t325 t326 t140 t270 t271 t230 t231 t114  t31 t146 t147 t162 t163   t4  t27  t28 
##    a    a    a    a    a    a    a    a    a    a    a    a    a    a    a    a 
##  t48 t260 t261 t291 t292 t129 t130 t156 t157  t58  t79  t94 t362 t363 t103   t7 
##    a    a    a    a    a    a    a    a    a    a    a    a    a    a    a    a 
##  t34  t35  t24   t5  t75 t337 t338 t189  t20 t364 t365 t243 t198 t194  t10 t252 
##    a    a    a    a    a    a    a    a    a    a    a    a    a    a    a    a 
## t253 t246 t247 t116 t143 t150 t151 t117 t120 t283 t284  t96  t97  t54  t32  t39 
##    a    a    a    a    a    a    a    a    a    a    a    a    a    a    a    a 
## t211 t212  t66  t71 
##    a    a    a    a 
## Levels: a b c

Now, let's run the MCMC:

model<-matrix(c(0,0,0,
    1,0,0,
    0,2,0),3,3,byrow=TRUE,
    dimnames=list(levels(x),levels(x)))
model
##   a b c
## a 0 0 0
## b 1 0 0
## c 0 2 0
mcmc<-mcmcMk(tree,x,model=model,ngen=100000,
    plot=FALSE,print=1000,prior=list(alpha=0.1))
## Running MCMC....
## gen  [b,a]   [c,b]   logLik  accept
## 1    0.1125  0.1125  -209.8677
## 1001 1.0311  0.4811  -145.9574   0.19
## 2001 1.1372  0.4653  -146.0461   0.35
## 3001 1.1507  0.4421  -145.9537   0.45
## 4001 1.3569  0.4649  -147.3816   0.68
## 5001 0.894   0.3783  -146.0061   0.45
## 6001 1.0697  0.3242  -146.9457   0.48
## 7001 1.0788  0.3894  -145.8108   0.6
## 8001 0.5574  0.3414  -151.0613   0.45
## 9001 0.8516  0.3932  -146.0825   0.49
## 10001    1.1257  0.3349  -146.8359   0.46
## 11001    1.125   0.3725  -146.1307   0.49
## 12001    0.9755  0.382   -145.7842   0.5
## 13001    0.7686  0.4133  -146.6531   0.5
## 14001    0.9566  0.549   -147.149    0.47
## 15001    0.8269  0.3878  -146.2773   0.44
## 16001    0.9759  0.368   -145.9469   0.33
## 17001    0.8155  0.4867  -146.6895   0.66
## 18001    0.9744  0.6076  -148.6297   0.64
## 19001    1.1858  0.4542  -146.1755   0.58
## 20001    0.663   0.538   -149.399    0.53
## 21001    1.2204  0.3899  -146.4075   0.39
## 22001    0.8901  0.4911  -146.3072   0.47
## 23001    0.7234  0.3574  -147.6371   0.38
## 24001    0.915   0.3852  -145.8667   0.42
## 25001    0.9463  0.4324  -145.6515   0.54
## 26001    0.8786  0.3602  -146.3057   0.53
## 27001    1.0098  0.427   -145.5905   0.6
## 28001    0.9032  0.4038  -145.7886   0.44
## 29001    1.3653  0.2979  -149.6243   0.52
## 30001    0.8837  0.4566  -145.9711   0.47
## 31001    1.1031  0.4286  -145.7484   0.49
## 32001    1.1859  0.4662  -146.2656   0.57
## 33001    0.7933  0.3644  -146.7927   0.47
## 34001    1.1349  0.4643  -146.0297   0.56
## 35001    0.9552  0.3657  -146.0013   0.39
## 36001    1.0887  0.341   -146.5662   0.56
## 37001    1.1567  0.3648  -146.3707   0.41
## 38001    1.1454  0.4889  -146.3301   0.43
## 39001    0.6606  0.388   -148.2069   0.54
## 40001    1.0046  0.4476  -145.6578   0.43
## 41001    0.7817  0.4733  -146.8193   0.51
## 42001    0.9635  0.4448  -145.6694   0.46
## 43001    0.7509  0.5124  -147.6816   0.61
## 44001    1.2126  0.4545  -146.3171   0.56
## 45001    1.0634  0.3758  -145.9116   0.49
## 46001    1.0713  0.4183  -145.6652   0.33
## 47001    1.0059  0.4043  -145.6221   0.65
## 48001    1.2612  0.3505  -147.2243   0.51
## 49001    1.1928  0.4216  -146.1153   0.44
## 50001    0.9793  0.5629  -147.437    0.51
## 51001    0.6942  0.4221  -147.564    0.45
## 52001    1.1687  0.3685  -146.3727   0.6
## 53001    1.2241  0.4258  -146.2866   0.45
## 54001    1.2048  0.445   -146.2238   0.4
## 55001    1.1532  0.5256  -146.9353   0.4
## 56001    0.8538  0.39    -146.0917   0.38
## 57001    1.1687  0.4394  -146.0227   0.51
## 58001    1.1942  0.4186  -146.1252   0.52
## 59001    0.6644  0.3702  -148.3176   0.43
## 60001    0.9024  0.397   -145.8237   0.56
## 61001    1.1911  0.3693  -146.4755   0.45
## 62001    0.6954  0.3512  -148.1202   0.46
## 63001    0.6958  0.4844  -147.9735   0.34
## 64001    1.2458  0.303   -148.477    0.53
## 65001    0.7621  0.4542  -146.8423   0.52
## 66001    1.0386  0.3536  -146.2011   0.53
## 67001    0.9219  0.4694  -145.9404   0.66
## 68001    0.9958  0.3925  -145.6875   0.51
## 69001    1.1514  0.352   -146.5677   0.5
## 70001    1.139   0.513   -146.6593   0.47
## 71001    1.0985  0.3765  -145.9919   0.52
## 72001    1.1742  0.4161  -146.0295   0.48
## 73001    1.0808  0.4246  -145.6851   0.42
## 74001    0.9409  0.5518  -147.2428   0.46
## 75001    0.7539  0.3736  -147.056    0.44
## 76001    0.9674  0.3731  -145.8891   0.53
## 77001    1.3503  0.3619  -147.6933   0.56
## 78001    0.9867  0.3077  -147.3684   0.36
## 79001    0.9706  0.4634  -145.7865   0.55
## 80001    1.2299  0.4666  -146.5051   0.49
## 81001    1.0804  0.5015  -146.3037   0.53
## 82001    0.5921  0.3502  -150.0412   0.36
## 83001    0.86    0.3508  -146.5582   0.46
## 84001    0.9309  0.4542  -145.7886   0.5
## 85001    0.9123  0.4131  -145.7314   0.38
## 86001    0.978   0.4025  -145.6374   0.52
## 87001    1.3645  0.4176  -147.3003   0.57
## 88001    1.2006  0.379   -146.4025   0.54
## 89001    0.875   0.4887  -146.3441   0.53
## 90001    1.1098  0.5198  -146.6794   0.52
## 91001    0.9033  0.4016  -145.7971   0.52
## 92001    0.8386  0.3864  -146.2111   0.58
## 93001    1.1761  0.5445  -147.4077   0.46
## 94001    0.9196  0.3891  -145.8218   0.57
## 95001    1.0782  0.3479  -146.3884   0.51
## 96001    0.8498  0.3224  -147.2786   0.47
## 97001    0.7903  0.3388  -147.2852   0.55
## 98001    0.9393  0.3273  -146.811    0.43
## 99001    1.0759  0.4918  -146.1565   0.65
## Done.
mcmc
## 
## Posterior sample from mcmcMk consisting of a posterior sample obtained using
## Bayesian MCMC in the form of a matrix.
## 
## 1. plot('object_name') will create a likelihood profile plot.
## 2. summary('object_name') will compute a summary of the MCMC.
## 3. density('object_name') will calculate a posterior density from the sample.
## 4. Finally, plot(density('object_name')) will plot the posterior density and
##    and high probability density intervals.
## 
## To work best, we recommend users install the package 'coda'.
plot(mcmc)

plot of chunk unnamed-chunk-3

summary(mcmc)
## Assuming 20% burn-in as no burn-in was specified....
## 
## Mean value of Q from the post burn-in posterior sample:
##           a          b          c
## a 0.0000000  0.0000000  0.0000000
## b 0.9948892 -0.9948892  0.0000000
## c 0.0000000  0.4224619 -0.4224619
## 
## Median value of Q from the post burn-in posterior sample:
##           a          b        c
## a 0.0000000  0.0000000  0.00000
## b 0.9795823 -0.9795823  0.00000
## c 0.0000000  0.4183200 -0.41832
## 
## 95% HPD interval computed either from the post burn-in
## samples or using 'coda':
##           lower     upper
## [b,a] 0.6317635 1.3935374
## [c,b] 0.2804582 0.5710845

That's pretty cool, because the generating value of Q for these simulated data was as follows:

Q
##   a     b     c
## a 0  0.00  0.00
## b 1 -1.00  0.00
## c 0  0.25 -0.25

Finally, let's visualize the posterior density using density:

plot(density(mcmc,bw=0.03),xlim=c(0,3))
## Assuming 20% burn-in as no burn-in was specified....

plot of chunk unnamed-chunk-5

That's OK, right?

Finally, the data for this exercise were simulated as follows:

x<-sim.Mk(tree<-pbtree(n=400),
    Q<-matrix(c(0,0,0,1,-1,0,0,0.25,-0.25),3,3,byrow=TRUE,
    dimnames=list(letters[1:3],letters[1:3])),anc="c")

Thanks for reading!

No comments:

Post a Comment

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.