Saturday, November 11, 2017

New methods & functionality in anc.Bayes for Bayesian ancestral state reconstruction with continuous characters

I have been frantically updating phytools recently (just check out my GitHub contributions over the past week) in an effort to get a new version on CRAN before the workshop I will be teaching in Concepción, Chile next week. This is a habit of mine. In fact, the most recent CRAN version of phytools went up on July 28 of this year, just before we taught a workshop in Córdoba, Argentina in August.

Some of the latest additions have been to include plot, print, or other handy S3 methods for older phytools functions lacking them. I did this, for instance, with ancThresh (for ancestral state estimation under the threshold model), with anc.trend (for ML ancestral state estimation with a trend), with phyl.cca (for phylogenetic canonical correlation analysis), with pgls.Ives (for phylogenetic regression with error in x & y), for ratebystate (for investigating the correlation between the state of one character and the evolutionary rate of a second), with fitBayes (for fitting continuous character models to data for individuals), to rerootingMethod (for ancestral state reconstruction using the re-rooting method), and to various other older phytools functions.

The latest victim is the function anc.Bayes which does Bayesian ancestral state reconstruction for continuous traits. This too is an old function. It will produce estimates that should be largely concordant with our MLEs if we use an uninformative prior; however, like any Bayesian method, it also lets us easily incorporate prior information about any of the variables of our model.

Here is a quick demo of the method on a relatively small dataset:

library(phytools)
packageVersion("phytools")
## [1] '0.6.44'
tree
## 
## Phylogenetic tree with 26 tips and 25 internal nodes.
## 
## Tip labels:
##  A, B, C, D, E, F, ...
## 
## Rooted; includes branch lengths.
x
##          A          B          C          D          E          F 
##  1.3603247  2.0493433  6.4178866  6.3543861  2.0243320  4.4218372 
##          G          H          I          J          K          L 
##  5.1615650  4.9953295  3.0671553  3.2513538 -0.7837053 -0.2470310 
##          M          N          O          P          Q          R 
## -2.2720883 -2.4991707 -1.7139029 -2.3435555 -0.8355605 -0.5544300 
##          S          T          U          V          W          X 
## -0.7852862 -0.3068102 -0.3196348  0.1156042 -0.2024103  0.6773816 
##          Y          Z 
##  0.9147725  1.7275359
obj<-anc.Bayes(tree,x,ngen=10000)
## Control parameters (set by user or default):
## List of 7
##  $ sig2   : num 1.57
##  $ a      : num [1, 1] 0.699
##  $ y      : num [1:24] 0.699 0.699 0.699 0.699 0.699 ...
##  $ pr.mean: num [1:26] 1000 0 0 0 0 0 0 0 0 0 ...
##  $ pr.var : num [1:26] 1e+06 1e+03 1e+03 1e+03 1e+03 1e+03 1e+03 1e+03 1e+03 1e+03 ...
##  $ prop   : num [1:26] 0.0542 0.0542 0.0542 0.0542 0.0542 ...
##  $ sample : num 100
## Starting MCMC...
## Done MCMC.

[Take note that anc.Bayes prints the prior means & variances that are being used. We may have to change these to keep them genuinely uninformative depending on the scale of our data!]

obj
## 
## Object of class "anc.Bayes" consisting of a posterior
##    sample from a Bayesian ancestral state analysis:
## 
## Mean ancestral states from posterior distribution:
##        27        28        29        30        31        32        33 
##  0.908381  0.461760  1.510746  1.555612  1.128039  1.609713  2.096415 
##        34        35        36        37        38        39        40 
##  2.180842  5.968319  2.486842  4.159537  4.967459  0.925745 -1.613286 
##        41        42        43        44        45        46        47 
## -2.049521 -2.027783 -2.284589  0.630461 -0.163160 -0.113382 -0.269287 
##        48        49        50        51 
## -0.522319 -0.090560  1.182425  0.894854 
## 
## Based on a burn-in of 2000 generations.

The plot method, by default, plots the likelihood profile of our MCMC:

plot(obj,lwd=3,cex.axis=0.7)

plot of chunk unnamed-chunk-3

The print method invisibly returns a vector of our estimates - so we can easily compare these to MLEs generated, say, using fastAnc:

a<-print(obj)
## 
## Object of class "anc.Bayes" consisting of a posterior
##    sample from a Bayesian ancestral state analysis:
## 
## Mean ancestral states from posterior distribution:
##        27        28        29        30        31        32        33 
##  0.908381  0.461760  1.510746  1.555612  1.128039  1.609713  2.096415 
##        34        35        36        37        38        39        40 
##  2.180842  5.968319  2.486842  4.159537  4.967459  0.925745 -1.613286 
##        41        42        43        44        45        46        47 
## -2.049521 -2.027783 -2.284589  0.630461 -0.163160 -0.113382 -0.269287 
##        48        49        50        51 
## -0.522319 -0.090560  1.182425  0.894854 
## 
## Based on a burn-in of 2000 generations.
plot(fastAnc(tree,x),a,xlab="MLE estimates (fastAnc)",
    ylab="Bayesian estimates (anc.Bayes)",
    pch=21,bg="grey",cex=1.5)

plot of chunk unnamed-chunk-4

Just for fun, let's run the MCMC for a really long time (OK, for a minute) and see if any incongruence between the two estimation procedures goes away:

plot(fastAnc(tree,x),print(anc.Bayes(tree,x,ngen=1000000)),
    xlab="MLE estimates (fastAnc)",
    ylab="Bayesian estimates (anc.Bayes)",
    pch=21,bg="grey",cex=1.5)
## Control parameters (set by user or default):
## List of 7
##  $ sig2   : num 1.57
##  $ a      : num [1, 1] 0.699
##  $ y      : num [1:24] 0.699 0.699 0.699 0.699 0.699 ...
##  $ pr.mean: num [1:26] 1000 0 0 0 0 0 0 0 0 0 ...
##  $ pr.var : num [1:26] 1e+06 1e+03 1e+03 1e+03 1e+03 1e+03 1e+03 1e+03 1e+03 1e+03 ...
##  $ prop   : num [1:26] 0.0542 0.0542 0.0542 0.0542 0.0542 ...
##  $ sample : num 100
## Starting MCMC...
## Done MCMC.
## 
## Object of class "anc.Bayes" consisting of a posterior
##    sample from a Bayesian ancestral state analysis:
## 
## Mean ancestral states from posterior distribution:
##        27        28        29        30        31        32        33 
##  0.537755  0.646746  1.651684  1.960614  1.729763  2.314067  3.454818 
##        34        35        36        37        38        39        40 
##  3.525240  5.990947  3.670800  4.375293  5.004054  0.364887 -1.486962 
##        41        42        43        44        45        46        47 
## -1.927833 -2.078280 -2.264523  0.495927 -0.267806 -0.339756 -0.332749 
##        48        49        50        51 
## -0.534411 -0.108647  1.227370  0.850702 
## 
## Based on a burn-in of 2e+05 generations.
lines(par()$usr[1:2],par()$usr[1:2],lwd=4,
    col=make.transparent("blue",0.2))

plot of chunk unnamed-chunk-5

Neat-o.

The data for this exercise were simulated as follows:

tree<-pbtree(n=26,tip.label=LETTERS)
x<-fastBM(tree)

No comments:

Post a Comment

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.