Monday, April 29, 2013

Stochastic mapping with a bad informative prior results in parsimony reconstruction

The title to this post pretty much says it all. As alluded to in the discussion thread of an earlier post, if you use an informative prior probability distribution in which the mean is too low - then the result can (especially in the limiting case as the mean goes to zero) become the parsimony reconstruction. That is, as our prior probability distribution forces our posterior distribution of Q to be closer & closer to zero, all reconstructions converge to the one or multiple most parsimonious reconstructions.

Here's a quick demonstration of what I mean. Just to make the point clear, I have used a very exaggerated bad prior probability density - the γ with α = 2 & β = 200 - which basically says we have a high degree of confidence that the rate of evolution is very low (with mean α/β = 0.01).

> library(phytools)
Attaching package: ‘phytools’

> # first create a mapped history
> Q<-matrix(c(-2,2,2,-2),2,2)
> colnames(Q)<-rownames(Q)<-LETTERS[1:2]
> tree<-sim.history(pbtree(n=100,scale=1),Q,anc="A")
> describe.simmap(tree)
1 tree with a mapped discrete character with states:
A, B

tree has 54 changes between states

changes are of the following types:
   A  B
A  0 18
B 36  0

mean total time spent in each state is:
             A         B    total
raw  12.897944 15.049629 27.94757
prop  0.461505  0.538495  1.00000

> x<-tree$states
> # now let's perform stochastic mapping with a
> # reasonable prior
> emp.prior<-make.simmap(tree,x,nsim=100,Q="mcmc", prior=list(beta=2,use.empirical=TRUE))
Running MCMC burn-in. Please wait....
Running 10000 generations of MCMC, sampling every 100 generations. Please wait....
make.simmap is simulating with a sample of Q from the posterior distribution
Mean Q from the posterior is
Q =
          A        B
A -1.943666  1.943666
B  1.943666 -1.943666
and (mean) root node prior probabilities
pi =
[1] 0.5 0.5
Done.

> # now a prior that is way too low!
> low.prior<-make.simmap(tree,x,nsim=100,Q="mcmc", prior=list(alpha=2,beta=200))
Running MCMC burn-in. Please wait....
Running 10000 generations of MCMC, sampling every 100 generations. Please wait....
make.simmap is simulating with a sample of Q from the posterior distribution
Mean Q from the posterior is
Q =
          A          B
A -0.1103515  0.1103515
B  0.1103515 -0.1103515
and (mean) root node prior probabilities
pi =
[1] 0.5 0.5
Done.

> # plot all three for comparison to the true tree
> layout(matrix(c(1,2,3),1,3))
> plotSimmap(tree,pts=F,lwd=2,ftype="off")
no colors provided. using the following legend:
      A      B
"black"  "red"
> plotSimmap(emp.prior[[1]],pts=F,lwd=2,ftype="off")
...
> plotSimmap(low.prior[[1]],pts=F,lwd=2,ftype="off")
...

What's dangerous about the reconstructions shown above is that (without the known true history, as in this case) it may be the rightmost reconstruction - the one obtained with the faulty prior - that looks most reasonable!

OK - let's do a little more analysis to further the point:

> describe.simmap(emp.prior)
100 trees with a mapped discrete character with states:
A, B

trees have 60.09 changes between states on average

changes are of the following types:
      A,B  B,A
x->y 29.52 30.57

mean total time spent in each state is:
              A          B    total
raw  15.7298487 12.2177248 27.94757
prop  0.5628341  0.4371659  1.00000

> describe.simmap(low.prior)
100 trees with a mapped discrete character with states:
A, B

trees have 22.97 changes between states on average

changes are of the following types:
      A,B  B,A
x->y 14.44 8.53

mean total time spent in each state is:
              A          B    total
raw  17.7439790 10.2035945 27.94757
prop  0.6349023  0.3650977  1.00000

> # get parsimony score
> library(phangorn)
> X<-phyDat(as.matrix(x),type="USER",levels=LETTERS[1:2])
> parsimony(tree,X)
[1] 22

Of course, we can push it even further & virtually guarantee that our stochastic maps have the MP number of changes, e.g.:

> really.low.prior<-make.simmap(tree,x,nsim=20,Q="mcmc", prior=list(alpha=1,beta=1e4))
Running MCMC burn-in. Please wait....
Running 2000 generations of MCMC, sampling every 100 generations. Please wait....
make.simmap is simulating with a sample of Q from the posterior distribution
Mean Q from the posterior is
Q =
            A            B
A -0.002198051  0.002198051
B  0.002198051 -0.002198051
and (mean) root node prior probabilities
pi =
[1] 0.5 0.5
Done.
> describe.simmap(really.low.prior)
20 trees with a mapped discrete character with states:
A, B

trees have 22.05 changes between states on average

changes are of the following types:
      A,B B,A
x->y 13.65 8.4

mean total time spent in each state is:
              A          B    total
raw  17.4916200 10.4559534 27.94757
prop  0.6258726  0.3741274  1.00000

Well, you get the idea.

1 comment:

  1. hi Liam!

    So I've been playing around with this again, and I'm comparing the results obtained using the empirical bayesian approach and the full MCMC approach. I'm kinda surprised that the results are so different, and wondering if this is expected.

    My tree has 47 tips, 7 have state 0 and 40 state 1. The seven tips with state 0 are quite sparsely distributed throughout the phylogeny.

    here's an example of what I get with the two approaches:

    > describe.simmap(make.simmap(tree, setNames(x[,1], row.names(x)), model='ARD', pi='estimated'))
    make.simmap is sampling character histories conditioned on the transition matrix
    Q =
    0 1
    0 -22.794297 22.794297
    1 4.089519 -4.089519
    (estimated using likelihood);
    and (mean) root node prior probabilities
    pi =
    0 1
    0.1521182 0.8478818
    Done.
    1 tree with a mapped discrete character with states:
    0, 1

    tree has 153 changes between states

    changes are of the following types:
    0 1
    0 0 78
    1 75 0

    mean total time spent in each state is:
    0 1 total
    raw 3.9761632 17.6508515 21.62701
    prop 0.1838517 0.8161483 1.00000


    > describe.simmap(make.simmap(tree, setNames(x[,1], row.names(x)), model='ARD', Q='mcmc', pi='estimated'))
    Running MCMC burn-in. Please wait....
    Running 100 generations of MCMC, sampling every 100 generations. Please wait....
    make.simmap is simulating with a sample of Q from the posterior distribution
    Mean Q from the posterior is
    Q =
    0 1
    0 -3.221033 3.221033
    1 0.596259 -0.596259
    and (mean) root node prior probabilities
    pi =
    0 1
    0.1561995 0.8438005
    Done.
    1 tree with a mapped discrete character with states:
    1

    tree has 15 changes between states

    changes are of the following types:
    0 1
    0 0 4
    1 11 0

    mean total time spent in each state is:
    0 1 total
    raw 1.70813038 19.9188843 21.62701
    prop 0.07898133 0.9210187 1.00000

    I'm kinda surprised there is an about tenfold difference in the estimated transitions between the characters. Of course intuitively it feels that less transitions make more sense, but as you explore in this post, that needn't be the case. But the difference between the transition matrices is quite huge too. Is it because the default priors might not be reasonable for this case?

    Thanks for any help, cheers!

    ReplyDelete