The title of this post pretty much says it all. As alluded to in the discussion thread of an earlier post, if you use an informative prior probability distribution on the transition rates in which the mean is too low, then the result can (especially in the limiting case as the prior mean goes to zero) become the parsimony reconstruction. That is, as our prior forces the posterior distribution of **Q** closer & closer to zero, all sampled histories converge to the one or several most parsimonious reconstructions.
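To see why, consider a symmetric two-state model: the probability that a branch of length t ends in a different state than it started is (1 - exp(-2qt))/2, which shrinks toward zero with the rate q. A quick base-R illustration (the rates below are arbitrary):

```r
# P(state differs at the two ends of a branch of length t) under a
# symmetric 2-state model with transition rate q
p.change <- function(q, t) (1 - exp(-2 * q * t)) / 2
sapply(c(2, 0.1, 0.001), p.change, t = 1)
```

As q goes to zero every individual change becomes vanishingly improbable, so any history with more than the minimum number of changes gets essentially zero posterior weight.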

Here's a quick demonstration of what I mean. Just to make the point clear, I have used an exaggeratedly bad prior probability density - a γ distribution with α = 2 & β = 200 - which says, in effect, that we are highly confident that the rate of evolution is very low (prior mean α/β = 0.01).
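For reference, a gamma(α, β) prior has mean α/β and variance α/β², so this choice piles essentially all of the prior mass near zero relative to the simulating rate of 2.0 used below. A quick base-R check:

```r
alpha <- 2; beta <- 200
alpha / beta                              # prior mean: 0.01
alpha / beta^2                            # prior variance: 5e-05
qgamma(0.99, shape = alpha, rate = beta)  # even the 99th percentile is ~0.03
```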

> library(phytools)
Attaching package: ‘phytools’
> # first create a mapped history
> Q<-matrix(c(-2,2,2,-2),2,2)
> colnames(Q)<-rownames(Q)<-LETTERS[1:2]
> tree<-sim.history(pbtree(n=100,scale=1),Q,anc="A")
> describe.simmap(tree)
1 tree with a mapped discrete character with states:
 A, B
tree has 54 changes between states
changes are of the following types:
   A  B
A  0 18
B 36  0
mean total time spent in each state is:
             A         B    total
raw  12.897944 15.049629 27.94757
prop  0.461505  0.538495  1.00000

> x<-tree$states
> # now let's perform stochastic mapping with a
> # reasonable prior
> emp.prior<-make.simmap(tree,x,nsim=100,Q="mcmc", prior=list(beta=2,use.empirical=TRUE))
Running MCMC burn-in. Please wait....
Running 10000 generations of MCMC, sampling every 100 generations. Please wait....
make.simmap is simulating with a sample of Q from the posterior distribution
Mean Q from the posterior is
Q =
          A         B
A -1.943666  1.943666
B  1.943666 -1.943666
and (mean) root node prior probabilities
pi =
[1] 0.5 0.5
Done.

> # now a prior that is way too low!
> low.prior<-make.simmap(tree,x,nsim=100,Q="mcmc", prior=list(alpha=2,beta=200))
Running MCMC burn-in. Please wait....
Running 10000 generations of MCMC, sampling every 100 generations. Please wait....
make.simmap is simulating with a sample of Q from the posterior distribution
Mean Q from the posterior is
Q =
           A          B
A -0.1103515  0.1103515
B  0.1103515 -0.1103515
and (mean) root node prior probabilities
pi =
[1] 0.5 0.5
Done.
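Putting the posterior mean rates side by side with the generating rate (values transcribed from the output above) makes the shrinkage explicit:

```r
# posterior mean transition rate under each prior vs. the simulating rate
rates <- c(true = 2, emp.prior = 1.943666, low.prior = 0.1103515)
rates / rates["true"]  # the bad prior shrinks the rate to ~5.5% of the truth
```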

> # plot all three for comparison to the true tree
> layout(matrix(c(1,2,3),1,3))
> plotSimmap(tree,pts=F,lwd=2,ftype="off")
no colors provided. using the following legend:
      A     B
"black" "red"
> plotSimmap(emp.prior[[1]],pts=F,lwd=2,ftype="off")
...
> plotSimmap(low.prior[[1]],pts=F,lwd=2,ftype="off")
...

What's dangerous about the reconstructions shown above is that (without the known true history, as in this case) it may be the *rightmost* reconstruction - the one obtained with the faulty prior - that looks most reasonable!

OK - let's do a little more analysis to further the point:

> describe.simmap(emp.prior)
100 trees with a mapped discrete character with states:
 A, B
trees have 60.09 changes between states on average
changes are of the following types:
       A,B   B,A
x->y 29.52 30.57
mean total time spent in each state is:
              A          B    total
raw  15.7298487 12.2177248 27.94757
prop  0.5628341  0.4371659  1.00000

> describe.simmap(low.prior)
100 trees with a mapped discrete character with states:
 A, B
trees have 22.97 changes between states on average
changes are of the following types:
       A,B  B,A
x->y 14.44 8.53
mean total time spent in each state is:
              A          B    total
raw  17.7439790 10.2035945 27.94757
prop  0.6349023  0.3650977  1.00000

> # get parsimony score
> library(phangorn)
> X<-phyDat(as.matrix(x),type="USER",levels=LETTERS[1:2])
> parsimony(tree,X)
[1] 22
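A back-of-envelope calculation ties these numbers together. Under the symmetric model the rate of leaving the current state is 2, so the expected number of changes on an unconstrained tree is roughly that rate times the total tree length; under the low-prior posterior mean rate the same calculation gives only about 3 changes, but the tip data force at least the 22 changes parsimony requires, so the sampled maps sit right at that floor:

```r
TT <- 27.94757     # total tree length, from the describe.simmap output
2 * TT             # expected changes at the true rate: ~55.9 (54 observed)
0.1103515 * TT     # unconstrained expectation under the low prior: ~3.1
# yet the sampled maps average 22.97 changes, pinned at the MP score of 22
```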

Of course, we can push it even further & virtually *guarantee* that our stochastic maps have the MP number of changes, e.g.:

Running MCMC burn-in. Please wait....
Running 2000 generations of MCMC, sampling every 100 generations. Please wait....
make.simmap is simulating with a sample of Q from the posterior distribution
Mean Q from the posterior is
Q =
             A            B
A -0.002198051  0.002198051
B  0.002198051 -0.002198051
and (mean) root node prior probabilities
pi =
[1] 0.5 0.5
Done.

> describe.simmap(really.low.prior)
20 trees with a mapped discrete character with states:
 A, B
trees have 22.05 changes between states on average
changes are of the following types:
       A,B B,A
x->y 13.65 8.4
mean total time spent in each state is:
              A          B    total
raw  17.4916200 10.4559534 27.94757
prop  0.6258726  0.3741274  1.00000

Well, you get the idea.

Hi Liam!

So I've been playing around with this again, comparing the results obtained using the empirical Bayesian approach and the full MCMC approach. I'm kind of surprised that the results are so different, and I'm wondering if this is expected.

My tree has 47 tips: 7 have state 0 and 40 have state 1. The seven tips with state 0 are quite sparsely distributed throughout the phylogeny.

Here's an example of what I get with the two approaches:

> describe.simmap(make.simmap(tree, setNames(x[,1], row.names(x)), model='ARD', pi='estimated'))
make.simmap is sampling character histories conditioned on the transition matrix
Q =
           0          1
0 -22.794297  22.794297
1   4.089519  -4.089519
(estimated using likelihood);
and (mean) root node prior probabilities
pi =
        0         1
0.1521182 0.8478818
Done.
1 tree with a mapped discrete character with states:
 0, 1
tree has 153 changes between states
changes are of the following types:
   0  1
0  0 78
1 75  0
mean total time spent in each state is:
             0          1    total
raw  3.9761632 17.6508515 21.62701
prop 0.1838517  0.8161483  1.00000

> describe.simmap(make.simmap(tree, setNames(x[,1], row.names(x)), model='ARD', Q='mcmc', pi='estimated'))
Running MCMC burn-in. Please wait....
Running 100 generations of MCMC, sampling every 100 generations. Please wait....
make.simmap is simulating with a sample of Q from the posterior distribution
Mean Q from the posterior is
Q =
          0         1
0 -3.221033  3.221033
1  0.596259 -0.596259
and (mean) root node prior probabilities
pi =
        0         1
0.1561995 0.8438005
Done.
1 tree with a mapped discrete character with states:
 0, 1
tree has 15 changes between states
changes are of the following types:
   0  1
0  0  4
1 11  0
mean total time spent in each state is:
              0          1    total
raw  1.70813038 19.9188843 21.62701
prop 0.07898133  0.9210187  1.00000

I'm kind of surprised that there is a roughly tenfold difference in the estimated number of transitions between the two approaches. Of course, intuitively it feels like fewer transitions make more sense, but as you explore in this post, that needn't be the case. The difference between the transition matrices is quite large too. Is it because the default priors might not be reasonable for this case?
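One quick sanity check on that hunch - assuming the rate prior defaults to something like gamma(α = 1, β = 1), mean 1 (an assumption; check ?make.simmap for the actual default) - is to ask how much prior density sits near the ML rate estimates of roughly 23 and 4:

```r
# assumed gamma(1, 1) rate prior: its density is exp(-x), so rates near the
# ML estimates (~23 and ~4) sit far out in the tail relative to the mean
dgamma(c(1, 4.089519, 22.794297), shape = 1, rate = 1)
# the density at ~23 is about exp(-21.8) times that at 1, so the posterior
# gets pulled strongly toward smaller rates
```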

Thanks for any help, cheers!