Phylogenetic Tools for Comparative Biology: More performance testing on make.simmap

Wednesday, April 24, 2013

I wanted to repeat and elaborate on some performance testing that I conducted earlier in the month. This is partly motivated by the fact that I introduced, and then fixed, a bug in the function since conducting that test; and partly by some efforts to figure out why the stand-alone program SIMMAP can give misleading results when the default priors are used (probably because the default priors should not be used - more on this later).

My simulation test was as follows:

1. I simulated a stochastic phylogeny (using pbtree) and character history for a binary trait with states a and b (using sim.history) conditioned on a fixed, symmetric transition matrix Q. For these analyses, I simulated a tree with 100 taxa of total length 1.0 with backward & forward rate of 1.0.

2. I counted the true number of transitions from a to b & b to a; the true total number of changes; and the true fraction of time spent in state a (for any tree, the time spent in b will just be 1.0 minus this).

3. I generated 200 stochastic character maps using make.simmap. For more information about stochastic mapping or make.simmap search my blog & look at appropriate references.

4. From the stochastic maps, I computed the mean number of transitions of each type; the mean total number of changes; and the mean time spent in state a. I evaluated whether the 95% CI for each of these variables included the true values from 2.

5. I repeated 1. through 4. 200 times.
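
Steps 1 through 4 might be sketched in R roughly as follows. This is not the code used for the results in this post; it is a minimal sketch that assumes the phytools package is installed and that the 95% interval is taken as the 2.5% and 97.5% quantiles across stochastic maps (the post does not say how the interval was computed). The helper names time_in_state, summarize_maps and run_replicate are made up for illustration.

```r
# Hypothetical helpers sketching one replicate of steps 1-4.
# Assumes the phytools package is installed; nothing here is the original code.

# Fraction of total edge length that a mapped tree spends in one state.
time_in_state <- function(tree, state) {
  in_state <- sapply(tree$maps, function(m) sum(m[names(m) == state]))
  sum(in_state) / sum(tree$edge.length)
}

# One row per stochastic map: a->b changes, b->a changes, total changes, time in a.
summarize_maps <- function(trees, states = c("a", "b")) {
  counts <- phytools::countSimmap(trees, states = states, message = FALSE)
  time_a <- sapply(unclass(trees), time_in_state, state = "a")
  cbind(counts[, c("a,b", "b,a", "N"), drop = FALSE], time.a = time_a)
}

run_replicate <- function(ntaxa = 100, nmaps = 200, rate = 1.0) {
  states <- c("a", "b")
  # Symmetric transition matrix with equal forward and backward rates.
  Q <- matrix(c(-rate, rate, rate, -rate), 2, 2,
              dimnames = list(states, states))

  # Step 1: pure-birth tree rescaled to height 1, then a true character history.
  tree <- phytools::pbtree(n = ntaxa, scale = 1)
  true_tree <- phytools::sim.history(tree, Q)

  # Step 2: true values taken from the simulated history.
  true_counts <- phytools::countSimmap(true_tree, states = states,
                                       message = FALSE)
  truth <- c("a,b" = true_counts$Tr["a", "b"],
             "b,a" = true_counts$Tr["b", "a"],
             N = true_counts$N,
             time.a = time_in_state(true_tree, "a"))

  # Step 3: stochastic maps conditioned on the tip states only.
  maps <- phytools::make.simmap(tree, true_tree$states, model = "SYM",
                                nsim = nmaps)

  # Step 4: means and (assumed) quantile-based 95% intervals across maps.
  est <- summarize_maps(maps, states)
  lower <- apply(est, 2, quantile, probs = 0.025)
  upper <- apply(est, 2, quantile, probs = 0.975)
  list(truth = truth, mean = colMeans(est),
       on95 = truth >= lower & truth <= upper)
}
```

Calling run_replicate() once gives the true values, the map-averaged estimates and a logical vector saying whether each interval covered the truth; step 5 would just repeat that call and stack the results. A replicate in which every tip ends up in the same state would probably need to be discarded and re-simulated, which this sketch does not handle.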

First, we can ask how often the 95% CI includes the generating values. Ideally, this should be 0.95 of the time:

> colMeans(on95)
   a,b    b,a      N time.a
 0.905  0.895  0.790  0.950

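This is not too bad: coverage is right at 0.95 for the time spent in a and only a little low for the two transition types, although it drops to 0.79 for the total number of changes.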
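
For anyone wondering how a coverage table like the one above can be put together, here is a small base R sketch. The numbers in truth, lower and upper are toy values made up purely for illustration (four replicates, two metrics), not results from this analysis; only the name on95 comes from the output above.

```r
# Toy inputs, invented for illustration: rows are replicates, columns are metrics.
metrics <- c("N", "time.a")
truth <- matrix(c(3, 5, 2, 4,  0.50, 0.40, 0.60, 0.55), nrow = 4,
                dimnames = list(NULL, metrics))
lower <- matrix(c(2, 6, 1, 3,  0.45, 0.35, 0.50, 0.50), nrow = 4,
                dimnames = list(NULL, metrics))
upper <- matrix(c(6, 9, 4, 7,  0.55, 0.45, 0.70, 0.60), nrow = 4,
                dimnames = list(NULL, metrics))

# TRUE where the interval for that replicate and metric contains the true value.
on95 <- truth >= lower & truth <= upper

# Column means of a logical matrix give the fraction of replicates covered.
coverage <- colMeans(on95)
print(coverage)
```

With these toy values the interval for N misses the truth in one of four replicates, so its coverage is 0.75, while time.a is covered every time.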

Now let's attach some visuals to parameter estimation. Figure 1 shows scatterplots of the relationship between the true and estimated (averaged across 200 stochastic maps) values of each of the four metrics described above:

Figure 1:

Figure 2 is a clearer picture of bias. It gives the distribution of Y - Ŷ, where Ŷ is just the mean from the stochastic maps for a given tree & dataset. The vertical dashed line gives the expectation (zero) if our method is unbiased; whereas the vertical solid line is the mean of our sample.

Figure 2:
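
A panel like those in Figure 2 could be drawn with something along these lines. This is only a sketch in base R graphics; plot_bias is a hypothetical helper name, and the actual plotting code for the figure is not shown in this post.

```r
# Hypothetical helper: histogram of Y - Y-hat with reference lines.
# 'true' and 'est' are numeric vectors with one entry per simulated tree.
plot_bias <- function(true, est, label = "") {
  d <- true - est                       # Y - Y-hat for each tree & dataset
  hist(d, main = label, xlab = "true - estimated")
  abline(v = 0, lty = "dashed")         # expectation if the method is unbiased
  abline(v = mean(d), lty = "solid")    # mean of the sample
  invisible(mean(d))                    # return the mean difference quietly
}
```

For example, plot_bias(true_N, mean_N, "N") would draw the panel for the total number of changes, where true_N and mean_N are placeholder names for the vectors of true values and stochastic-map means across simulations.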

That's it. I'm not posting the code because there's a whole bunch of it; however, if you want it, please just let me know.

7 comments:

Rafael Maia, April 25, 2013 at 12:53 PM
Hi Liam! This is all very exciting, and I can't wait to hear more about it as you compare the two procedures, as it is something I've been dealing with in a way for my data.

One thing I noticed using this approach for my data is that SIMMAP tends to generate maps that have considerably fewer transitions, especially for rare tip states. When I use make.simmap for the same data, I tend to get far more transitions (for one dataset, I remember getting something like 10 times more frequent shifts for one transition type, even when using a non-symmetrical transition matrix), and many more of those "within-branch transition sandwiches" (you know, when in a single branch the map will shift from state 0 to state 1, then back to state 0).

Since these are all cases of empirical data, I don't know the true transition probabilities, and thus it becomes hard to evaluate which procedure is returning something closer to the underlying real model. I tend to find that SIMMAP produces results that are more intuitive and have fewer of the above-mentioned (and ridiculously named) "sandwiches". In any case, I thought I would share my case; I wonder if this is something you've run into before and what your thoughts are.

Cheers!

Nitecruzr, April 26, 2013 at 2:02 PM
Here's a test comment.

See:

http://productforums.google.com/d/topic/blogger/oXxuvPSFnpo/discussion
