## Thursday, February 28, 2013

### Function to count transitions from a mapped tree

To address a user request I just posted a simple function, countSimmap (code here), to count the number of transitions (in total and by type) on a discrete character mapped tree such as a stochastically mapped (i.e., SIMMAP) tree.

Here's a quick demo:
> require(phytools)
> source("utilities.R")
> Q<-matrix(c(-2,1,1,1,-2,1,1,1,-2),3,3)
> colnames(Q)<-rownames(Q)<-c("A","B","C")
> tree<-sim.history(pbtree(n=10,scale=1),Q)
> cols<-setNames(c("blue","red","green"),colnames(Q))
> par(col="white")
> plotTree(tree,ftype="i",lwd=5)
> par(col="black")
> countSimmap(tree,colnames(Q))
\$N
[1] 13

\$Tr
A B C
A 0 6 1
B 4 0 2
C 0 0 0

\$message
[1] "N is the total number of character changes on the tree"
[2] "Tr gives the number of transitions from row state->column state"

The function is even fast enough to run on quite large trees with lots of transitions, for instance:
> tree<-sim.history(pbtree(n=1000,scale=10),Q)
> system.time(XX<-countSimmap(tree,colnames(Q)))
user  system elapsed
0.22    0.00    0.22
> XX
\$N
[1] 3762

\$Tr
A   B   C
A   0 623 646
B 605   0 623
C 656 609   0

\$message
[1] "N is the total number of character changes on the tree"
[2] "Tr gives the number of transitions from row state->column state"

That's it.

1. Hi,

I've got a tree of 610 and trait data for each (1 or 0). After running SIMMAP, my number of transitions between 0 and 1 is 396160. This is obviously nonsense, but I don't know what is going on.

Best,
Jamie

1. Dear Jamie.

This sometimes happens because the default method of make.simmap is to first estimate a transition matrix (Q) using ML and then generate plausible character histories given that value of Q.

If your data have properties that cause an unrealistically large value of the transition rates between states to be estimated, then you will also find that your stochastic character histories will also have far too many transitions.

One property of data that can cause this is when you have two tips in the tree separated by a very short pair of branches that have different states. Under a constant-rate model, the only circumstance in which your data pattern can exist is if the rate of evolution is high.

There also exists a possibility that the failure is of make.simmap to find the ML value of Q. I posted an interesting case of this here: http://blog.phytools.org/2016/09/interesting-failure-to-find-ml_97.html.

If you send me an email (my address is very easy to find online) I can try to help you get to the bottom of which of the two possible problems you're experiencing.

All the best, Liam

2. Dear Liam,

Thanks. I emailed your address but not sure if you've seen it. I think the former is more likely, but I'm not sure if there is any way to solve it.

Jamie

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.