Monday, December 4, 2017

More on visualizing the rate of discrete evolution through time

Last week I posted on visualizing the mean number of changes (or changes per unit of edge length) per interval of time from the root of the tree to the present day. This can create a nice plot; however it is more or less meaningless unless we can compare it to what we might expect under some kind of neutral process.

Consequently, today I have added some new function & methods that:

1) Automate calculation of this CTT or 'changes-through-time' plot based on stochastic mapped trees.

2) Simulate CTT plots for a given transition matrix, Q and (optionally) ancestral state, by first simulating a discrete character & then sampling stochastic character maps consistent with this character. This can be done a large number of times to generate a null distribution of the accumulation of changes through time under the model.

3) Visualize our empirical CTT plot and a 100×(1-α)% distribution of CTTs under the a null model.

Here's what that looks like:

library(phytools)
packageVersion("phytools")
## [1] '0.6.49'
tree
## 
## Phylogenetic tree with 200 tips and 199 internal nodes.
## 
## Tip labels:
##  t5, t29, t68, t199, t200, t11, ...
## 
## Rooted; includes branch lengths.
y
##   t5  t29  t68 t199 t200  t11  t18 t107 t108  t72   t3  t24  t85  t86  t44 
##    b    c    c    c    c    c    c    c    c    c    c    c    c    c    b 
##  t48  t51  t52 t186 t195 t196 t115 t116  t91 t134 t135 t113  t42  t90  t97 
##    b    b    b    b    b    b    c    c    c    c    c    c    c    a    a 
##  t98 t158 t159  t50 t144 t145  t32  t43 t151 t152 t112  t30  t53  t54  t47 
##    b    a    a    a    a    a    a    a    a    a    a    a    a    a    a 
##  t82  t83  t57  t58 t157 t176 t177  t76  t75  t92  t93  t95 t139 t140 t153 
##    a    a    a    a    a    a    a    a    a    a    a    a    a    a    a 
## t154  t36  t37  t80 t101 t102  t73  t74  t31 t124 t125  t33 t162 t163  t96 
##    a    a    a    a    a    a    a    a    a    a    a    a    a    a    a 
## t114 t122 t123 t132 t133  t78  t79  t71  t17  t21  t22  t14  t25  t26  t89 
##    a    a    a    a    a    b    a    a    a    a    a    a    a    a    a 
## t187 t188 t170 t171  t23 t172 t173  t84 t148 t149 t156 t182 t183  t81 t197 
##    a    a    a    a    a    c    c    a    a    a    c    c    c    c    c 
## t198 t141  t49 t189 t190  t19  t20 t184 t185 t155 t110 t111 t106 t117 t118 
##    c    c    c    c    c    c    c    c    c    c    c    c    c    c    c 
##  t13   t4  t38  t39  t12 t150 t180 t181  t45  t46   t8 t142 t143  t40 t164 
##    b    c    c    c    c    c    c    c    c    c    c    c    c    c    c 
## t165 t178 t179  t55  t56  t41  t59  t87  t88  t35 t193 t194 t138 t174 t175 
##    c    c    c    c    c    c    a    a    a    a    a    a    c    c    c 
##  t15   t9 t128 t129 t160 t161 t119  t62  t63  t10   t2 t191 t192  t69  t70 
##    c    a    a    a    a    a    a    a    a    a    a    b    b    b    b 
## t146 t147 t103  t77 t109 t168 t169 t136 t137 t130 t131 t120 t121 t126 t127 
##    b    b    b    b    b    b    b    b    b    b    b    b    b    b    b 
## t166 t167   t7  t64  t65  t34  t16  t27  t66  t67  t99 t100  t60  t61 t104 
##    b    b    b    a    a    a    b    b    b    b    b    b    b    b    b 
## t105  t94  t28   t6   t1 
##    b    b    b    b    b 
## Levels: a b c
trees<-make.simmap(tree,y,nsim=100,model="ER")
## make.simmap is sampling character histories conditioned on the transition matrix
## 
## Q =
##              a            b            c
## a -0.003042964  0.001521482  0.001521482
## b  0.001521482 -0.003042964  0.001521482
## c  0.001521482  0.001521482 -0.003042964
## (estimated using likelihood);
## and (mean) root node prior probabilities
## pi =
##         a         b         c 
## 0.3333333 0.3333333 0.3333333
## Done.
object<-ctt(trees)
object
## Object of class "ctt" consisting of:
##    (1) a matrix (segments) with the beginning & ending time of each segment.
##    (2) a vector (nchanges) with the mean number of changes in each segment.
##    (3) a vector (edge.length) containing the total edge length of each segement.
##    (4) an object of class "phylo".
plot(object,type="number")

plot of chunk unnamed-chunk-1

More often it seems more likely that we'd be interested in tracking the number of changes per unit of edge length rather than the total number of changes - because in all reconstructed phylogeny of extant taxa there is more edge length towards the tips of the tree than towards the root.

plot(object,type="rate")

plot of chunk unnamed-chunk-2

Of course, as I mentioned in the preamble, it probably doesn't make much sense to generate this type of visualization unless we have a sense of what might be expected under some kind of reasonable null hypothesis - such as a constant rate of character evolution through time & among lineages. Luckily, we can simulate this too. The function sim.ctt will simulate a CTT plot for a given transition matrix, Q, by first simulating a discrete character and then by sampling stochastic character maps for that character. This is still somewhat slow, so it may take a few minutes to run.

I'll use the function sim.multiCtt to simulate various rather than a single CTT:

Q<-trees[[1]]$Q
Q
##              a            b            c
## a -0.003042964  0.001521482  0.001521482
## b  0.001521482 -0.003042964  0.001521482
## c  0.001521482  0.001521482 -0.003042964
nulo<-sim.multiCtt(tree,Q,nsim=100)
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
## Starting stochastic mapping with simulated data vector.... Done.
nulo
## 100 objects of class "ctt" in a list.
plot(nulo,type="number")
plot(object,add=TRUE,type="number")

plot of chunk unnamed-chunk-3

That's kind of neat. Now, for the rate. Here, I will change the α level for our confidence interval.

plot(nulo,alpha=0.2,ylim=c(0,0.025))
plot(object,add=TRUE)

plot of chunk unnamed-chunk-4

In this case, the data were simulated under a process of declining rate towards the present. Perhaps that was captured by our plots?

tree<-pbtree(n=200,scale=100)
Q<-matrix(c(-0.004,0.002,0.002,
    0.002,-0.004,0.002,
    0.002,0.002,-0.004),3,3)
rownames(Q)<-colnames(Q)<-letters[1:3]
EB<-phytools:::ebTree(tree,-0.06)
EB$edge.length<-EB$edge.length/max(nodeHeights(EB))*100
y<-as.factor(sim.history(EB,10*Q)$states)

No comments:

Post a Comment