The phytools function rateshift
fits a model with one or more temporal rate shifts that affect all the branches of the tree, identifying the location of these temporal shifts via Maximum Likelihood.
This method is described in my book with Luke Harmon, as well as in a number of prior posts on this blog over the years (e.g., 1, 2, 3).
A colleague recently shared an empirical example that he thought might be experiencing issues with convergence (to the ML solution), but that we eventually realized was actually an example of non-identifiability. Non-identifiability means that multiple sets of parameter values (or, equivalently, parameterizations) can lead to exactly the same maximum likelihood. Though non-identifiability had not previously been characterized for this particular model, it doesn’t surprise me. Indeed, it’s pretty easy to see how a rate shift model could be non-identifiable for the position of the shift and the rates of evolution, \(\sigma_{1}^2\) and \(\sigma_{2}^2\), in certain trivial cases. For instance, if the shift occurs after the most recent branching time in the tree (that is, along a terminal edge for all tips), then there must be an infinite number of different combinations of rates and shift points that explain our data with exactly the same probability.
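Here’s a minimal numerical sketch of that trivial case (just base R, not phytools code, with made-up numbers): if the shift at height \(s\) falls after the last branching event, every tip has variance \(\sigma_{1}^2 s + \sigma_{2}^2 (T - s)\), while all tip covariances involve only \(\sigma_{1}^2\) — so any combination of \(s\) and \(\sigma_{2}^2\) giving the same tip variance yields the same multivariate normal likelihood.

```r
h<-100 ## total tree depth
sig2_1<-20 ## rate before the shift
## target tip variance implied by (s = 95, sig2_2 = 1)
v<-sig2_1*95+1*(h-95)
## alternative shift heights (all after the last branching event)
s<-c(90,92,95)
## rates after the shift that reproduce v exactly (all positive)
sig2_2<-(v-sig2_1*s)/(h-s)
sig2_1*s+sig2_2*(h-s) ## identical tip variances: 1905 1905 1905
```

Each (s, sig2_2) pair implies exactly the same model of the data, so ML cannot distinguish among them.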
In practice, identifiability will depend on the specific data and tree, as well as on the number of shifts in our model; from a practical standpoint, though, I’d predict that non-identifiability is likely to be more common for smaller trees and larger numbers of rate shifts.
There’s no way to solve this problem; however, I have updated phytools on GitHub so that it can be detected more easily. In particular, I made two significant changes. (1) rateshift
runs multiple optimization iterations by default (and the model can be hard to optimize even when fully identifiable, so this is highly recommended). Previously, only the best of these optimizations was retained; now the results from all optimization iterations are kept and returned to the user as part of the model object. (2) rateshift
previously assessed the frequency of convergence to the best fit by counting the number of optimization iterations with the same (best) likelihood score. This inadvertently assumes identifiability, which (as we’ve just seen) is not guaranteed. Now rateshift
not only assesses whether or not the same likelihood is obtained, but also whether or not the parameters of the fitted model are equal (to a reasonable machine precision).
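The logic of this second check can be sketched in a few lines of base R (this is just an illustration, not the internal phytools code; the two fit objects and the tolerance are hypothetical):

```r
## two hypothetical optimization runs that attain the same likelihood
## but with different parameter values -- the signature of
## non-identifiability rather than a failure to converge
fit_a<-list(logL=-348.419,pars=c(shift=89.35,sig2_1=15.71,sig2_2=0.73))
fit_b<-list(logL=-348.419,pars=c(shift=84.10,sig2_1=17.02,sig2_2=0.80))
tol<-sqrt(.Machine$double.eps)
same_logL<-abs(fit_a$logL-fit_b$logL)<tol
same_pars<-isTRUE(all.equal(fit_a$pars,fit_b$pars,tolerance=tol))
c(same_logL=same_logL,same_pars=same_pars) ## TRUE FALSE
```

Equal likelihoods with unequal parameters is exactly the pattern that the old frequency-of-best-fit measure would have missed.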
To illustrate this, I’m going to use a simulated pure-birth tree with 100 taxa and a total depth of 100 units, with rate shifts in a quantitative character at 50, 70, and 90 time units above the root. The rate of evolution over these intervals (i.e., from 0 to 50, 50 to 70, etc.) will shift back and forth between 20 and 1.0, such that \(\sigma_{1}^2 = \sigma_{3}^2 = 20\) and \(\sigma_{2}^2 = \sigma_{4}^2 = 1.0\).
Here’s my simulation.
library(phytools)
packageVersion("phytools")
## [1] '2.3.15'
tree<-pbtree(n=100,scale=100)
tt<-make.era.map(tree,setNames(c(0,50,70,90),1:4))
cols<-setNames(palette()[1:4],1:4)
plot(ladderize.simmap(tt),cols,ftype="off")
legend("bottomleft",
legend=c(expression(paste(sigma^2,"= 20")),
expression(paste(sigma^2,"= 1")),
expression(paste(sigma^2,"= 20")),
expression(paste(sigma^2,"= 1"))),
col=cols,bty="n",cex=0.8,lwd=2)
sig2<-setNames(c(20,1,20,1),1:4)
x<-sim.rates(tt,sig2)
Now let’s fit our one-rate, two-rate, three-rate, and four-rate models.
fit1<-rateshift(tree,x,parallel=TRUE)
## Opened cluster with 10 cores.
## Running optimization iterations in parallel.
## Please wait....
## Done.
This is a simple one-rate (i.e., no-shift) Brownian model, so the model fit and parameters should closely match what we might obtain using, say, geiger::fitContinuous
for the same model.
fit1
## ML 1-rate model:
## s^2(1) se(1) k logL
## value 6.413 0.907 2 -380.3
##
## This is a one-rate model.
##
## Model fit using ML.
##
## Frequency of best fit: 1
##
## Optimization may not have converged.
We did (almost certainly unnecessarily) run 10 optimization iterations. To look at all 10 of them, we merely need to print fit1$all.fits
as follows.
print(fit1$all.fits,digits=6)
## sig2(1) log(L)
## 1 6.41312 -380.338
## 2 6.41312 -380.338
## 3 6.41312 -380.338
## 4 6.41312 -380.338
## 5 6.41312 -380.338
## 6 6.41312 -380.338
## 7 6.41312 -380.338
## 8 6.41312 -380.338
## 9 6.41312 -380.338
## 10 6.41312 -380.338
Next, let’s fit our two-rate (i.e., one-shift) model. This model typically seems to be identifiable (except in some trivial cases, such as the one described above).
fit2<-rateshift(tree,x,nrates=2,parallel=TRUE)
## Opened cluster with 10 cores.
## Running optimization iterations in parallel.
## Please wait....
## Done.
Once again, we’ve run 10 optimization iterations, in parallel. We can print the best of these as follows.
fit2
## ML 2-rate model:
## s^2(1) se(1) s^2(2) se(2) k logL
## value 15.71 3.008 0.7328 0.164 4 -348.4
##
## Shift point(s) between regimes (height above root):
## 1|2 se(1|2)
## value 89.35 0.05292
##
## Model fit using ML.
##
## Frequency of best fit: 0.4
##
## R thinks it has found the ML solution.
We can also look at all 10, just as we did before.
print(fit2$all.fits,digits=6)
## 1<->2 sig2(1) sig2(2) log(L)
## 4 89.35212 15.7140 0.732815 -348.419
## 7 89.35212 15.7140 0.732815 -348.419
## 6 89.35211 15.7141 0.732815 -348.419
## 3 89.35216 15.7140 0.732815 -348.419
## 5 90.13193 14.6454 0.706147 -348.492
## 10 52.53078 35.4834 4.573128 -372.138
## 1 52.53169 35.4818 4.573120 -372.138
## 8 52.50331 35.5323 4.573357 -372.138
## 9 52.49408 35.5488 4.573434 -372.138
## 2 2.70186 272.8927 5.999341 -378.500
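The “Frequency of best fit” of 0.4 reported above can be recomputed directly from this table. Here’s a sketch using the log-likelihood column (values copied from the output; phytools presumably applies its own tolerance internally, so the cutoff here is just illustrative):

```r
## log(L) values from the ten optimization runs shown above
logL<-c(-348.419,-348.419,-348.419,-348.419,-348.492,
  -372.138,-372.138,-372.138,-372.138,-378.500)
best<-max(logL)
## proportion of runs that reached the best log-likelihood
mean(abs(logL-best)<1e-3) ## 0.4
```

Four of the ten runs found the optimum; the rest stalled on local optima, which is why running many iterations matters even for an identifiable model.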
Next, let’s fit a three-rate (two-shift) model by setting nrates=3
. This time, we’ll increase the number of optimization iterations to niter=40
.
fit3<-rateshift(tree,x,nrates=3,parallel=TRUE,niter=40)
## Opened cluster with 15 cores.
## Running optimization iterations in parallel.
## Please wait....
## Done.
fit3
## ML 3-rate model:
## s^2(1) se(1) s^2(2) se(2) s^2(3) se(3) k logL
## value 34.28 22.2 12.42 2.683 0.6886 0.1625 6 -347.4
##
## Shift point(s) between regimes (height above root):
## 1|2 se(1|2) 2|3 se(2|3)
## value 44.4 0.4432 90.54 0.7465
##
## Model fit using ML.
##
## Frequency of best fit: 0.05
##
## R thinks it has found the ML solution.
print(fit3$all.fits,digits=6)
## 1<->2 2<->3 sig2(1) sig2(2) sig2(3) log(L)
## 22 44.40279 90.5415 34.27730 1.24198e+01 0.688608 -347.404
## 6 44.40270 90.5404 34.27969 1.24214e+01 0.688657 -347.404
## 15 44.40238 90.5417 34.28125 1.24195e+01 0.688639 -347.404
## 8 44.40317 90.5423 34.27565 1.24180e+01 0.688566 -347.404
## 39 44.40369 90.5420 34.27685 1.24192e+01 0.688592 -347.404
## 37 44.40359 90.5427 34.27762 1.24181e+01 0.688562 -347.404
## 7 44.40197 90.5425 34.27598 1.24171e+01 0.688568 -347.404
## 26 44.40189 90.5404 34.30484 1.24190e+01 0.688641 -347.404
## 18 44.40134 90.5412 34.28298 1.24201e+01 0.688603 -347.404
## 10 44.40436 90.5433 34.25980 1.24179e+01 0.688557 -347.404
## 33 44.40451 90.5430 34.30243 1.24185e+01 0.688549 -347.404
## 13 44.40262 90.5371 34.27460 1.24257e+01 0.688784 -347.404
## 17 44.40781 90.5384 34.27022 1.24240e+01 0.688725 -347.404
## 27 44.39678 90.5440 34.28762 1.24153e+01 0.688459 -347.404
## 36 44.39422 90.5415 34.28011 1.24204e+01 0.688590 -347.404
## 16 19.85439 90.5148 127.39826 1.27072e+01 0.688729 -347.503
## 14 19.85444 90.5127 127.40406 1.27092e+01 0.688837 -347.503
## 2 19.85452 90.5144 127.36118 1.27076e+01 0.688752 -347.503
## 30 19.85446 90.5104 127.32434 1.27132e+01 0.688918 -347.503
## 11 19.85413 90.5259 127.58445 1.26925e+01 0.688268 -347.503
## 1 19.85442 90.5004 127.16962 1.27263e+01 0.689362 -347.503
## 19 90.57304 98.3022 14.26742 3.60017e-01 0.955852 -347.573
## 31 44.39361 89.3516 33.51793 1.41072e+01 0.738306 -347.647
## 28 4.65485 90.4419 112.88488 1.37823e+01 0.689071 -348.229
## 9 9.01433 90.4376 64.94502 1.37876e+01 0.689332 -348.229
## 29 85.31761 89.0376 14.63693 1.99498e+01 0.730945 -348.338
## 35 85.31819 89.0701 14.65244 1.96871e+01 0.730727 -348.338
## 4 78.81765 90.4886 16.32794 1.27410e+01 0.688944 -348.381
## 40 46.89670 47.0559 31.13711 2.25436e+03 4.508658 -371.464
## 24 53.94660 64.8511 35.58749 6.41312e-04 4.602282 -371.772
## 25 53.94284 64.8512 35.59533 6.41312e-04 4.602306 -371.772
## 12 53.88228 64.8513 35.72194 6.41312e-04 4.602670 -371.772
## 5 13.49694 51.0415 1.35391 4.30970e+01 4.574627 -372.052
## 38 13.91676 51.0459 2.62926 4.30805e+01 4.574617 -372.052
## 21 13.52555 51.0500 1.47633 4.30705e+01 4.574596 -372.052
## 20 13.89952 51.0514 2.60380 4.30612e+01 4.574554 -372.052
## 34 14.30243 51.0520 3.74658 4.30573e+01 4.574542 -372.052
## 3 33.98171 49.9846 24.28969 5.46582e+01 4.574736 -372.094
## 23 33.98210 49.9785 24.26218 5.47138e+01 4.574739 -372.094
## 32 33.99019 49.9858 24.30057 5.46518e+01 4.574719 -372.094
Finally, let’s fit a four-rate (three-shift) model. Recall that this is our generating model.
fit4<-rateshift(tree,x,nrates=4,parallel=TRUE,niter=40)
## Opened cluster with 15 cores.
## Running optimization iterations in parallel.
## Please wait....
## Done.
fit4
## ML 4-rate model:
## s^2(1) se(1) s^2(2) se(2) s^2(3) se(3) s^2(4) se(4) k logL
## value 36.47 22.85 0.0006 NaN 17.08 5.634 0.7314 0.1634 8 -346.1
##
## Shift point(s) between regimes (height above root):
## 1|2 se(1|2) 2|3 se(2|3) 3|4 se(3|4)
## value 50.96 2.908 73.58 7.417 89.35 0.06403
##
## Model fit using ML.
##
## Frequency of best fit: 0.025
##
## R thinks it has found the ML solution.
print(fit4$all.fits,digits=6)
## 1<->2 2<->3 3<->4 sig2(1) sig2(2) sig2(3) sig2(4) log(L)
## 35 50.95619 73.5847 89.3514 36.466139730 6.41312e-04 1.70774e+01 0.731419 -346.064
## 21 51.41255 73.5807 90.1285 36.323787084 6.41312e-04 1.54413e+01 0.706156 -346.180
## 6 51.43523 73.5806 90.1293 36.285506735 6.41312e-04 1.54361e+01 0.706058 -346.180
## 19 51.56293 73.5802 90.4154 36.307272388 6.41312e-04 1.48826e+01 0.689743 -346.193
## 13 44.40319 90.7219 98.3022 34.316455009 1.23956e+01 3.59282e-01 0.955749 -346.469
## 25 47.40261 90.6697 98.3020 32.084403249 1.24256e+01 3.62393e-01 0.955010 -346.507
## 20 19.85445 90.6545 98.3022 126.977206373 1.27408e+01 3.61529e-01 0.955089 -346.570
## 34 89.15406 92.9507 98.3021 15.267079475 3.67117e+00 1.92580e-02 1.102262 -346.584
## 16 16.88944 18.6322 90.5389 0.000641312 8.57362e+02 1.24381e+01 0.688655 -346.589
## 37 16.23655 18.0225 90.5387 0.000641312 8.32380e+02 1.24382e+01 0.688663 -346.589
## 17 17.12496 19.5547 90.5398 0.000641312 6.19482e+02 1.24366e+01 0.688784 -346.589
## 2 16.10806 19.3002 90.5380 0.000641312 4.70792e+02 1.24393e+01 0.688700 -346.589
## 12 16.35884 19.3601 90.5379 0.000641312 5.00869e+02 1.24392e+01 0.688682 -346.589
## 15 19.85437 73.5803 89.3500 186.713332381 7.92706e+00 1.65897e+01 0.731594 -347.182
## 24 6.61844 19.6110 90.3780 0.000641312 1.69604e+02 1.27957e+01 0.697330 -347.278
## 5 44.40289 90.6458 92.4383 34.285104952 1.24059e+01 6.41312e-04 0.695366 -347.320
## 26 12.76458 43.6126 90.5465 1.274688295 4.26147e+01 1.23701e+01 0.688575 -347.335
## 36 11.88099 42.8716 90.5494 0.000641312 4.29774e+01 1.23826e+01 0.688360 -347.337
## 7 10.36366 43.5612 90.5087 0.000641312 4.08145e+01 1.24321e+01 0.690062 -347.340
## 8 44.40268 89.1633 90.9771 33.947205564 1.31210e+01 6.39674e+00 0.689274 -347.359
## 22 5.94813 44.4042 90.5957 0.000641312 3.70621e+01 1.23217e+01 0.686894 -347.361
## 18 34.76740 42.7866 90.5450 26.938296985 5.80279e+01 1.23776e+01 0.688632 -347.391
## 3 34.24624 42.9133 90.5461 26.857711269 5.57060e+01 1.23744e+01 0.688597 -347.391
## 39 34.11522 42.9723 90.5476 26.880295158 5.49191e+01 1.23753e+01 0.688534 -347.391
## 14 38.02339 41.3611 90.5393 26.888304528 1.08108e+02 1.23844e+01 0.688863 -347.391
## 29 15.02103 47.4018 90.5482 13.365606330 3.64779e+01 1.23136e+01 0.688708 -347.401
## 27 87.94451 90.9643 98.3023 15.519961399 8.59925e+00 3.59875e-01 0.955312 -347.404
## 30 25.67121 47.4008 90.5471 36.953960376 2.91502e+01 1.23830e+01 0.688494 -347.436
## 11 10.48466 12.3464 90.4392 57.451832963 1.54761e+01 1.37864e+01 0.689281 -348.229
## 28 46.87783 48.4161 64.8510 32.229135384 2.29026e+02 6.41312e-04 4.601006 -371.395
## 9 10.07909 47.2400 47.2673 1.014494319 3.61426e+01 1.20976e+04 4.508585 -371.428
## 1 14.74720 47.3096 47.3977 12.111886231 3.61473e+01 3.72370e+03 4.508548 -371.428
## 10 10.85844 47.1833 47.2245 3.549465434 3.61396e+01 8.07123e+03 4.508307 -371.428
## 38 16.06587 16.6823 55.6433 0.000641312 1.98659e+03 2.13489e+01 4.597495 -371.572
## 32 17.54874 19.1682 55.6434 0.000641312 7.88941e+02 2.13468e+01 4.597305 -371.572
## 33 15.77858 19.4204 55.6434 0.000641312 3.52177e+02 2.13485e+01 4.597281 -371.572
## 31 19.66413 19.8196 57.4729 0.000641312 8.65602e+03 1.89407e+01 4.606644 -371.617
## 23 13.96114 52.2688 64.8511 1.881912259 4.38632e+01 6.41312e-04 4.603764 -371.682
## 40 34.08048 50.9497 64.8512 23.830225162 5.68612e+01 6.41312e-04 4.603462 -371.723
## 4 33.97876 50.8104 64.9235 23.209305663 5.83322e+01 6.41312e-04 4.606284 -371.729
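With a table this size it can help to scan for ties programmatically: pull out the runs whose log-likelihoods match the best score, then check whether their parameter estimates agree. A base-R sketch with hypothetical numbers laid out like the columns of fit4$all.fits:

```r
## toy all.fits-style matrix: shift points and sig2(1) in the first
## four columns, log(L) in the last (made-up values)
all_fits<-rbind(c(50.96,73.58,89.35,36.47,-346.064),
  c(50.98,73.59,89.35,36.44,-346.064),
  c(51.41,73.58,90.13,36.32,-346.180))
logL<-all_fits[,ncol(all_fits)]
## runs tied (within tolerance) at the best log-likelihood
best<-all_fits[abs(logL-max(logL))<1e-4,,drop=FALSE]
## per-parameter spread among the tied runs: values near zero mean
## repeated convergence to a single, identifiable solution
spread<-apply(best[,-ncol(best),drop=FALSE],2,function(p) diff(range(p)))
spread
```

Large spreads among runs with the same likelihood, by contrast, would be the red flag for non-identifiability.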
Although we’d be much more confident in our result had we converged on the same ML solution more than once, in this simulated example we’re not seeing any conspicuous problems with identifiability, which would be indicated by multiple independent optimization iterations converging on different solutions with the same likelihood. (We also seem to be getting pretty close to the true shift points, which is neat.) Non-identifiability can occur, though, as we’ve established, so users of this method should be careful to doubly scrutinize their findings.
Edit: I repeated the last analysis with 100 optimization iterations instead of 40, and the results held steady.
That’s all folks!