Something that Luke & I pointed out in Chapter 8 of our recent book with Princeton University Press is that parsimony & likelihood will give the same set of ancestral states at nodes – or, more precisely, that likelihood will lead to a set of estimated ancestral states that correspond to a maximum parsimony solution for the ancestral states – if we assume, force, or if the rate or rates of transition between states is/are actually very low.
In the book, this is given on page 252 where we write “Second, parsimony implicitly assumes that the rate of evolution is slow, even if strongly contrary evidence exists in our data that the rate of evolution is relatively fast.” We go on to illustrate this conceptually using stochastic mapping.
It’s also relatively straightforward to demonstrate that this is true using joint ancestral state estimation.
In this case, if we assume or force the transition rates in the matrix Q to be low, or if their ML values are actually sufficiently low on their own accord, then the minimum of number of changes implied by the joint reconstruction of our character under likelihood should be equal to the parsimony score for the trait – essentially “proving” our thesis.
To illustrate this, I’m going to start off by using a real phylogeny & dataset of habitat (saxicolous, i.e., rock-dwelling, vs. non-saxicolous) from Ramm et al. (2019) that also features in our book.
library(phytools)
lizard_tree<-read.nexus(file="http://www.phytools.org/Rbook/7/lizard_tree.nex")
lizard_data<-read.csv(file="http://www.phytools.org/Rbook/7/lizard_spines.csv",
row.names=1,stringsAsFactors=TRUE)
chk<-geiger::name.check(lizard_tree,lizard_data)
lizard_tree<-drop.tip(lizard_tree,chk$tree_not_data)
lizard_habitat<-setNames(lizard_data$habitat,rownames(lizard_data))
head(lizard_habitat)
## Ablepharus_budaki Abronia_anzuetoi Abronia_frosti
## non-saxicolous non-saxicolous non-saxicolous
## Acanthodactylus_aureus Acanthodactylus_opheodurus Acanthodactylus_pardalis
## non-saxicolous non-saxicolous non-saxicolous
## Levels: non-saxicolous saxicolous
Let’s go ahead now & fit an “equal-rates” ("ER"
) model with equal back-and-forth rates of transition between the two states as follows.
lizard_ER<-fitMk(lizard_tree,lizard_habitat,model="ER")
lizard_ER
## Object of class "fitMk".
##
## Fitted (or set) value of Q:
## non-saxicolous saxicolous
## non-saxicolous -0.00441 0.00441
## saxicolous 0.00441 -0.00441
##
## Fitted (or set) value of pi:
## non-saxicolous saxicolous
## 0.5 0.5
## due to treating the root prior as (a) flat.
##
## Log-likelihood: -247.375704
##
## Optimization method used was "nlminb"
##
## R thinks it has found the ML solution.
We’re now prepared to undertake joint ancestral state estimation.
Joint estimation involves identifying the state of states at all nodes that (logically) jointly maximize the probability of our observed data under the model we fit to them!
lizard_anc<-ancr(lizard_ER,type="joint")
lizard_anc
## Joint ancestral state estimates:
## state
## 659 non-saxicolous
## 660 non-saxicolous
## 661 non-saxicolous
## 662 non-saxicolous
## 663 non-saxicolous
## 664 non-saxicolous
## ...
##
## Log-likelihood = -258.285663
plot(lizard_anc,type="arc",arc_height=0.1,part=1)
Now to count the minimum number of changes consistent with this reconstruction, we merely need to tabulate the starting & ending state of each edge and then sum up the number that are different.
nn<-matrix(c(lizard_habitat[lizard_tree$tip.label],
lizard_anc$ace)[lizard_tree$edge],nrow(lizard_tree$edge),2)
n_changes<-sum(apply(nn,1,function(x) x[1]!=x[2]))
n_changes
## [1] 61
This shows us that the joint reconstruction of our character on the tree is consistent with a minimum of 61 trait changes. Cool.
Now to get the minimum number of changes overall we must compute the parsimony score.
We might do this using the phangorn package – however, conveniently, phytools also now contains a simple parsimony score calculator, pscore
.
pscore(lizard_tree,lizard_habitat)
## [1] 59
We can see here that the minimum number of changes implied by our joint reconstruction and the parsimony score are not the same – the latter being very slightly smaller than the former. (If it weren’t, then it wouldn’t be the parsimony score!)
In the book we asserted that forcing the transition matrix Q to be small (that is, to have transition rates that are much lower than implied by the data) would ultimately cause our ancestral states to converge on a parsimony reconstruction. Often, for any dataset and tree there can be multiple, sometimes many, equally parsimonious set of node states. This is just saying that we should get one of these.
To show this convergence using our lizard data & tree, I’m going to create a Q matrix containing transition rates several order of magnitudes smaller than our previously-estimated ML rates, and then proceed to use this matrix for joint reconstruction.
forcedQ<-matrix(1e-6*c(-1,1,1,-1),2,2,
dimnames=list(levels(lizard_habitat),levels(lizard_habitat)))
forcedQ
## non-saxicolous saxicolous
## non-saxicolous -1e-06 1e-06
## saxicolous 1e-06 -1e-06
forced_ER<-fitMk(lizard_tree,lizard_habitat,fixedQ=forcedQ)
forced_ER
## Object of class "fitMk".
##
## Fitted (or set) value of Q:
## non-saxicolous saxicolous
## non-saxicolous -1e-06 1e-06
## saxicolous 1e-06 -1e-06
##
## Fitted (or set) value of pi:
## non-saxicolous saxicolous
## 0.5 0.5
## due to treating the root prior as (a) flat.
##
## Log-likelihood: -681.760101
##
## Optimization method used was ""
We can see that fitMk
allows us to supply this arbitrary Q matrix instead of estimating one.
In this case, it explains our data much, much more poorly than did our ML solution from earlier, but, nonetheless, now that we’ve created this object, we can go ahead and apply ancr
to get the joint reconstruction, just as we did four our ML fitted Mk model earlier.
lizard_anc<-ancr(forced_ER,type="joint")
nn<-matrix(c(lizard_habitat[lizard_tree$tip.label],
lizard_anc$ace)[lizard_tree$edge],nrow(lizard_tree$edge),2)
n_changes<-sum(apply(nn,1,function(x) x[1]!=x[2]))
n_changes
## [1] 59
Awesome. This shows that when we force Q to be small (in spite of contrary evidence in our data), our joint reconstruction implies a minimum number of changes that is equal to the parsimony score.
Of course, we haven’t shown (at all!) that this is general, but we can simulate a tree & dataset in which the difference between the implied minimum number of changes in the joint reconstruction is larger, force a smaller value of Q, and compare the result.
Here I’ll use a three-state character, just for fun.
## simulate tree
simTree<-pbtree(n=200,scale=10)
## set Q
simQ<-matrix(c(-2,1,1,
1,-2,1,
1,1,-2),3,3,
dimnames=list(letters[1:3],letters[1:3]))
## simulate data
simX<-sim.Mk(simTree,simQ)
## fit model
simFit<-fitMk(simTree,simX,model="ER")
simFit
## Object of class "fitMk".
##
## Fitted (or set) value of Q:
## a b c
## a -2.30694 1.15347 1.15347
## b 1.15347 -2.30694 1.15347
## c 1.15347 1.15347 -2.30694
##
## Fitted (or set) value of pi:
## a b c
## 0.333333 0.333333 0.333333
## due to treating the root prior as (a) flat.
##
## Log-likelihood: -215.770919
##
## Optimization method used was "nlminb"
##
## R thinks it has found the ML solution.
## joint reconstruction
simAnc<-ancr(simFit,type="joint")
## Note:
## there may be more than one equally likely
## joint reconstruction.
simAnc
## Joint ancestral state estimates:
## state
## 201 b
## 202 b
## 203 b
## 204 b
## 205 a
## 206 a
## ...
##
## Log-likelihood = -362.490735
## compute minimum implied changes under joint reconstruction
nn<-matrix(c(simX[simTree$tip.label],simAnc$ace)[simTree$edge],
nrow(simTree$edge),2)
n_changes<-sum(apply(nn,1,function(x) x[1]!=x[2]))
n_changes
## [1] 99
When we compare this value to the parsimony score for the same tree and data, we should see that the latter is much lower in value, correct?
pscore(simTree,simX)
## [1] 87
Now, just as we did before for our empirical example, let’s proceed to force Q to have a value much lower than the value we set (in our simulation) or previously estimated (using fitMk
).
forced_ER<-fitMk(simTree,simX,fixedQ=1e-6*simQ)
forced_ER
## Object of class "fitMk".
##
## Fitted (or set) value of Q:
## a b c
## a -2e-06 1e-06 1e-06
## b 1e-06 -2e-06 1e-06
## c 1e-06 1e-06 -2e-06
##
## Fitted (or set) value of pi:
## a b c
## 0.333333 0.333333 0.333333
## due to treating the root prior as (a) flat.
##
## Log-likelihood: -1198.196004
##
## Optimization method used was ""
Finally, let’s undertake joint estimation using this value of Q and compare it to our parsimony score, from above.
simAnc_2<-ancr(forced_ER,type="joint")
## Note:
## there may be more than one equally likely
## joint reconstruction.
nn<-matrix(c(simX[simTree$tip.label],simAnc_2$ace)[simTree$edge],
nrow(simTree$edge),2)
n_changes<-sum(apply(nn,1,function(x) x[1]!=x[2]))
n_changes
## [1] 87
This matches our score, just as we saw for the empirical case.
Simulate some other conditions and see if the pattern holds true. I bet it does!
No comments:
Post a Comment
Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.