phytools presently has two different functions that do marginal ancestral state reconstruction for discrete characters: ancr
and rerootingMethod
. The latter of these two implements a re-rooting algorithm described (so far as I can tell) in Yang et al. (1995). Since the way this algorithm works is by re-rooting the tree at each internal node and re-computing the partial (scaled) likelihoods at the (new, temporary) root each time, it obviously will not work for asymmetrical transition rates between states!
As ancr
and rerootingMethod
are essentially redundant, I’m “decommissioning” rerootingMethod
. It will remain in the package; however, every use will trigger a “note” for the user suggesting ancr
.
As part of this decommissioning, I thought it would be worth showing that the function returns the same marginal ancestral states as ancr
(and other functions), so long as its assumptions are met.
library(phytools)
library(corHMM)
In this demo, I’ll use a dataset for feeding mode in sunfishes (Centrarchidae) from Revell & Collar (2009).
data(sunfish.tree)
sunfish.tree
##
## Phylogenetic tree with 28 tips and 27 internal nodes.
##
## Tip labels:
## Acantharchus_pomotis, Lepomis_gibbosus, Lepomis_microlophus, Lepomis_punctatus, Lepomis_miniatus, Lepomis_auritus, ...
##
## The tree includes a mapped, 2-state discrete character
## with states:
## non, pisc
##
## Rooted; includes branch lengths.
Our tree already has a mapped discrete character encoded (and is, thusly, an object of class "simmap"
as well as "phylo"
). Let’s strip that away using as.phylo
as follows.
sunfish.tree<-as.phylo(sunfish.tree)
sunfish.tree
##
## Phylogenetic tree with 28 tips and 27 internal nodes.
##
## Tip labels:
## Acantharchus_pomotis, Lepomis_gibbosus, Lepomis_microlophus, Lepomis_punctatus, Lepomis_miniatus, Lepomis_auritus, ...
##
## Rooted; includes branch lengths.
Now our data.
data(sunfish.data)
head(sunfish.data)
## feeding.mode gape.width buccal.length
## Acantharchus_pomotis pisc 0.114 -0.009
## Lepomis_gibbosus non -0.133 -0.009
## Lepomis_microlophus non -0.151 0.012
## Lepomis_punctatus non -0.103 -0.019
## Lepomis_miniatus non -0.134 0.001
## Lepomis_auritus non -0.222 -0.039
The discrete character from this dataset that we’ll analyze is feeding.mode
, so let’s pull it into a vector.
feeding_mode<-setNames(sunfish.data$feeding.mode,rownames(sunfish.data))
Now we’re ready to compute marginal ancestral states. We can start with an "ER"
(equal-rates) model. I’m going to use three different functions: the aforementioned ancr
and rerootingMethod
, as well as corHMM
from the corHMM package, and ace
in ape.
## ancr
sunfish.er_model<-fitMk(sunfish.tree,feeding_mode,model="ER")
sunfish.er_ancr<-ancr(sunfish.er_model)
sunfish.er_ancr
## Marginal ancestral state estimates:
## non pisc
## 29 0.310020 0.689980
## 30 0.311699 0.688301
## 31 0.358258 0.641742
## 32 0.827481 0.172519
## 33 0.987669 0.012331
## 34 0.995779 0.004221
## ...
##
## Log-likelihood = -13.07453
## ape::ace
sunfish.er_ace<-ace(feeding_mode,sunfish.tree,type="discrete",model="ER")
sunfish.er_ace
##
## Ancestral Character Estimation
##
## Call: ace(x = feeding_mode, phy = sunfish.tree, type = "discrete",
## model = "ER")
##
## Log-likelihood: -12.38138
##
## Rate index matrix:
## non pisc
## non . 1
## pisc 1 .
##
## Parameter estimates:
## rate index estimate std-err
## 1 4.2289 1.6056
##
## Scaled likelihoods at the root (type '...$lik.anc' to get them for all nodes):
## non pisc
## 0.31002 0.68998
## corHMM::corHMM
Data<-data.frame(Genus_species=names(feeding_mode),
x=feeding_mode)
sunfish.er_corhmm<-corHMM(sunfish.tree,data=Data,rate.cat=1,model="ER",
root.p=c(0.5,0.5))
## State distribution in data:
## States: 1 2
## Counts: 12 16
## Beginning thorough optimization search -- performing 0 random restarts
## Finished. Inferring ancestral states using marginal reconstruction.
sunfish.er_corhmm
##
## Fit
## -lnL AIC AICc Rate.cat ntax
## -13.07453 28.14906 28.30291 1 28
##
## Legend
## 1 2
## "non" "pisc"
##
## Rates
## (1,R1) (2,R1)
## (1,R1) NA 4.228909
## (2,R1) 4.228909 NA
##
## Arrived at a reliable solution
## rerootingMethod
sunfish.er_yang<-rerootingMethod(sunfish.tree,feeding_mode,model="ER")
##
## Note:
## This function is redundant with 'phytools::ancr' in situations in
## which it should be used (symmetric Q matrices) & invalid for non-
## symmetric Q matrices (e.g., model='ARD').
print(sunfish.er_yang,printlen=10)
## Ancestral character estimates using re-rooting method
## of Yang et al. (1995):
## non pisc
## 29 0.310020 0.689980
## 30 0.311699 0.688301
## 31 0.358258 0.641742
## 32 0.827481 0.172519
## 33 0.987669 0.012331
## 34 0.995779 0.004221
## 35 0.999566 0.000434
## 36 0.999931 0.000069
## 37 0.999266 0.000734
## 38 0.999809 0.000191
## ...
##
## Estimated transition matrix,
## Q =
## non pisc
## non -4.228856 4.228856
## pisc 4.228856 -4.228856
##
## **Note that if Q is not symmetric the marginal
## reconstructions may be invalid.
##
## Log-likelihood = -13.07453
Superficiallly, these estimates look similar. Let’s compare them in all combinations.
layout(matrix(c(1,2,3,7,4,5,7,7,6),3,3,byrow=FALSE))
plot(sunfish.er_ancr$ace,sunfish.er_ace$lik.anc,
xlab="phytools::ancr",ylab="ape::ace",
bty="n",pch=16,col=make.transparent("blue",0.5))
grid()
mtext("a) phytools::ancr vs. ace (model=\"ER\")",line=2,
cex=0.8)
plot(sunfish.er_ancr$ace,sunfish.er_corhmm$states,
xlab="phytools::ancr",ylab="corHMM",
bty="n",pch=16,col=make.transparent("blue",0.5))
grid()
mtext("b) phytools::ancr vs. corHMM (model=\"ER\")",line=2,
cex=0.8)
plot(sunfish.er_ancr$ace,sunfish.er_yang$marginal.anc,
xlab="phytools::ancr",ylab="phytools::rerootingMethod",
bty="n",pch=16,col=make.transparent("blue",0.5))
grid()
mtext("c) phytools::ancr vs.rerootingMethod",
line=2,cex=0.8)
plot(sunfish.er_ace$lik.anc,sunfish.er_corhmm$states,
xlab="ape::ace",ylab="corHMM",
bty="n",pch=16,col=make.transparent("blue",0.5))
grid()
mtext("d) ape::ace vs. corHMM (model=\"ER\")",line=2,
cex=0.8)
plot(sunfish.er_ace$lik.anc,sunfish.er_yang$marginal.anc,
xlab="ape::ace",ylab="phytools::rerootingMethod",
bty="n",pch=16,col=make.transparent("blue",0.5))
grid()
mtext("e) ape::ace vs. rerootingMethod",line=2,
cex=0.8)
plot(sunfish.er_corhmm$states,sunfish.er_yang$marginal.anc,
xlab="corHMM",ylab="phytools::rerootingMethod",
bty="n",pch=16,col=make.transparent("blue",0.5))
grid()
mtext("f) corHMM vs. phytools::rerootingMethod",line=2,
cex=0.8)
This shows us that all four approaches give us the same marginal estimates. That’s good.
Now let’s compare this to an asymmetric model: "ARD"
. Note that rerootingMethod
should not be used for this type of model – but the function will still run & give a result. ancr
and corHMM::corHMM
are expected to behave fine. Previously I had identified problems with ape::ace
for non-symmetric Q matrices.
## ancr
sunfish.ard_model<-fitMk(sunfish.tree,feeding_mode,model="ARD")
sunfish.ard_ancr<-ancr(sunfish.ard_model)
sunfish.ard_ancr
## Marginal ancestral state estimates:
## non pisc
## 29 0.554292 0.445708
## 30 0.568076 0.431924
## 31 0.589648 0.410352
## 32 0.931603 0.068397
## 33 0.995898 0.004102
## 34 0.998398 0.001602
## ...
##
## Log-likelihood = -12.864937
sunfish.ard_ace<-ace(feeding_mode,sunfish.tree,type="discrete",model="ARD")
sunfish.ard_ace
##
## Ancestral Character Estimation
##
## Call: ace(x = feeding_mode, phy = sunfish.tree, type = "discrete",
## model = "ARD")
##
## Log-likelihood: -12.17179
##
## Rate index matrix:
## non pisc
## non . 2
## pisc 1 .
##
## Parameter estimates:
## rate index estimate std-err
## 1 3.0489 2.1746
## 2 6.0878 2.5876
##
## Scaled likelihoods at the root (type '...$lik.anc' to get them for all nodes):
## non pisc
## 0.5542917 0.4457083
Data<-data.frame(Genus_species=names(feeding_mode),
x=feeding_mode)
sunfish.ard_corhmm<-corHMM(sunfish.tree,data=Data,rate.cat=1,model="ARD",
root.p=c(0.5,0.5))
## State distribution in data:
## States: 1 2
## Counts: 12 16
## Beginning thorough optimization search -- performing 0 random restarts
## Finished. Inferring ancestral states using marginal reconstruction.
sunfish.ard_corhmm
##
## Fit
## -lnL AIC AICc Rate.cat ntax
## -12.86494 29.72987 30.20987 1 28
##
## Legend
## 1 2
## "non" "pisc"
##
## Rates
## (1,R1) (2,R1)
## (1,R1) NA 6.08716
## (2,R1) 3.049321 NA
##
## Arrived at a reliable solution
sunfish.ard_yang<-rerootingMethod(sunfish.tree,feeding_mode,model="ARD")
##
## Note:
## This function is redundant with 'phytools::ancr' in situations in
## which it should be used (symmetric Q matrices) & invalid for non-
## symmetric Q matrices (e.g., model='ARD').
print(sunfish.ard_yang,printlen=10)
## Ancestral character estimates using re-rooting method
## of Yang et al. (1995):
## non pisc
## 29 0.554292 0.445708
## 30 0.601789 0.398211
## 31 0.645854 0.354146
## 32 0.955758 0.044242
## 33 0.997528 0.002472
## 34 0.999098 0.000902
## 35 0.999857 0.000143
## 36 0.999975 0.000025
## 37 0.999785 0.000215
## 38 0.999937 0.000063
## ...
##
## Estimated transition matrix,
## Q =
## non pisc
## non -3.048905 6.087789
## pisc 3.048905 -6.087789
##
## **Note that if Q is not symmetric the marginal
## reconstructions may be invalid.
##
## Log-likelihood = -12.864937
layout(matrix(c(1,2,3,7,4,5,7,7,6),3,3,byrow=FALSE))
plot(sunfish.ard_ancr$ace,sunfish.ard_ace$lik.anc,
xlab="phytools::ancr",ylab="ape::ace",
bty="n",pch=16,col=make.transparent("blue",0.5))
grid()
mtext("a) phytools::ancr vs. ace (model=\"ARD\")",line=2,
cex=0.8)
plot(sunfish.ard_ancr$ace,sunfish.ard_corhmm$states,
xlab="phytools::ancr",ylab="corHMM",
bty="n",pch=16,col=make.transparent("blue",0.5))
grid()
mtext("b) phytools::ancr vs. corHMM (model=\"ARD\")",line=2,
cex=0.8)
plot(sunfish.ard_ancr$ace,sunfish.ard_yang$marginal.anc,
xlab="phytools::ancr",ylab="phytools::rerootingMethod",
bty="n",pch=16,col=make.transparent("blue",0.5))
grid()
mtext("c) phytools::ancr vs.rerootingMethod",
line=2,cex=0.8)
plot(sunfish.ard_ace$lik.anc,sunfish.ard_corhmm$states,
xlab="ape::ace",ylab="corHMM",
bty="n",pch=16,col=make.transparent("blue",0.5))
grid()
mtext("d) ape::ace vs. corHMM (model=\"ARD\")",line=2,
cex=0.8)
plot(sunfish.ard_ace$lik.anc,sunfish.ard_yang$marginal.anc,
xlab="ape::ace",ylab="phytools::rerootingMethod",
bty="n",pch=16,col=make.transparent("blue",0.5))
grid()
mtext("e) ape::ace vs. rerootingMethod",line=2,
cex=0.8)
plot(sunfish.ard_corhmm$states,sunfish.ard_yang$marginal.anc,
xlab="corHMM",ylab="phytools::rerootingMethod",
bty="n",pch=16,col=make.transparent("blue",0.5))
grid()
mtext("f) corHMM vs. phytools::rerootingMethod",line=2,
cex=0.8)
OK, this just tells us that ancr
and corHMM::corHMM
yield the same results – it cannot arbitrate whether or not ape::ace
is correct! (On the other hand, we know that rerootingMethod
should have the wrong result here.)
How can we evaluate this discrepancy? Well, there’s a fourth R package that we can use to reconstruct marginal ancestral states, and thats the diversitree package by Rich FitzJohn. Let’s load it and put it to work.
(I need to apply a few small hacks here. Our tree barely fails is.ultrametric
, so I’ll use force.ultrametric
to pass. We also need to convert our factor vector to a numeric integer vector starting with 0
.)
library(diversitree)
phy<-force.ultrametric(sunfish.tree,message=FALSE)
x<-setNames(as.numeric(feeding_mode),names(feeding_mode))-1
lik<-make.mk2(phy,x)
sunfish.ard_model2<-find.mle(lik,x.init=c(4,4),root=ROOT.FLAT)
sunfish.ard_diversitree<-asr.marginal(lik,
pars=sunfish.ard_model2$par)
par(mar=c(5.1,4.1,2.1,1.1))
plot(sunfish.ard_ancr$ace,t(sunfish.ard_diversitree),
xlab="phytools::ancr",ylab="diversitree::asr.marginal",
bty="n",pch=16,col=make.transparent("blue",0.5),
cex=2,las=1,cex.axis=0.8)
grid()
lines(c(0,1),c(0,1))
This shows us that diversitree gives the same marginal states as phytools::ancr
(and corHMM), so adds a bit of credence to my assertion that these are the correct ones.
Lastly, what about computation time?
To see that, I’m going to simulate a much larger tree & dataset – let’s say with 2,000 tips.
Q<-matrix(c(-1,1,0,0.5,1,0.5,0,1,-1),3,3,dimnames=list(0:2,0:2))
x<-sim.Mk(tree<-pbtree(n=2000,scale=1),Q)
Now, I’ll estimate marginal states by wrapping my function call of fitMk
with an ancr
call. (We could also using a pipe, %>%
, here, if we wanted.)
system.time(ace_sym1<-ancr(fitMk(tree,x,model="SYM")))
## user system elapsed
## 13.06 0.25 54.64
ace_sym1
## Marginal ancestral state estimates:
## 0 1 2
## 2001 0.960122 0.038142 0.001736
## 2002 0.999858 0.000142 0.000000
## 2003 0.999990 0.000010 0.000000
## 2004 0.999989 0.000011 0.000000
## 2005 0.999882 0.000118 0.000000
## 2006 0.999707 0.000293 0.000000
## ...
##
## Log-likelihood = -633.448457
Now let’s do the same using ape::ace
.
system.time(ace_sym2<-ace(x,tree,type="discrete",model="SYM"))
## user system elapsed
## 0.29 0.00 1.04
ace_sym2
##
## Ancestral Character Estimation
##
## Call: ace(x = x, phy = tree, type = "discrete", model = "SYM")
##
## Log-likelihood: -632.3498
##
## Rate index matrix:
## 0 1 2
## 0 . 1 2
## 1 1 . 3
## 2 2 3 .
##
## Parameter estimates:
## rate index estimate std-err
## 1 0.5920 0.0402
## 2 0.0000 0.0133
## 3 1.0815 0.1520
##
## Scaled likelihoods at the root (type '...$lik.anc' to get them for all nodes):
## 0 1 2
## 0.960121792 0.038141957 0.001736251
We can see that ace
is much faster, so (inasmuch as it gives us the right estimates) that’s good! (To be fair, we could’ve used opt.method="optimParallel"
and some other options to speed up fitMk
, but we’d never have caught up to ace
!)
par(mar=c(5.1,4.1,2.1,1.1))
plot(ace_sym1$ace,ace_sym2$lik.anc,
xlab="phytools::ancr",ylab="ape::ace",
bty="n",pch=16,col=make.transparent("blue",0.25),
cex=1.2,las=1,cex.axis=0.8)
grid()
lines(c(0,1),c(0,1))
That’s all folks!
No comments:
Post a Comment
Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.