Another important phytools update in the latest CRAN update is the
addition of the Fitzjohn et al. (2009) prior distribution on the root
state, π, to fitMk
(and most of the functions that use fitMk
internally - though not yet make.simmap
).
The Fitzjohn et al. approach is to basically treat π as a nuisance parameter estimately jointly with the fitted model parameters, rather than specified separately by the user. (For more details, see Fitzjohn et al. 2009.) Other possibilities are to set the root prior as flat (that is an equal prior probability for each value of the caracter), equilibrium, or some arbitrary value chosen by the user.
I've never seen a study that measured model accuracy as a function of this choice (though this might be
interesting - please point it out in the comments if one exists!); however, the Fitzjohn et al. prior is used by
default by some other R software (e.g., geiger::fitDiscrete
and in the diversitree package).
Although there's no universal consensus on which root prior to use, this decision can substantially impact the paramter estimates in our fitted model.
Here's a very simple example using data for feeding mode in Centrarchidae from Revell and Collar (2009), though the phylogeny dates to Near et al. (2005).
library(phytools)
data(sunfish.tree)
sunfish.tree
##
## Phylogenetic tree with 28 tips and 27 internal nodes.
##
## Tip labels:
## Acantharchus_pomotis, Lepomis_gibbosus, Lepomis_microlophus, Lepomis_punctatus, Lepomis_miniatus, Lepomis_auritus, ...
##
## The tree includes a mapped, 2-state discrete character
## with states:
## non, pisc
##
## Rooted; includes branch lengths.
data(sunfish.data)
head(sunfish.data)
## feeding.mode gape.width buccal.length
## Acantharchus_pomotis pisc 0.114 -0.009
## Lepomis_gibbosus non -0.133 -0.009
## Lepomis_microlophus non -0.151 0.012
## Lepomis_punctatus non -0.103 -0.019
## Lepomis_miniatus non -0.134 0.001
## Lepomis_auritus non -0.222 -0.039
Let's pull out of our trait, and then map it on the tips of the tree using the plotting method I demonstrated yesterday.
fmode<-setNames(sunfish.data$feeding.mode,
rownames(sunfish.data))
levels(fmode)<-c("non-piscivorous","piscivorous")
plotTree(sunfish.tree,plot=FALSE)
h<-par()$usr[2]
plotTree(sunfish.tree,add=TRUE,xlim=c(0,1.05*h),
ftype="i",offset=1.2)
pp<-get("last_plot.phylo",envir=.PlotPhyloEnv)
cols<-setNames(palette()[c(4,2)],
levels(fmode))[fmode[sunfish.tree$tip.label]]
h<-max(nodeHeights(sunfish.tree))
for(i in 1:Ntip(sunfish.tree))
polygon(c(h,1.05*h,1.05*h,h),
rep(pp$yy[i],4)+c(-0.5,-0.5,0.5,0.5),
col=cols[i])
legend("topleft",levels(fmode),pch=22,
pt.bg=palette()[c(4,2)],pt.cex=2,
bty="n")
Now let's fit our model. First, here's the ARD model assuming a flat prior:
flatprior.fit<-fitMk(sunfish.tree,fmode,model="ARD")
flatprior.fit
## Object of class "fitMk".
##
## Fitted (or set) value of Q:
## non-piscivorous piscivorous
## non-piscivorous -6.087789 6.087789
## piscivorous 3.048905 -3.048905
##
## Fitted (or set) value of pi:
## non-piscivorous piscivorous
## 0.5 0.5
## due to treating the root prior as (a) flat.
##
## Log-likelihood: -12.864937
##
## Optimization method used was "nlminb"
This model implies that the character can change both from non-piscivory to piscivory, and back again - although the latter with a lower rate.
Let's now contrast this with the Fitzjohn et al. prior:
fitzprior.fit<-fitMk(sunfish.tree,fmode,model="ARD",
pi="fitzjohn")
fitzprior.fit
## Object of class "fitMk".
##
## Fitted (or set) value of Q:
## non-piscivorous piscivorous
## non-piscivorous -6.011056 6.011056
## piscivorous 0.000000 0.000000
##
## Fitted (or set) value of pi:
## non-piscivorous piscivorous
## 1 0
## due to treating the root prior as (a) nuisance.
##
## Log-likelihood: -12.295051
##
## Optimization method used was "nlminb"
Now the best-fitting model changes to be an irreversible model. Evolutionary change is only from non-piscivory to piscivory, and never the reverse!
This, by the way, corresponds with what we'd obtain using fitDiscrete
in geiger.
library(geiger)
as.Qmatrix(fitDiscrete(sunfish.tree,fmode,
model="ARD"))
## Estimated Q matrix:
## non-piscivorous piscivorous
## non-piscivorous -6.011053e+00 6.011053e+00
## piscivorous 1.400167e-14 -1.400167e-14
OK, well the point isn't that one approach is “correct” and the other not - rather that our choice of prior can have a large influence on what our fitted model implies about the evolutionary process for our discrete trait. (More often, this effect is more subtle than in the example here - but there's almost always some difference.)
Visually, let's compare our two models:
par(mfrow=c(1,2))
plot(flatprior.fit,show.zeros=FALSE)
mtext("a) flat prior",line=0,adj=0.1)
plot(fitzprior.fit,show.zeros=FALSE)
mtext("b) Fitzjohn et al. (2009) prior",line=0,adj=0.1)
To ensure backward compatibility with any package that imports fitMk
I've left the default prior to be
the flat prior, but this is easily changed by setting pi="fitzjohn"
. This also applies to (at least) fitpolyMk
and fitHRM
- though not yet to make.simmap
(as previously mentioned, although the user can set whatever custom
prior they want), nor fitmultiMk
. Look for that in the future.
Hi Liam,
ReplyDeleteThanks for this useful addition. A couple of questions...
First, how does one set a custom prior? Are there any instructions available?
Second, I've noticed that in this update to phytools fitPagel generates quite different results when method = "fitMK" versus when method = "ace" or method = "fitDiscrete". This doesn't appear to have anything to do with the root prior. Is there an explanation for this?
Thanks in advance.
Jack.
Hi Jack.
DeleteCan you send or post a reproducible example?
I know of at least one use-case in which method="fitDiscrete" *does not* work, and that is when some trait combinations (i.e., 0|0, 0|1, 1|0, or 1|1) don't exist in the data. As such, I should probably remove that option or add a warning.
- Liam
Thanks for your response. As I mentioned in my response to your other message, I think you're right about failure to optimise - with most other datasets I've tried there is no problem.
DeleteAnd thanks for the comment on fitDiscrete - I noticed that too.
Jack.
This is great, Liam - thx much! Are there still plans to implement this root prior (FitzJohn et al. 2009) in make.simmap? Or is there a workaround you would recommend in the interim (ie. calculate separately and input into make.simmap?)? Thx either way! -Matt.
ReplyDelete