I thought it might be useful to post a quick tutorial on how to do basic
model comparison for discrete character evolution using the phytools function
fitMk
. fitMk
actually borrows a lot of code from
the ape function ace
, although it does not do marginal ancestral
state estimation but it does allow the user flexibility in specifying the
prior distribution π. This can theoretically be
important.
## first load packages
library(phytools)
For this exercise we will use the Anolis ecomorph data:
anoletree
##
## Phylogenetic tree with 82 tips and 81 internal nodes.
##
## Tip labels:
## Anolis_ahli, Anolis_allogus, Anolis_rubribarbus, Anolis_imias, Anolis_sagrei, Anolis_bremeri, ...
##
## Rooted; includes branch lengths.
ecomorph
## Anolis_ahli Anolis_allogus Anolis_rubribarbus
## "TG" "TG" "TG"
## Anolis_imias Anolis_sagrei Anolis_bremeri
## "TG" "TG" "TG"
## Anolis_quadriocellifer Anolis_ophiolepis Anolis_mestrei
## "TG" "GB" "TG"
## Anolis_jubar Anolis_homolechis Anolis_confusus
## "TG" "TG" "TG"
## Anolis_guafe Anolis_garmani Anolis_opalinus
## "TG" "CG" "TC"
## Anolis_grahami Anolis_valencienni Anolis_lineatopus
## "TC" "Tw" "TG"
## Anolis_evermanni Anolis_stratulus Anolis_krugi
## "TC" "TC" "GB"
## Anolis_pulchellus Anolis_gundlachi Anolis_poncensis
## "GB" "TG" "GB"
## Anolis_cooki Anolis_cristatellus Anolis_brevirostris
## "TG" "TG" "Tr"
## Anolis_caudalis Anolis_marron Anolis_websteri
## "Tr" "Tr" "Tr"
## Anolis_distichus Anolis_alumina Anolis_semilineatus
## "Tr" "GB" "GB"
## Anolis_olssoni Anolis_insolitus Anolis_whitemani
## "GB" "Tw" "TG"
## Anolis_haetianus Anolis_breslini Anolis_armouri
## "TG" "TG" "TG"
## Anolis_cybotes Anolis_shrevei Anolis_longitibialis
## "TG" "TG" "TG"
## Anolis_strahmi Anolis_marcanoi Anolis_baleatus
## "TG" "TG" "CG"
## Anolis_barahonae Anolis_ricordii Anolis_cuvieri
## "CG" "CG" "CG"
## Anolis_altitudinalis Anolis_oporinus Anolis_isolepis
## "TC" "TC" "TC"
## Anolis_allisoni Anolis_porcatus Anolis_loysiana
## "TC" "TC" "Tr"
## Anolis_guazuma Anolis_placidus Anolis_sheplani
## "Tw" "Tw" "Tw"
## Anolis_alayoni Anolis_angusticeps Anolis_paternus
## "Tw" "Tw" "Tw"
## Anolis_alutaceus Anolis_inexpectatus Anolis_clivicola
## "GB" "GB" "GB"
## Anolis_cupeyalensis Anolis_cyanopleurus Anolis_alfaroi
## "GB" "GB" "GB"
## Anolis_macilentus Anolis_vanidicus Anolis_baracoae
## "GB" "GB" "CG"
## Anolis_noblei Anolis_smallwoodi Anolis_luteogularis
## "CG" "CG" "CG"
## Anolis_equestris Anolis_bahorucoensis Anolis_dolichocephalus
## "CG" "GB" "GB"
## Anolis_hendersoni Anolis_darlingtoni Anolis_aliniger
## "GB" "Tw" "TC"
## Anolis_singularis Anolis_chlorocyanus Anolis_coelestinus
## "TC" "TC" "TC"
## Anolis_occultus
## "Tw"
We can fit some different models as follows:
## equal-rates model
fitER<-fitMk(anoletree,ecomorph,model="ER")
fitER
## Object of class "fitMk".
##
## Fitted (or set) value of Q:
## CG GB TC TG Tr Tw
## CG -0.115707 0.023141 0.023141 0.023141 0.023141 0.023141
## GB 0.023141 -0.115707 0.023141 0.023141 0.023141 0.023141
## TC 0.023141 0.023141 -0.115707 0.023141 0.023141 0.023141
## TG 0.023141 0.023141 0.023141 -0.115707 0.023141 0.023141
## Tr 0.023141 0.023141 0.023141 0.023141 -0.115707 0.023141
## Tw 0.023141 0.023141 0.023141 0.023141 0.023141 -0.115707
##
## Fitted (or set) value of pi:
## CG GB TC TG Tr Tw
## 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667
##
## Log-likelihood: -79.837802
## symmetric model
fitSYM<-fitMk(anoletree,ecomorph,model="SYM")
fitSYM
## Object of class "fitMk".
##
## Fitted (or set) value of Q:
## CG GB TC TG Tr Tw
## CG -0.057272 0.024020 0.033252 0.000000 0.000000 0.000000
## GB 0.024020 -0.200333 0.042382 0.060093 0.013413 0.060426
## TC 0.033252 0.042382 -0.135863 0.000000 0.024276 0.035953
## TG 0.000000 0.060093 0.000000 -0.060093 0.000000 0.000000
## Tr 0.000000 0.013413 0.024276 0.000000 -0.037689 0.000000
## Tw 0.000000 0.060426 0.035953 0.000000 0.000000 -0.096379
##
## Fitted (or set) value of pi:
## CG GB TC TG Tr Tw
## 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667
##
## Log-likelihood: -73.658443
## all-rates-different model
fitARD<-fitMk(anoletree,ecomorph,model="ARD")
## Warning in log(comp[1:M + N]): NaNs produced
## Warning in log(comp[1:M + N]): NaNs produced
fitARD
## Object of class "fitMk".
##
## Fitted (or set) value of Q:
## CG GB TC TG Tr Tw
## CG -0.425387 0.066978 0.093014 0.104651 0.026629 0.134115
## GB 0.000006 -0.000624 0.000592 0.000000 0.000000 0.000025
## TC 0.031442 0.000010 -0.070540 0.000008 0.038980 0.000100
## TG 0.000041 0.094322 0.000000 -0.095642 0.000562 0.000717
## Tr 0.000003 0.000032 0.036053 0.040923 -0.099574 0.022562
## Tw 0.000000 0.038455 0.043720 0.000000 0.000000 -0.082175
##
## Fitted (or set) value of pi:
## CG GB TC TG Tr Tw
## 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667
##
## Log-likelihood: -68.328061
In addition to these basic models, we can also fit any arbitrary model. So, for instance, we could fit a model in which character evolution is ordered. In this case I will assume that ecomorphs can only evolve alphabetically - a totally ridiculous assumption, of course - but useful to demonstrate the way such models are set up:
states<-sort(unique(ecomorph))
ordered<-matrix(0,length(states),length(states),
dimnames=list(states,states))
for(i in 1:length(states))
if((i+1)<=length(states))
ordered[i,i+1]<-ordered[i+1,i]<-1
ordered
## CG GB TC TG Tr Tw
## CG 0 1 0 0 0 0
## GB 1 0 1 0 0 0
## TC 0 1 0 1 0 0
## TG 0 0 1 0 1 0
## Tr 0 0 0 1 0 1
## Tw 0 0 0 0 1 0
fitOrdered<-fitMk(anoletree,ecomorph,model=ordered)
Similarly, we could fit an ordered model but one in which the backward & forward rates are different (here indicated by the superdiagonal and subdiagonal elements, respectively):
asym.ordered<-matrix(0,length(states),length(states),
dimnames=list(states,states))
for(i in 1:length(states))
if((i+1)<=length(states)){
asym.ordered[i,i+1]<-1
asym.ordered[i+1,i]<-2
}
asym.ordered
## CG GB TC TG Tr Tw
## CG 0 1 0 0 0 0
## GB 2 0 1 0 0 0
## TC 0 2 0 1 0 0
## TG 0 0 2 0 1 0
## Tr 0 0 0 2 0 1
## Tw 0 0 0 0 2 0
fitAsym<-fitMk(anoletree,ecomorph,model=asym.ordered)
Note that the indices 1
and 2
have nothing
to do with the actual rates that are fit - they merely allow us to
tell R to fit (in this case) two different rates, and where those
different rates should be distribution in the matrix. In other words
a symmetric model can also be specified as follows:
sym.model<-matrix(0,length(states),length(states),
dimnames=list(states,states))
k<-1
for(i in 1:(length(states)-1)){
for(j in (i+1):length(states)){
print(c(i,j))
sym.model[i,j]<-sym.model[j,i]<-k
k<-k+1
}
}
## [1] 1 2
## [1] 1 3
## [1] 1 4
## [1] 1 5
## [1] 1 6
## [1] 2 3
## [1] 2 4
## [1] 2 5
## [1] 2 6
## [1] 3 4
## [1] 3 5
## [1] 3 6
## [1] 4 5
## [1] 4 6
## [1] 5 6
sym.model
## CG GB TC TG Tr Tw
## CG 0 1 2 3 4 5
## GB 1 0 6 7 8 9
## TC 2 6 0 10 11 12
## TG 3 7 10 0 13 14
## Tr 4 8 11 13 0 15
## Tw 5 9 12 14 15 0
fitSymModel<-fitMk(anoletree,ecomorph,model=sym.model)
fitSymModel
## Object of class "fitMk".
##
## Fitted (or set) value of Q:
## CG GB TC TG Tr Tw
## CG -0.057272 0.024020 0.033252 0.000000 0.000000 0.000000
## GB 0.024020 -0.200333 0.042382 0.060093 0.013413 0.060426
## TC 0.033252 0.042382 -0.135863 0.000000 0.024276 0.035953
## TG 0.000000 0.060093 0.000000 -0.060093 0.000000 0.000000
## Tr 0.000000 0.013413 0.024276 0.000000 -0.037689 0.000000
## Tw 0.000000 0.060426 0.035953 0.000000 0.000000 -0.096379
##
## Fitted (or set) value of pi:
## CG GB TC TG Tr Tw
## 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667
##
## Log-likelihood: -73.658443
Finally, we can compare among models using a likelihood ratio, for nested models, or using AIC. Remember, lower AIC indicate better model fit penalizing for the number of parameters in the fitted model:
AIC(fitER) ## for instance
## [1] 161.6756
## or over all models
aic<-as.matrix(sapply(list(ER=fitER,SYM=fitSYM,ARD=fitARD,
ordered=fitOrdered,assymetric.ordered=fitAsym),AIC))
aic
## [,1]
## ER 161.6756
## SYM 177.3169
## ARD 196.6561
## ordered 202.0630
## assymetric.ordered 203.6836
Looks like the equal-rates model wins!
Let's try it with a simpler dataset simulated under an irreversible model:
set.seed(10)
tree<-pbtree(n=150,scale=1)
Q<-matrix(c(-1,1,1e-8,-1e-8),2,2)
rownames(Q)<-colnames(Q)<-letters[1:2]
x<-sim.history(tree,Q,anc="a")$states
## Note - the rate of substitution from i->j should be given by Q[j,i].
## Done simulation(s).
cols<-setNames(c("red","blue"),letters[1:2])
par(fg="transparent")
dotTree(tree,x,ftype="off",lwd=1,colors=cols,legend=FALSE)
par(fg="black")
add.simmap.legend(colors=cols,shape="circle",x=0,y=6,prompt=FALSE)
fitER<-fitMk(tree,x,model="ER")
fitARD<-fitMk(tree,x,model="ARD")
irrAtoB<-matrix(c(0,0,1,0),2,2,dimnames=list(c("a","b"),c("a","b")))
fitAtoB<-fitMk(tree,x,model=irrAtoB)
irrBtoA<-t(irrAtoB)
fitBtoA<-fitMk(tree,x,model=irrBtoA)
fitBtoA
## Object of class "fitMk".
##
## Fitted (or set) value of Q:
## a b
## a 0.000000 0.000000
## b 1.247311 -1.247311
##
## Fitted (or set) value of pi:
## a b
## 0.5 0.5
##
## Log-likelihood: -60.893538
AIC(fitBtoA)
## [1] 123.7871
aic<-as.matrix(sapply(list(ER=fitER,ARD=fitARD,"a->b"=fitAtoB,"b->a"=fitBtoA),
AIC))
aic
## [,1]
## ER 113.1791
## ARD 110.1851
## a->b 108.1851
## b->a 123.7871
So we can see that indeed the generating model, a->b
, fits best.
That's it.
To duplicate any of the analyses above, you can also use the phytools
data object anoletree
as follows:
data(anoletree)
ecomorph<-getStates(anoletree,"tips")
anoletree<-read.tree(text=write.tree(anoletree))
No comments:
Post a Comment
Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.