I thought it might be useful to post a quick tutorial on how to do basic
model comparison for discrete character evolution using the phytools function
fitMk. fitMk actually borrows a lot of code from
the ape function ace, although it does not do marginal ancestral
state estimation but it does allow the user flexibility in specifying the
prior distribution π. This can theoretically be
important.
## first load packages
library(phytools)
For this exercise we will use the Anolis ecomorph data:
anoletree
##
## Phylogenetic tree with 82 tips and 81 internal nodes.
##
## Tip labels:
## Anolis_ahli, Anolis_allogus, Anolis_rubribarbus, Anolis_imias, Anolis_sagrei, Anolis_bremeri, ...
##
## Rooted; includes branch lengths.
ecomorph
## Anolis_ahli Anolis_allogus Anolis_rubribarbus
## "TG" "TG" "TG"
## Anolis_imias Anolis_sagrei Anolis_bremeri
## "TG" "TG" "TG"
## Anolis_quadriocellifer Anolis_ophiolepis Anolis_mestrei
## "TG" "GB" "TG"
## Anolis_jubar Anolis_homolechis Anolis_confusus
## "TG" "TG" "TG"
## Anolis_guafe Anolis_garmani Anolis_opalinus
## "TG" "CG" "TC"
## Anolis_grahami Anolis_valencienni Anolis_lineatopus
## "TC" "Tw" "TG"
## Anolis_evermanni Anolis_stratulus Anolis_krugi
## "TC" "TC" "GB"
## Anolis_pulchellus Anolis_gundlachi Anolis_poncensis
## "GB" "TG" "GB"
## Anolis_cooki Anolis_cristatellus Anolis_brevirostris
## "TG" "TG" "Tr"
## Anolis_caudalis Anolis_marron Anolis_websteri
## "Tr" "Tr" "Tr"
## Anolis_distichus Anolis_alumina Anolis_semilineatus
## "Tr" "GB" "GB"
## Anolis_olssoni Anolis_insolitus Anolis_whitemani
## "GB" "Tw" "TG"
## Anolis_haetianus Anolis_breslini Anolis_armouri
## "TG" "TG" "TG"
## Anolis_cybotes Anolis_shrevei Anolis_longitibialis
## "TG" "TG" "TG"
## Anolis_strahmi Anolis_marcanoi Anolis_baleatus
## "TG" "TG" "CG"
## Anolis_barahonae Anolis_ricordii Anolis_cuvieri
## "CG" "CG" "CG"
## Anolis_altitudinalis Anolis_oporinus Anolis_isolepis
## "TC" "TC" "TC"
## Anolis_allisoni Anolis_porcatus Anolis_loysiana
## "TC" "TC" "Tr"
## Anolis_guazuma Anolis_placidus Anolis_sheplani
## "Tw" "Tw" "Tw"
## Anolis_alayoni Anolis_angusticeps Anolis_paternus
## "Tw" "Tw" "Tw"
## Anolis_alutaceus Anolis_inexpectatus Anolis_clivicola
## "GB" "GB" "GB"
## Anolis_cupeyalensis Anolis_cyanopleurus Anolis_alfaroi
## "GB" "GB" "GB"
## Anolis_macilentus Anolis_vanidicus Anolis_baracoae
## "GB" "GB" "CG"
## Anolis_noblei Anolis_smallwoodi Anolis_luteogularis
## "CG" "CG" "CG"
## Anolis_equestris Anolis_bahorucoensis Anolis_dolichocephalus
## "CG" "GB" "GB"
## Anolis_hendersoni Anolis_darlingtoni Anolis_aliniger
## "GB" "Tw" "TC"
## Anolis_singularis Anolis_chlorocyanus Anolis_coelestinus
## "TC" "TC" "TC"
## Anolis_occultus
## "Tw"
We can fit some different models as follows:
## equal-rates model
fitER<-fitMk(anoletree,ecomorph,model="ER")
fitER
## Object of class "fitMk".
##
## Fitted (or set) value of Q:
## CG GB TC TG Tr Tw
## CG -0.115707 0.023141 0.023141 0.023141 0.023141 0.023141
## GB 0.023141 -0.115707 0.023141 0.023141 0.023141 0.023141
## TC 0.023141 0.023141 -0.115707 0.023141 0.023141 0.023141
## TG 0.023141 0.023141 0.023141 -0.115707 0.023141 0.023141
## Tr 0.023141 0.023141 0.023141 0.023141 -0.115707 0.023141
## Tw 0.023141 0.023141 0.023141 0.023141 0.023141 -0.115707
##
## Fitted (or set) value of pi:
## CG GB TC TG Tr Tw
## 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667
##
## Log-likelihood: -79.837802
## symmetric model
fitSYM<-fitMk(anoletree,ecomorph,model="SYM")
fitSYM
## Object of class "fitMk".
##
## Fitted (or set) value of Q:
## CG GB TC TG Tr Tw
## CG -0.057272 0.024020 0.033252 0.000000 0.000000 0.000000
## GB 0.024020 -0.200333 0.042382 0.060093 0.013413 0.060426
## TC 0.033252 0.042382 -0.135863 0.000000 0.024276 0.035953
## TG 0.000000 0.060093 0.000000 -0.060093 0.000000 0.000000
## Tr 0.000000 0.013413 0.024276 0.000000 -0.037689 0.000000
## Tw 0.000000 0.060426 0.035953 0.000000 0.000000 -0.096379
##
## Fitted (or set) value of pi:
## CG GB TC TG Tr Tw
## 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667
##
## Log-likelihood: -73.658443
## all-rates-different model
fitARD<-fitMk(anoletree,ecomorph,model="ARD")
## Warning in log(comp[1:M + N]): NaNs produced
## Warning in log(comp[1:M + N]): NaNs produced
fitARD
## Object of class "fitMk".
##
## Fitted (or set) value of Q:
## CG GB TC TG Tr Tw
## CG -0.425387 0.066978 0.093014 0.104651 0.026629 0.134115
## GB 0.000006 -0.000624 0.000592 0.000000 0.000000 0.000025
## TC 0.031442 0.000010 -0.070540 0.000008 0.038980 0.000100
## TG 0.000041 0.094322 0.000000 -0.095642 0.000562 0.000717
## Tr 0.000003 0.000032 0.036053 0.040923 -0.099574 0.022562
## Tw 0.000000 0.038455 0.043720 0.000000 0.000000 -0.082175
##
## Fitted (or set) value of pi:
## CG GB TC TG Tr Tw
## 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667
##
## Log-likelihood: -68.328061
In addition to these basic models, we can also fit any arbitrary model. So, for instance, we could fit a model in which character evolution is ordered. In this case I will assume that ecomorphs can only evolve alphabetically - a totally ridiculous assumption, of course - but useful to demonstrate the way such models are set up:
states<-sort(unique(ecomorph))
ordered<-matrix(0,length(states),length(states),
dimnames=list(states,states))
for(i in 1:length(states))
if((i+1)<=length(states))
ordered[i,i+1]<-ordered[i+1,i]<-1
ordered
## CG GB TC TG Tr Tw
## CG 0 1 0 0 0 0
## GB 1 0 1 0 0 0
## TC 0 1 0 1 0 0
## TG 0 0 1 0 1 0
## Tr 0 0 0 1 0 1
## Tw 0 0 0 0 1 0
fitOrdered<-fitMk(anoletree,ecomorph,model=ordered)
Similarly, we could fit an ordered model but one in which the backward & forward rates are different (here indicated by the superdiagonal and subdiagonal elements, respectively):
asym.ordered<-matrix(0,length(states),length(states),
dimnames=list(states,states))
for(i in 1:length(states))
if((i+1)<=length(states)){
asym.ordered[i,i+1]<-1
asym.ordered[i+1,i]<-2
}
asym.ordered
## CG GB TC TG Tr Tw
## CG 0 1 0 0 0 0
## GB 2 0 1 0 0 0
## TC 0 2 0 1 0 0
## TG 0 0 2 0 1 0
## Tr 0 0 0 2 0 1
## Tw 0 0 0 0 2 0
fitAsym<-fitMk(anoletree,ecomorph,model=asym.ordered)
Note that the indices 1 and 2 have nothing
to do with the actual rates that are fit - they merely allow us to
tell R to fit (in this case) two different rates, and where those
different rates should be distribution in the matrix. In other words
a symmetric model can also be specified as follows:
sym.model<-matrix(0,length(states),length(states),
dimnames=list(states,states))
k<-1
for(i in 1:(length(states)-1)){
for(j in (i+1):length(states)){
print(c(i,j))
sym.model[i,j]<-sym.model[j,i]<-k
k<-k+1
}
}
## [1] 1 2
## [1] 1 3
## [1] 1 4
## [1] 1 5
## [1] 1 6
## [1] 2 3
## [1] 2 4
## [1] 2 5
## [1] 2 6
## [1] 3 4
## [1] 3 5
## [1] 3 6
## [1] 4 5
## [1] 4 6
## [1] 5 6
sym.model
## CG GB TC TG Tr Tw
## CG 0 1 2 3 4 5
## GB 1 0 6 7 8 9
## TC 2 6 0 10 11 12
## TG 3 7 10 0 13 14
## Tr 4 8 11 13 0 15
## Tw 5 9 12 14 15 0
fitSymModel<-fitMk(anoletree,ecomorph,model=sym.model)
fitSymModel
## Object of class "fitMk".
##
## Fitted (or set) value of Q:
## CG GB TC TG Tr Tw
## CG -0.057272 0.024020 0.033252 0.000000 0.000000 0.000000
## GB 0.024020 -0.200333 0.042382 0.060093 0.013413 0.060426
## TC 0.033252 0.042382 -0.135863 0.000000 0.024276 0.035953
## TG 0.000000 0.060093 0.000000 -0.060093 0.000000 0.000000
## Tr 0.000000 0.013413 0.024276 0.000000 -0.037689 0.000000
## Tw 0.000000 0.060426 0.035953 0.000000 0.000000 -0.096379
##
## Fitted (or set) value of pi:
## CG GB TC TG Tr Tw
## 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667
##
## Log-likelihood: -73.658443
Finally, we can compare among models using a likelihood ratio, for nested models, or using AIC. Remember, lower AIC indicate better model fit penalizing for the number of parameters in the fitted model:
AIC(fitER) ## for instance
## [1] 161.6756
## or over all models
aic<-as.matrix(sapply(list(ER=fitER,SYM=fitSYM,ARD=fitARD,
ordered=fitOrdered,assymetric.ordered=fitAsym),AIC))
aic
## [,1]
## ER 161.6756
## SYM 177.3169
## ARD 196.6561
## ordered 202.0630
## assymetric.ordered 203.6836
Looks like the equal-rates model wins!
Let's try it with a simpler dataset simulated under an irreversible model:
set.seed(10)
tree<-pbtree(n=150,scale=1)
Q<-matrix(c(-1,1,1e-8,-1e-8),2,2)
rownames(Q)<-colnames(Q)<-letters[1:2]
x<-sim.history(tree,Q,anc="a")$states
## Note - the rate of substitution from i->j should be given by Q[j,i].
## Done simulation(s).
cols<-setNames(c("red","blue"),letters[1:2])
par(fg="transparent")
dotTree(tree,x,ftype="off",lwd=1,colors=cols,legend=FALSE)
par(fg="black")
add.simmap.legend(colors=cols,shape="circle",x=0,y=6,prompt=FALSE)
fitER<-fitMk(tree,x,model="ER")
fitARD<-fitMk(tree,x,model="ARD")
irrAtoB<-matrix(c(0,0,1,0),2,2,dimnames=list(c("a","b"),c("a","b")))
fitAtoB<-fitMk(tree,x,model=irrAtoB)
irrBtoA<-t(irrAtoB)
fitBtoA<-fitMk(tree,x,model=irrBtoA)
fitBtoA
## Object of class "fitMk".
##
## Fitted (or set) value of Q:
## a b
## a 0.000000 0.000000
## b 1.247311 -1.247311
##
## Fitted (or set) value of pi:
## a b
## 0.5 0.5
##
## Log-likelihood: -60.893538
AIC(fitBtoA)
## [1] 123.7871
aic<-as.matrix(sapply(list(ER=fitER,ARD=fitARD,"a->b"=fitAtoB,"b->a"=fitBtoA),
AIC))
aic
## [,1]
## ER 113.1791
## ARD 110.1851
## a->b 108.1851
## b->a 123.7871
So we can see that indeed the generating model, a->b, fits best.
That's it.
To duplicate any of the analyses above, you can also use the phytools
data object anoletree as follows:
data(anoletree)
ecomorph<-getStates(anoletree,"tips")
anoletree<-read.tree(text=write.tree(anoletree))
No comments:
Post a Comment
Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.