Saturday, March 26, 2016

Comparing alternative models for discrete character evolution using fitMk

I thought it might be useful to post a quick tutorial on how to do basic model comparison for discrete character evolution using the phytools function fitMk. fitMk actually borrows a lot of code from the ape function ace, although it does not do marginal ancestral state estimation but it does allow the user flexibility in specifying the prior distribution π. This can theoretically be important.

## first load packages
library(phytools)

For this exercise we will use the Anolis ecomorph data:

anoletree
## 
## Phylogenetic tree with 82 tips and 81 internal nodes.
## 
## Tip labels:
##  Anolis_ahli, Anolis_allogus, Anolis_rubribarbus, Anolis_imias, Anolis_sagrei, Anolis_bremeri, ...
## 
## Rooted; includes branch lengths.
ecomorph
##            Anolis_ahli         Anolis_allogus     Anolis_rubribarbus 
##                   "TG"                   "TG"                   "TG" 
##           Anolis_imias          Anolis_sagrei         Anolis_bremeri 
##                   "TG"                   "TG"                   "TG" 
## Anolis_quadriocellifer      Anolis_ophiolepis         Anolis_mestrei 
##                   "TG"                   "GB"                   "TG" 
##           Anolis_jubar      Anolis_homolechis        Anolis_confusus 
##                   "TG"                   "TG"                   "TG" 
##           Anolis_guafe         Anolis_garmani        Anolis_opalinus 
##                   "TG"                   "CG"                   "TC" 
##         Anolis_grahami     Anolis_valencienni      Anolis_lineatopus 
##                   "TC"                   "Tw"                   "TG" 
##       Anolis_evermanni       Anolis_stratulus           Anolis_krugi 
##                   "TC"                   "TC"                   "GB" 
##      Anolis_pulchellus       Anolis_gundlachi       Anolis_poncensis 
##                   "GB"                   "TG"                   "GB" 
##           Anolis_cooki    Anolis_cristatellus    Anolis_brevirostris 
##                   "TG"                   "TG"                   "Tr" 
##        Anolis_caudalis          Anolis_marron        Anolis_websteri 
##                   "Tr"                   "Tr"                   "Tr" 
##       Anolis_distichus         Anolis_alumina    Anolis_semilineatus 
##                   "Tr"                   "GB"                   "GB" 
##         Anolis_olssoni       Anolis_insolitus       Anolis_whitemani 
##                   "GB"                   "Tw"                   "TG" 
##       Anolis_haetianus        Anolis_breslini         Anolis_armouri 
##                   "TG"                   "TG"                   "TG" 
##         Anolis_cybotes         Anolis_shrevei   Anolis_longitibialis 
##                   "TG"                   "TG"                   "TG" 
##         Anolis_strahmi        Anolis_marcanoi        Anolis_baleatus 
##                   "TG"                   "TG"                   "CG" 
##       Anolis_barahonae        Anolis_ricordii         Anolis_cuvieri 
##                   "CG"                   "CG"                   "CG" 
##   Anolis_altitudinalis        Anolis_oporinus        Anolis_isolepis 
##                   "TC"                   "TC"                   "TC" 
##        Anolis_allisoni        Anolis_porcatus        Anolis_loysiana 
##                   "TC"                   "TC"                   "Tr" 
##         Anolis_guazuma        Anolis_placidus        Anolis_sheplani 
##                   "Tw"                   "Tw"                   "Tw" 
##         Anolis_alayoni     Anolis_angusticeps        Anolis_paternus 
##                   "Tw"                   "Tw"                   "Tw" 
##       Anolis_alutaceus    Anolis_inexpectatus       Anolis_clivicola 
##                   "GB"                   "GB"                   "GB" 
##    Anolis_cupeyalensis    Anolis_cyanopleurus         Anolis_alfaroi 
##                   "GB"                   "GB"                   "GB" 
##      Anolis_macilentus       Anolis_vanidicus        Anolis_baracoae 
##                   "GB"                   "GB"                   "CG" 
##          Anolis_noblei      Anolis_smallwoodi    Anolis_luteogularis 
##                   "CG"                   "CG"                   "CG" 
##       Anolis_equestris   Anolis_bahorucoensis Anolis_dolichocephalus 
##                   "CG"                   "GB"                   "GB" 
##      Anolis_hendersoni     Anolis_darlingtoni        Anolis_aliniger 
##                   "GB"                   "Tw"                   "TC" 
##      Anolis_singularis    Anolis_chlorocyanus     Anolis_coelestinus 
##                   "TC"                   "TC"                   "TC" 
##        Anolis_occultus 
##                   "Tw"

We can fit some different models as follows:

## equal-rates model
fitER<-fitMk(anoletree,ecomorph,model="ER")
fitER
## Object of class "fitMk".
## 
## Fitted (or set) value of Q:
##           CG        GB        TC        TG        Tr        Tw
## CG -0.115707  0.023141  0.023141  0.023141  0.023141  0.023141
## GB  0.023141 -0.115707  0.023141  0.023141  0.023141  0.023141
## TC  0.023141  0.023141 -0.115707  0.023141  0.023141  0.023141
## TG  0.023141  0.023141  0.023141 -0.115707  0.023141  0.023141
## Tr  0.023141  0.023141  0.023141  0.023141 -0.115707  0.023141
## Tw  0.023141  0.023141  0.023141  0.023141  0.023141 -0.115707
## 
## Fitted (or set) value of pi:
##        CG        GB        TC        TG        Tr        Tw 
## 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667 
## 
## Log-likelihood: -79.837802
## symmetric model
fitSYM<-fitMk(anoletree,ecomorph,model="SYM")
fitSYM
## Object of class "fitMk".
## 
## Fitted (or set) value of Q:
##           CG        GB        TC        TG        Tr        Tw
## CG -0.057272  0.024020  0.033252  0.000000  0.000000  0.000000
## GB  0.024020 -0.200333  0.042382  0.060093  0.013413  0.060426
## TC  0.033252  0.042382 -0.135863  0.000000  0.024276  0.035953
## TG  0.000000  0.060093  0.000000 -0.060093  0.000000  0.000000
## Tr  0.000000  0.013413  0.024276  0.000000 -0.037689  0.000000
## Tw  0.000000  0.060426  0.035953  0.000000  0.000000 -0.096379
## 
## Fitted (or set) value of pi:
##        CG        GB        TC        TG        Tr        Tw 
## 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667 
## 
## Log-likelihood: -73.658443
## all-rates-different model
fitARD<-fitMk(anoletree,ecomorph,model="ARD")
## Warning in log(comp[1:M + N]): NaNs produced

## Warning in log(comp[1:M + N]): NaNs produced

fitARD
## Object of class "fitMk".
## 
## Fitted (or set) value of Q:
##           CG        GB        TC        TG        Tr        Tw
## CG -0.425387  0.066978  0.093014  0.104651  0.026629  0.134115
## GB  0.000006 -0.000624  0.000592  0.000000  0.000000  0.000025
## TC  0.031442  0.000010 -0.070540  0.000008  0.038980  0.000100
## TG  0.000041  0.094322  0.000000 -0.095642  0.000562  0.000717
## Tr  0.000003  0.000032  0.036053  0.040923 -0.099574  0.022562
## Tw  0.000000  0.038455  0.043720  0.000000  0.000000 -0.082175
## 
## Fitted (or set) value of pi:
##        CG        GB        TC        TG        Tr        Tw 
## 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667 
## 
## Log-likelihood: -68.328061

In addition to these basic models, we can also fit any arbitrary model. So, for instance, we could fit a model in which character evolution is ordered. In this case I will assume that ecomorphs can only evolve alphabetically - a totally ridiculous assumption, of course - but useful to demonstrate the way such models are set up:

states<-sort(unique(ecomorph))
ordered<-matrix(0,length(states),length(states),
    dimnames=list(states,states))
for(i in 1:length(states))
    if((i+1)<=length(states)) 
        ordered[i,i+1]<-ordered[i+1,i]<-1
ordered
##    CG GB TC TG Tr Tw
## CG  0  1  0  0  0  0
## GB  1  0  1  0  0  0
## TC  0  1  0  1  0  0
## TG  0  0  1  0  1  0
## Tr  0  0  0  1  0  1
## Tw  0  0  0  0  1  0
fitOrdered<-fitMk(anoletree,ecomorph,model=ordered)

Similarly, we could fit an ordered model but one in which the backward & forward rates are different (here indicated by the superdiagonal and subdiagonal elements, respectively):

asym.ordered<-matrix(0,length(states),length(states),
    dimnames=list(states,states))
for(i in 1:length(states))
    if((i+1)<=length(states)){
        asym.ordered[i,i+1]<-1
        asym.ordered[i+1,i]<-2
    }
asym.ordered
##    CG GB TC TG Tr Tw
## CG  0  1  0  0  0  0
## GB  2  0  1  0  0  0
## TC  0  2  0  1  0  0
## TG  0  0  2  0  1  0
## Tr  0  0  0  2  0  1
## Tw  0  0  0  0  2  0
fitAsym<-fitMk(anoletree,ecomorph,model=asym.ordered)

Note that the indices 1 and 2 have nothing to do with the actual rates that are fit - they merely allow us to tell R to fit (in this case) two different rates, and where those different rates should be distribution in the matrix. In other words a symmetric model can also be specified as follows:

sym.model<-matrix(0,length(states),length(states),
    dimnames=list(states,states))
k<-1
for(i in 1:(length(states)-1)){
    for(j in (i+1):length(states)){
        print(c(i,j))
        sym.model[i,j]<-sym.model[j,i]<-k
        k<-k+1
    }
}
## [1] 1 2
## [1] 1 3
## [1] 1 4
## [1] 1 5
## [1] 1 6
## [1] 2 3
## [1] 2 4
## [1] 2 5
## [1] 2 6
## [1] 3 4
## [1] 3 5
## [1] 3 6
## [1] 4 5
## [1] 4 6
## [1] 5 6
sym.model
##    CG GB TC TG Tr Tw
## CG  0  1  2  3  4  5
## GB  1  0  6  7  8  9
## TC  2  6  0 10 11 12
## TG  3  7 10  0 13 14
## Tr  4  8 11 13  0 15
## Tw  5  9 12 14 15  0
fitSymModel<-fitMk(anoletree,ecomorph,model=sym.model)
fitSymModel
## Object of class "fitMk".
## 
## Fitted (or set) value of Q:
##           CG        GB        TC        TG        Tr        Tw
## CG -0.057272  0.024020  0.033252  0.000000  0.000000  0.000000
## GB  0.024020 -0.200333  0.042382  0.060093  0.013413  0.060426
## TC  0.033252  0.042382 -0.135863  0.000000  0.024276  0.035953
## TG  0.000000  0.060093  0.000000 -0.060093  0.000000  0.000000
## Tr  0.000000  0.013413  0.024276  0.000000 -0.037689  0.000000
## Tw  0.000000  0.060426  0.035953  0.000000  0.000000 -0.096379
## 
## Fitted (or set) value of pi:
##        CG        GB        TC        TG        Tr        Tw 
## 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667 
## 
## Log-likelihood: -73.658443

Finally, we can compare among models using a likelihood ratio, for nested models, or using AIC. Remember, lower AIC indicate better model fit penalizing for the number of parameters in the fitted model:

AIC(fitER) ## for instance
## [1] 161.6756
## or over all models
aic<-as.matrix(sapply(list(ER=fitER,SYM=fitSYM,ARD=fitARD,
    ordered=fitOrdered,assymetric.ordered=fitAsym),AIC))
aic
##                        [,1]
## ER                 161.6756
## SYM                177.3169
## ARD                196.6561
## ordered            202.0630
## assymetric.ordered 203.6836

Looks like the equal-rates model wins!

Let's try it with a simpler dataset simulated under an irreversible model:

set.seed(10)
tree<-pbtree(n=150,scale=1)
Q<-matrix(c(-1,1,1e-8,-1e-8),2,2)
rownames(Q)<-colnames(Q)<-letters[1:2]
x<-sim.history(tree,Q,anc="a")$states
## Note - the rate of substitution from i->j should be given by Q[j,i].
## Done simulation(s).
cols<-setNames(c("red","blue"),letters[1:2])
par(fg="transparent")
dotTree(tree,x,ftype="off",lwd=1,colors=cols,legend=FALSE)
par(fg="black")
add.simmap.legend(colors=cols,shape="circle",x=0,y=6,prompt=FALSE)

plot of chunk unnamed-chunk-8

fitER<-fitMk(tree,x,model="ER")
fitARD<-fitMk(tree,x,model="ARD")
irrAtoB<-matrix(c(0,0,1,0),2,2,dimnames=list(c("a","b"),c("a","b")))
fitAtoB<-fitMk(tree,x,model=irrAtoB)
irrBtoA<-t(irrAtoB)
fitBtoA<-fitMk(tree,x,model=irrBtoA)
fitBtoA
## Object of class "fitMk".
## 
## Fitted (or set) value of Q:
##          a         b
## a 0.000000  0.000000
## b 1.247311 -1.247311
## 
## Fitted (or set) value of pi:
##   a   b 
## 0.5 0.5 
## 
## Log-likelihood: -60.893538
AIC(fitBtoA)
## [1] 123.7871
aic<-as.matrix(sapply(list(ER=fitER,ARD=fitARD,"a->b"=fitAtoB,"b->a"=fitBtoA),
    AIC))
aic
##          [,1]
## ER   113.1791
## ARD  110.1851
## a->b 108.1851
## b->a 123.7871

So we can see that indeed the generating model, a->b, fits best.

That's it.

To duplicate any of the analyses above, you can also use the phytools data object anoletree as follows:

data(anoletree)
ecomorph<-getStates(anoletree,"tips")
anoletree<-read.tree(text=write.tree(anoletree))

No comments:

Post a Comment