Wednesday, June 3, 2020

Update #3 in phytools 0.7-47: New object class & S3 methods for printing & plotting the Q matrix in a fitted Mk model

As I commented in two earlier posts (1, 2), yesterday a new version of phytools (version 0.7-47) became available on CRAN.

Since I've been doing lots of work on phytools, but haven't been keeping up to speed with that work on this blog, I thought it might be useful to hammer out a few posts outlining some of the updated features of this new phytools version compared to the previous CRAN release. So far I have commented on updates to fit.bd (for fitting a birth-death model to a reconstructed tree), and in threshBayes (for estimating the evolutionary correlation between traits under a threshold model).

One kind of last-minute update that I made in collaboration with Luke Harmom via his geiger package (which also has a new CRAN version today) was the introduction of a new object class ("Qmatrix"), object conversion method (as.Qmatrix), and S3 print and plot methods. as.Qmatrix takes a fitted Mk discrete character evolution model (e.g., from fitMk or geiger::fitDiscrete) and then converts it to an object of class "Qmatrix" which is really just (you guessed it) a matrix with the class attributed "Qmatrix". I also wrote a generic plot method for the object class, which now lies inside of the previous plot.fitMk and plot.gfit (for geiger) plotting methods.

Here's a quick demo using the Anolis ecomorph phylogeny & data.

packageVersion("phytools")
## [1] '0.7.47'
packageVersion("geiger")
## [1] '2.0.7'
library(phytools)
library(geiger)
data(anoletree)
anoletree
## 
## Phylogenetic tree with 82 tips and 81 internal nodes.
## 
## Tip labels:
##  ahli, allogus, rubribarbus, imias, sagrei, bremeri, ...
## 
## The tree includes a mapped, 6-state discrete character with states:
##  CG, GB, TC, TG, Tr, Tw
## 
## Rooted; includes branch lengths.

Our tree is a "simmap" object, but we actually just want a plain old "phylo" tree, plus the tip states which we will use as our discrete character.

ecomorph<-getStates(anoletree,"tips")
ecomorph
##            ahli         allogus     rubribarbus           imias          sagrei 
##            "TG"            "TG"            "TG"            "TG"            "TG" 
##         bremeri quadriocellifer      ophiolepis         mestrei           jubar 
##            "TG"            "TG"            "GB"            "TG"            "TG" 
##      homolechis        confusus           guafe         garmani        opalinus 
##            "TG"            "TG"            "TG"            "CG"            "TC" 
##         grahami     valencienni      lineatopus       evermanni       stratulus 
##            "TC"            "Tw"            "TG"            "TC"            "TC" 
##           krugi      pulchellus       gundlachi       poncensis           cooki 
##            "GB"            "GB"            "TG"            "GB"            "TG" 
##    cristatellus    brevirostris        caudalis          marron        websteri 
##            "TG"            "Tr"            "Tr"            "Tr"            "Tr" 
##       distichus         alumina    semilineatus         olssoni       insolitus 
##            "Tr"            "GB"            "GB"            "GB"            "Tw" 
##       whitemani       haetianus        breslini         armouri         cybotes 
##            "TG"            "TG"            "TG"            "TG"            "TG" 
##         shrevei   longitibialis         strahmi        marcanoi        baleatus 
##            "TG"            "TG"            "TG"            "TG"            "CG" 
##       barahonae        ricordii         cuvieri   altitudinalis        oporinus 
##            "CG"            "CG"            "CG"            "TC"            "TC" 
##        isolepis        allisoni        porcatus        loysiana         guazuma 
##            "TC"            "TC"            "TC"            "Tr"            "Tw" 
##        placidus        sheplani         alayoni     angusticeps        paternus 
##            "Tw"            "Tw"            "Tw"            "Tw"            "Tw" 
##       alutaceus    inexpectatus       clivicola    cupeyalensis    cyanopleurus 
##            "GB"            "GB"            "GB"            "GB"            "GB" 
##         alfaroi      macilentus       vanidicus        baracoae          noblei 
##            "GB"            "GB"            "GB"            "CG"            "CG" 
##      smallwoodi    luteogularis       equestris   bahorucoensis dolichocephalus 
##            "CG"            "CG"            "CG"            "GB"            "GB" 
##      hendersoni     darlingtoni        aliniger      singularis    chlorocyanus 
##            "GB"            "Tw"            "TC"            "TC"            "TC" 
##     coelestinus        occultus 
##            "TC"            "Tw"
anoletree<-as.phylo(anoletree)
anoletree
## 
## Phylogenetic tree with 82 tips and 81 internal nodes.
## 
## Tip labels:
##  ahli, allogus, rubribarbus, imias, sagrei, bremeri, ...
## 
## Rooted; includes branch lengths.
plotTree(anoletree,offset=3,ftype="i",
    fsize=0.8,lwd=1,type="fan")
cols<-setNames(c("green","#E4D96F","darkgreen",
    "brown","black","darkgrey"),
    c("CG","GB","TC","TG","Tr","Tw"))
tiplabels(pie=to.matrix(ecomorph,
    c("CG","GB","TC","TG","Tr","Tw")),
    piecol=cols,cex=0.5)
legend("topleft",
    c("CG","GB","TC","TG","Tr","Tw"),
    pch=21,pt.cex=2,pt.bg=cols)

plot of chunk unnamed-chunk-2

Now we're pretty much ready to fit our model, which I will do with fitMk.

fitARD<-fitMk(anoletree,ecomorph,model="ARD")
print(fitARD,digits=4)
## Object of class "fitMk".
## 
## Fitted (or set) value of Q:
##        CG     GB      TC      TG     Tr     Tw
## CG 0.0000 0.0000  0.0000  0.0000 0.0000  0.000
## GB 0.0000 0.0000  0.0000  0.0000 0.0000  0.000
## TC 0.0460 0.0000 -0.0768  0.0000 0.0308  0.000
## TG 0.0000 0.0535  0.0000 -0.0535 0.0000  0.000
## Tr 0.0000 0.0000  0.0000  0.0000 0.0000  0.000
## Tw 0.0303 0.0576  0.0876  0.0820 0.0195 -0.277
## 
## Fitted (or set) value of pi:
##        CG        GB        TC        TG        Tr        Tw 
## 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667 
## 
## Log-likelihood: -67.3937 
## 
## Optimization method used was "nlminb"

Now, to pull out the Q matrix let's use as.Qmatrix:

as.Qmatrix(fitARD)
## Estimated Q matrix:
##            CG         GB          TC          TG         Tr        Tw
## CG 0.00000000 0.00000000  0.00000000  0.00000000 0.00000000  0.000000
## GB 0.00000000 0.00000000  0.00000000  0.00000000 0.00000000  0.000000
## TC 0.04603746 0.00000000 -0.07682257  0.00000000 0.03078511  0.000000
## TG 0.00000000 0.05354003  0.00000000 -0.05354003 0.00000000  0.000000
## Tr 0.00000000 0.00000000  0.00000000  0.00000000 0.00000000  0.000000
## Tw 0.03029880 0.05761600  0.08755621  0.08202421 0.01953679 -0.277032

Finally, let's go ahead & plot this fitted model:

plot(as.Qmatrix(fitARD),show.zeros=FALSE)
title(main="fitMk ARD model\nfor Anolis ecomorph data",
    font.main=3,line=-0.5)

plot of chunk unnamed-chunk-5

Finally, here's a fix that I just thought of & is not in the CRAN phytools version (but may be in a future edition*). [*Note: this update is already on GitHub.]

xy<-plot(as.Qmatrix(fitARD),show.zeros=FALSE)
library(plotrix)
nulo<-mapply(draw.circle,xy$x,xy$y,col=cols,
    MoreArgs=list(radius=0.08))
text(xy$x,xy$y,xy$states,cex=1.2,col="white")
legend("topleft",xy$states,
    pch=21,pt.cex=2.5,pt.bg=cols)
title(main="fitMk ARD model\nfor Anolis ecomorph data",
    font.main=3,line=-0.5)

plot of chunk unnamed-chunk-6

Ooh. That's cool.

We can do the same with geiger::fitDiscrete - although we need to keep in mind that fitDiscrete doesn't fit the same model by default, since it makes a different assumption about the root state.

fitARD.geiger<-fitDiscrete(anoletree,ecomorph,
    model="ARD",control=list(niter=200))
print(fitARD.geiger,digits=4)
## GEIGER-fitted comparative model of discrete data
##  fitted Q matrix:
##                CG         GB         TC         TG         Tr         Tw
##     CG -1.918e-15  3.385e-22  1.211e-15  6.072e-16  9.914e-17  1.383e-20
##     GB  1.686e-88 -3.008e-16  1.832e-17  2.822e-16  3.483e-19  3.704e-35
##     TC  4.603e-02  3.826e-17 -7.682e-02  2.376e-16  3.079e-02  3.992e-17
##     TG  2.680e-29  5.354e-02  2.017e-23 -5.354e-02  3.830e-17  3.492e-17
##     Tr  5.471e-16  1.104e-31  3.751e-17  4.986e-16 -1.086e-15  2.906e-18
##     Tw  3.030e-02  5.762e-02  8.756e-02  8.202e-02  1.954e-02 -2.770e-01
## 
##  model summary:
##  log-likelihood = -65.601973
##  AIC = 191.203945
##  AICc = 227.674534
##  free parameters = 30
## 
## Convergence diagnostics:
##  optimization iterations = 200
##  failed iterations = 0
##  number of iterations with same best fit = 4
##  frequency of best fit = 0.02
## 
##  object summary:
##  'lik' -- likelihood function
##  'bnd' -- bounds for likelihood search
##  'res' -- optimization iteration summary
##  'opt' -- maximum likelihood parameter estimates
as.Qmatrix(fitARD.geiger)
## Estimated Q matrix:
##               CG            GB            TC            TG            Tr            Tw
## CG -1.917627e-15  3.385247e-22  1.211323e-15  6.071542e-16  9.913558e-17  1.382745e-20
## GB  1.686377e-88 -3.008306e-16  1.831573e-17  2.821666e-16  3.482602e-19  3.703619e-35
## TC  4.603417e-02  3.826475e-17 -7.682342e-02  2.376160e-16  3.078925e-02  3.992021e-17
## TG  2.680183e-29  5.354024e-02  2.016519e-23 -5.354024e-02  3.829973e-17  3.492387e-17
## Tr  5.470938e-16  1.104037e-31  3.750727e-17  4.985806e-16 -1.086088e-15  2.906186e-18
## Tw  3.029966e-02  5.761593e-02  8.755644e-02  8.202408e-02  1.953551e-02 -2.770316e-01
xy<-plot(as.Qmatrix(fitARD.geiger),
    show.zeros=FALSE)
nulo<-mapply(draw.circle,xy$x,xy$y,col=cols,
    MoreArgs=list(radius=0.08))
text(xy$x,xy$y,xy$states,cex=1.2,col="white")
legend("topleft",xy$states,
    pch=21,pt.cex=2.5,pt.bg=cols)
title(main=paste("fitDiscrete ARD model\n",
    "for Anolis ecomorph data"),
    font.main=3,line=-0.5)

plot of chunk unnamed-chunk-7

Even though the two models make different assumptions, their estimates (in this particular case) are almost exactly the same. (We need to be careful, though, because this is not always going to be the true!) We can see that by plotting one against the other:

design<-(lower.tri(as.Qmatrix(fitARD))+
    upper.tri(as.Qmatrix(fitARD)))>0 ## this is a bit hacky
plot.default(as.Qmatrix(fitARD)[design],
    as.Qmatrix(fitARD.geiger)[design],bty="l",pch=19,
    col=make.transparent("blue",0.5),cex=1.5,
    asp=1,xlab="fitMk estimates",
    ylab="fitDiscrete estimates")
lines(par()$usr[1:2],par()$usr[1:2],lty="dotted")

plot of chunk unnamed-chunk-8

Cool.

No comments:

Post a Comment

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.