Wednesday, June 1, 2016

plot method for geiger's fitDiscrete function

As implicitly promised in a post yesterday, I have now added a plot method for fitted Mk models from the geiger function fitDiscrete. I felt like this was a good idea because I believe the geiger implementation to be more robust (that is, it uses a more rigorous optimization process) than does fitMk in my package. fitMk, by the way, recycles code extensively from ace in the ape package.

What I elected to do was to first pull out all the relevant attributes of the fitted model from the geiger object (an object of class "gfit" resulting from a call to "fitDiscrete"), then use these values to create a new object of class "fitMk" internally, then just call phytools S3 method on that object.

Here is what that wrapper code looks like. Note that an object of class "gfit" in geiger can be from a call to fitDiscrete or fitContinuous, so we have to check for that first.

## S3 plot method for objects resulting from fitDiscrete
plot.gfit<-function(x,...){
    if("mkn"%in%class(x$lik)==FALSE){
        stop("Sorry. No plot method presently available for objects of this type.")
    } else {
        obj<-list()
        QQ<-.Qmatrix.from.gfit(x)
        obj$states<-colnames(QQ)
        m<-length(obj$states)
        obj$index.matrix<-matrix(NA,m,m)
        k<-m*(m-1)
        obj$index.matrix[col(obj$index.matrix)!=row(obj$index.matrix)]<-1:k
        obj$rates<-QQ[sapply(1:k,function(x,y) which(x==y),obj$index.matrix)]
        class(obj)<-"fitMk"
        plot(obj,...)
    }
}

Now let's take our data & tree and fit a model:

library(phytools)
library(geiger)
tree
## 
## Phylogenetic tree with 100 tips and 99 internal nodes.
## 
## Tip labels:
##  t67, t68, t76, t78, t79, t74, ...
## 
## Rooted; includes branch lengths.
x
##    t67    t68    t76    t78    t79    t74    t75    t46     t9    t26 
## purple  green    red    red   blue purple  green purple    red purple 
##    t62    t63    t69    t80    t81    t93    t94    t48    t49    t39 
## purple  green  green   blue   blue purple purple  green    red  green 
##    t51    t52    t35    t19     t4    t91    t92    t64    t65    t30 
## purple  green    red purple   blue   blue   blue   blue   blue    red 
##     t8     t2    t72    t86    t87    t66    t88    t89    t29    t10 
##    red   blue    red   blue purple purple   blue  green    red    red 
##    t11    t57    t58    t42    t43    t24    t25    t40    t41    t15 
## purple   blue   blue  green  green purple  green    red   blue   blue 
##    t37    t38    t28    t31    t32    t18    t16    t55    t56    t36 
##   blue  green purple  green   blue   blue   blue   blue   blue purple 
##     t1     t7    t70    t71    t12    t22    t23    t77    t84    t85 
## purple    red purple purple    red    red    red   blue   blue   blue 
##    t73    t95    t96    t50    t47    t27    t53    t54    t97    t98 
##   blue    red    red   blue  green   blue  green   blue  green  green 
##    t59    t90    t99   t100    t17     t3    t33    t34    t44    t45 
## purple    red    red    red  green   blue purple   blue    red purple 
##    t60    t61     t5     t6    t13    t14    t20    t21    t82    t83 
##  green    red   blue  green purple   blue    red   blue  green  green 
## Levels: green blue purple red
fit.phytools<-fitMk(tree,x,model="ARD")
plot(fit.phytools,main="fitMk model=\"ARD\"",show.zeros=FALSE,
    cex.traits=0.9)

plot of chunk unnamed-chunk-2

fit.geiger<-fitDiscrete(tree,x,model="ARD")
plot(fit.geiger,main="fitDiscrete model=\"ARD\"",show.zeros=FALSE,
    cex.traits=0.9)

plot of chunk unnamed-chunk-2

Both fitted models are basically identical. Although the configuration of the states in the plotting space are different, it is easy to see that the estimated rates are the same. In other cases, fitDiscrete and fitMk may differ, either because one or the other failed to converge on the MLE of Q, or because the two implementations make slightly different assumptions about the root.

FYI, the generating model in this case was as follows:

print(t(Q))
##        green blue purple red
## green     -2    1      1   0
## blue       1   -3      1   1
## purple     1    1     -3   1
## red        0    1      1  -2

so we are pretty close, particularly in that the two transition classes with their rates set to zero are the only ones with estimated rates of zero in the final fitted model.

Note that the function above uses the geiger internal function (borrowed by phytools) .Qmatrix.from.gfit, so to work that function will need to be loaded into the name space:

.Qmatrix.from.gfit<-phytools:::.Qmatrix.from.gfit

Finally, here's how the data were simulated:

tree<-pbtree(n=100)
Q<-matrix(c(-2,1,1,0,
    1,-3,1,1,
    1,1,-3,1,
    0,1,1,-2),4,4)
rownames(Q)<-colnames(Q)<-c("green","blue","purple","red")
x<-factor(sim.history(tree,Q)$states,
    levels=c("green","blue","purple","red"))

No comments:

Post a Comment