Wednesday, November 8, 2017

Cleaner print & plot methods in a faster ancThresh function for ancestral character estimation under the threshold model

A few years ago I published a method for ancestral character estimation using the threshold model from evolutionary quantitative genetics. The threshold model originates with Wright, but was first applied to phylogenetic comparative analyses by Felsenstein (2005, 2012).

Unfortunately, though this method continues to be used, the associated phytools function is kind of ugly. It has no associated S3 methods (such as print or plot methods), and it is slow.

Recently, Revell lab postdoc Klaus Schliep helpsed with the latter of these two issues by rewriting a simple, but frequently used internal function threshState in C. This function (threshStateC) is in phangorn (not phytools) so that phytools can continue to be installed without compilation.

Today I also added nicer print & plot methods for the function. I also modified the object class exported by the function - from a simple list of the whole MCMC, so an object of class "ancThresh" - but one that still includes all the components of the previous list to help maintain backward compatibility with scripts written around earlier phytools versions.

Here is a simple demo using data for feeding mode (non-piscivorous, moderate piscivory, and high piscivory) in centrarchid fishes:

library(phytools)
## Loading required package: ape
## Loading required package: maps
packageVersion("phytools")
## [1] '0.6.39'
X<-read.csv("Centrarchidae.data.csv",header=TRUE,row.names=1)
tree<-read.tree("Centrarchidae.tre")
mcmc<-ancThresh(tree,X,ngen=10000000,control=list(print=FALSE))
## **** NOTE: no sequence provided, column order in x
## MCMC starting....
mcmc
## 
## Object containing the results from an MCMC analysis
## of the threshold model using ancThresh.
## 
## List with the following components:
## ace:  matrix with posterior probabilities assuming 2e+06 
##       burn-in generations.
## mcmc: posterior sample of liabilities at tips & internal
##       nodes (a matrix with 10001 rows & 31 columns).
## par:  posterior sample of the relative positions of the
##       thresholds, the log-likelihoods, and any other
##       model variables (a matrix with 10001 rows).
## 
## The MCMC was run under the following conditions:
##       seq = not <-> mod <-> high 
##       model = BM 
##       number of generations = 1e+07 
##       sample interval= 1000 
##       burn-in = 2e+06
plot(mcmc)

plot of chunk unnamed-chunk-1

Neat.

No comments:

Post a Comment

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.