Tuesday, December 4, 2012

Ancestral character estimation under the threshold model: an example

I just posted a new version of the phytools function ancThresh for ancestral character estimation using the threshold model from quantitative genetics. Under the threshold model evolution is of an unknown, and unobserved, continuous character, "liability." When liability exceeds a particular value (the threshold or one of multiple thresholds) the observed, discretely valued character trait changes state. (See the following link for much more discussion of the threshold model on this blog.)

The most significant change from prior versions of the function is that the present version can accept either a vector of length N containing a fixed set of trait values for the tips (as before) or a matrix of dimension N × m (for m conditions of the discrete character) in which the i,jth element contains the prior probability of tip i being in state j. This is not equivalent to modeling within-species polymorphism. Rather we allow the possibility that a tip state is unknown, or that it can be called in one condition or another with definable probability.

I have also added the function plotThresh. This function plots the posterior probabilities of ancestral states on the tree. It is called internally (by default) in ancThresh - but also can be called on its own and given the posterior sample from an MCMC run.

Here's a quick example using the phylogeny of Centrarchidae in Near et al. (2005) and the data for feeding mode (including uncertainty) from Collar et al. (2009). Note that for taxa missing from the Collar et al. study I have assigned states for feeding mode based on the behavior of close relatives - I could have equally well decided that feeding mode was unknown for those species.

> tree<-read.tree("Centrarchidae.tre")
> X<-read.csv("Centrarchidae.csv",header=T,row.names=1)
> X
                           not    mod   high
Acantharchus_pomotis     0.0000 1.0000 0.0000
Lepomis_gibbosus         1.0000 0.0000 0.0000
Lepomis_microlophus      1.0000 0.0000 0.0000
Lepomis_punctatus        1.0000 0.0000 0.0000
Lepomis_miniatus         0.5000 0.5000 0.0000
Ambloplites_ariommus     0.3333 0.3333 0.3333
> mcmc<-ancThresh(tree,X,ngen=1000000,control= list(piecol=cols,tipcol="estimated"),label.offset=0.01)
**** NOTE: no sequence provided, column order in x
MCMC starting....
gen 1000
gen 2000
> layout(c(1,2),heights=c(0.9,0.1))
> plotThresh(tree,X,mcmc,piecol=cols,tipcol="estimated", label.offset=0.01)
> plot.new()
> plot.window(xlim=c(0,10),ylim=c(-2,1.25))
> points(0.2,1,pch=19,col="green",cex=2)
> text(0.2,1,"non-piscivorous",pos=4,offset=0.7)
> points(0.2,0,pch=19,col="blue",cex=2)
> text(0.2,0,"moderately piscivorous",pos=4,offset=0.7)
> points(0.2,-1,pch=19,col="red",cex=2)
> text(0.2,-1,"highly piscivorous",pos=4,offset=0.7)

Note that plotThresh(...,tipcol="estimated") uses the sampled liabilities for tips from the MCMC to obtain posterior probabilities for the trait values of uncertain tips. I could also using plotThresh(...,tipcol="input") to get back the input probabilities.

Well, that's pretty cool. Since there are whole bunch of helper functions that are called internally by ancThresh and plotThresh, I would strongly advise installing the latest minor phytools build (phytools 0.2-14) - rather than loading the source code directly - if you are interested in repeating the analyses of this post. Word of caution (!) - ancThresh is still in very early development (and should be treated as such).


  1. Dear Dr. Revell,

    Thank you very much for posting this new version of the ancThresh. I have tried to run it, but i get an error message:

    Error en sort.list(y) : 'x' must be atomic for 'sort.list'
    Have you called 'sort' on a list?

    I am trying to fix the problem, although i have followed (I guess) your recommendations above. My character has six states, for 36 spp. Too many?

    Thank you very much

    1. Hi Jose.

      Not sure what that error means. Do you provide a sequence of the discrete trait on the liability axis? E.g., ancThresh(...,sequence=c("A","B","C","D")).

      It is not likely that the limited data is causing your error - but it seems possible that this will be a problem if you successfully run the analysis.

      - Liam

  2. Hi Liam,

    Thanks for your prompt response. I already fixed the problem, it was my fault. I was mixing up your new function with the old codes from phytools.

    José María

    1. Glad to here it. Thanks for the update. - Liam