Sunday, October 14, 2012

Ancestral character estimation under the threshold model, part II

Another quick post about ancestral character estimation under the threshold model from quantitative genetics. For a reminder of the details of this model, please check out my prior posts on the topic, as well as Felsenstein (2012).

The central idea is that this model, in which the value for a discrete character is determined by the evolution of an underlying continuous trait called 'liability,' might be a more appropriate model for many discretely manifested, by complex or polygenic, attributes of organismal phenotype.

I'm estimating under this model by using Bayesian MCMC to sample ancestral and tip liabilities, and the thresholds that result in a change in the discrete character, from their joint posterior probability distribution.

I'm still working on this, but here is an example simulation, analysis, and then some assessments of performance:

> source("ancThresh.R")
> # simulate liabilities
> l<-fastBM(tree,internal=TRUE,sig2=2,a=-1)
> # translate liabilities to threshold character
> x<-rep("blue",length(l))
> x[l>0]<-"green"
> x[l>1]<-"red"
> x[l>6]<-"yellow" > names(x)<-names(l)
> x
    t54      t55      t51 ...
 "blue"   "blue"   "blue" ...
> x11(); res<-ancThresh(tree,x[1:length(tree$tip)],ngen=100000)
**** NOTE: no sequence provided, using alphabetical or numerical order
MCMC starting....
gen 1000
gen 2000
> # plot the likelihood profile
> plot(res$par[,"gen"],res$par[,"logLik"],xlab="gen", ylab="logL")
> lines(res$par[,"gen"],res$par[,"logLik"])
> # pull estimates as max PP for each node
> est<-apply(res$ace,1,function(x) names(x)[which(x==max(x))])
> est
    101      102      103 ...
  "red"   "blue"   "blue" ...
> # do they match?
> a<-x[101:199] # get simulated ancestors
> sum(a==est)/length(a)
[1] 0.7878788 # about 79% match
> # what if we restrict to >0.5, 0.7, or 0.9 PP?
> any50<-apply(res$ace,1,function(x) any(x>0.5))
> sum(any50)/tree$Nnode # gives us the % nodes with PP>0.5
[1] 0.969697
> # now the fraction that match, given PP>0.5
> sum(a[any50]==est[any50])/sum(any50)
[1] 0.7916667
> # PP > 0.7
> any70<-apply(res$ace,1,function(x) any(x>0.7))
> sum(any70)/tree$Nnode
[1] 0.5050505
> sum(a[any70]==est[any70])/sum(any70)
[1] 0.96
> # 96% of nodes with PP>0.7 are correct
> # PP > 0.9
> any90<-apply(res$ace,1,function(x) any(x>0.9))
> sum(any90)/tree$Nnode
[1] 0.2929293
> sum(a[any90]==est[any90])/sum(any90)
[1] 1

Cool. I will post the code & more on this very soon.


  1. Dear Liam,

    Is there a way to run threshBayes with a single discrete character?

    Also, I am relatively new to R so when I run threshBayes with two characters I do not know how to plot the results onto the phylogeny as you did above.

    I would be very greatful for any tips!


    1. Hi Ricardo.

      This cannot be done with threshBayes, which is specifically for a two character, correlated evolution model. The method described in this post is actually implemented in a different function, ancThresh, which I have just posted; but this code is "bleeding edge" and should be used only with caution.

      - Liam