## Monday, October 15, 2012

### Posterior density for the thresholds during ancestral character estimation using the threshold model

I've been doing more exploration of ancestral character estimation using the threshold model. One thing that I was interested in was how good we can be at estimating the relative positions of the thresholds along our liability axis. Note that since unobserved liabilities have arbitrary scale, only the relative positions of the thresholds have meaning. Specifically, the set of thresholds [1,2,4] with Brownian rate of evolution of the liability of σ2=1.0 has the exact same meaning & interpretation of thresholds [0,2,6] and σ2=2.0.

Nonetheless, since I have simulated liability evolution with a fixed, known σ2 (and it is the same σ2 that I will use for estimation), it makes sense in this rare case to look at the posterior density for the thresholds on its raw scale. I decided to simulate using a four state (thus 3 threshold) model, in which the four states are "blue", "green", "red", and "yellow"; and the threshold transition points between colors are 0 (blue to green), 1 (green to red) and 4 (red to yellow).

Here was my simulation:
> source("ancThresh.R")
> # simulate tree
> tree<-pbtree(n=100,scale=10)
> # simulate liability
> l<-fastBM(tree,internal=TRUE,a=2)
> # translate to discrete character
> x<-rep("blue",length(l))
> x[l>0]<-"green"
> x[l>1]<-"red"
> x[l>4]<-"yellow"
> names(x)<-names(l)
> sequence<-c("blue","green","red","yellow")
> cols<-sequence; names(cols)<-sequence
> res<-ancThresh(tree,x[1:length(tree\$tip)],seq=sequence, ngen=1000000,control=list(piecol=cols))
MCMC starting....
gen 1000
gen 2000
...

The posterior sample for the thresholds is in res\$par. Unlike in the translation, above, each threshold is named for its upper bound (so, blue = 0; green = 1; red = 4; and yellow = Inf, in this case). Obviously, and again because liabilities are unscaled and uncentered, we fix the lowest of these at a constant value - we choose 0, but this is arbitrary.

Now let's do some computations to plot the posterior density:
> breaks<--0.25:25/5
> breaks
[1] -0.05  0.15  0.35  0.55  0.75  0.95  ...
> green<-hist(res\$par[,"green"],breaks=breaks)
> red<-hist(res\$par[,"red"],breaks=breaks)
> plot(red\$mids,red\$density,type="s",ylim=c(0, max(c(red\$density,green\$density))),xlab="liability", ylab="density",col="red")
> lines(red\$mids,green\$density,type="s",col="green")

I have also added the generating threshold (colored on each side with the appropriate discrete state) for fun. Not too bad.
It should also be noted that if we look at the ratio of the mean estimates of the two thresholds this is also quite close to the generating ratio of four:
> mean(res\$par[,"red"])/mean(res\$par[,"green"])
[1] 4.3913

That's it.