The central idea is that this model, in which the value for a discrete character is determined by the evolution of an underlying continuous trait called 'liability,' might be a more appropriate model for many discretely manifested, by complex or polygenic, attributes of organismal phenotype.
I'm estimating under this model by using Bayesian MCMC to sample ancestral and tip liabilities, and the thresholds that result in a change in the discrete character, from their joint posterior probability distribution.
I'm still working on this, but here is an example simulation, analysis, and then some assessments of performance:
> source("ancThresh.R")
> # simulate liabilities
> l<-fastBM(tree,internal=TRUE,sig2=2,a=-1)
> # translate liabilities to threshold character
> x<-rep("blue",length(l))
> x[l>0]<-"green"
> x[l>1]<-"red"
> x[l>6]<-"yellow" > names(x)<-names(l)
> x
t54 t55 t51 ...
"blue" "blue" "blue" ...
> x11(); res<-ancThresh(tree,x[1:length(tree$tip)],ngen=100000)
**** NOTE: no sequence provided, using alphabetical or numerical order
MCMC starting....
gen 1000
gen 2000
...
> # simulate liabilities
> l<-fastBM(tree,internal=TRUE,sig2=2,a=-1)
> # translate liabilities to threshold character
> x<-rep("blue",length(l))
> x[l>0]<-"green"
> x[l>1]<-"red"
> x[l>6]<-"yellow" > names(x)<-names(l)
> x
t54 t55 t51 ...
"blue" "blue" "blue" ...
> x11(); res<-ancThresh(tree,x[1:length(tree$tip)],ngen=100000)
**** NOTE: no sequence provided, using alphabetical or numerical order
MCMC starting....
gen 1000
gen 2000
...
> # plot the likelihood profile
> plot(res$par[,"gen"],res$par[,"logLik"],xlab="gen", ylab="logL")
> lines(res$par[,"gen"],res$par[,"logLik"])
> plot(res$par[,"gen"],res$par[,"logLik"],xlab="gen", ylab="logL")
> lines(res$par[,"gen"],res$par[,"logLik"])
> # pull estimates as max PP for each node
> est<-apply(res$ace,1,function(x) names(x)[which(x==max(x))])
> est
101 102 103 ...
"red" "blue" "blue" ...
> # do they match?
> a<-x[101:199] # get simulated ancestors
> sum(a==est)/length(a)
[1] 0.7878788 # about 79% match
> # what if we restrict to >0.5, 0.7, or 0.9 PP?
> any50<-apply(res$ace,1,function(x) any(x>0.5))
> sum(any50)/tree$Nnode # gives us the % nodes with PP>0.5
[1] 0.969697
> # now the fraction that match, given PP>0.5
> sum(a[any50]==est[any50])/sum(any50)
[1] 0.7916667
> # PP > 0.7
> any70<-apply(res$ace,1,function(x) any(x>0.7))
> sum(any70)/tree$Nnode
[1] 0.5050505
> sum(a[any70]==est[any70])/sum(any70)
[1] 0.96
> # 96% of nodes with PP>0.7 are correct
> # PP > 0.9
> any90<-apply(res$ace,1,function(x) any(x>0.9))
> sum(any90)/tree$Nnode
[1] 0.2929293
> sum(a[any90]==est[any90])/sum(any90)
[1] 1
> est<-apply(res$ace,1,function(x) names(x)[which(x==max(x))])
> est
101 102 103 ...
"red" "blue" "blue" ...
> # do they match?
> a<-x[101:199] # get simulated ancestors
> sum(a==est)/length(a)
[1] 0.7878788 # about 79% match
> # what if we restrict to >0.5, 0.7, or 0.9 PP?
> any50<-apply(res$ace,1,function(x) any(x>0.5))
> sum(any50)/tree$Nnode # gives us the % nodes with PP>0.5
[1] 0.969697
> # now the fraction that match, given PP>0.5
> sum(a[any50]==est[any50])/sum(any50)
[1] 0.7916667
> # PP > 0.7
> any70<-apply(res$ace,1,function(x) any(x>0.7))
> sum(any70)/tree$Nnode
[1] 0.5050505
> sum(a[any70]==est[any70])/sum(any70)
[1] 0.96
> # 96% of nodes with PP>0.7 are correct
> # PP > 0.9
> any90<-apply(res$ace,1,function(x) any(x>0.9))
> sum(any90)/tree$Nnode
[1] 0.2929293
> sum(a[any90]==est[any90])/sum(any90)
[1] 1
Cool. I will post the code & more on this very soon.
Dear Liam,
ReplyDeleteIs there a way to run threshBayes with a single discrete character?
Also, I am relatively new to R so when I run threshBayes with two characters I do not know how to plot the results onto the phylogeny as you did above.
I would be very greatful for any tips!
Ricardo
Hi Ricardo.
DeleteThis cannot be done with threshBayes, which is specifically for a two character, correlated evolution model. The method described in this post is actually implemented in a different function, ancThresh, which I have just posted; but this code is "bleeding edge" and should be used only with caution.
- Liam