Close followers of this blog will already be familiar with my recent obsession with Boucher & Démery (2016), a.k.a. the most important phylogenetic comparative methods paper you’ve probably never read. If you haven’t read it yet, I’d recommend my earlier posts on this topic here and here.
I’ve now implemented the bounded Brownian motion (BBM) model of Boucher & Démery (2016) in phytools via the function bounded_bm, but what I hadn’t mentioned previously is that I’ve also added an ancr ancestral state estimation routine for the model object class.
Code is here. Let’s play around with it a bit.
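(One quick note: since bounded_bm and its ancr method are fairly new, you may need an up-to-date, possibly development, version of phytools to follow along. One way to get that, assuming you already have the remotes package installed, is shown below; this is my suggestion rather than anything required by the post.)
## if needed, install the development version of phytools from GitHub
## (assumes the remotes package is installed)
remotes::install_github("liamrevell/phytools")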
library(phytools)
packageVersion("phytools")
## [1] '2.1.13'
First, I’m going to simulate a tree and then some data under BBM using phytools::fastBM, as follows.
tree<-pbtree(n=200,scale=1)
x<-fastBM(tree,sig2=0.4,a=0.5,bounds=c(0,1),internal=TRUE)
phenogram(tree,x,ftype="off",col=make.transparent("blue",0.5),
las=1,cex.axis=0.8)
clip(0,1,-1,2)
abline(h=c(0,1),lty="dotted")
text(0,0.97,"upper bound",pos=4,cex=0.7)
text(0,0.03,"lower bound",pos=4,cex=0.7)
This isn’t as pretty as my illustrative bounded Brownian motion simulations from earlier posts; however, we can see both that evolution is bounded and that the bounds are reached multiple times.
Before we continue, I’m going to separate out the tip values (which is all we’d know in an empirical study, of course) from the simulated internal states.
a<-x[1:tree$Nnode+Ntip(tree)]
x<-x[tree$tip.label]
All right. Let’s go ahead and first fit an unbounded (i.e., standard Brownian) model to these data.
fit_unbounded<-bounded_bm(tree,x,lik.func="eigen",parallel=TRUE)
fit_unbounded
## Object of class "bounded_bm" based on
## a discretization with k = 200 levels.
##
## Unwrapped (i.e., bounded) model
##
## Set or estimated bounds: [ -0.974998 , 1.983823 ]
##
## Fitted model parameters:
## sigsq: 0.250278
## x0: 0.496016
##
## Log-likelihood: 16.589545
##
## R thinks it has found the ML solution.
To proceed and estimate ancestral states, I’ll use the ancr method as follows.
anc_unbounded<-ancr(fit_unbounded)
anc_unbounded
## Ancestral character estimates from "bounded_bm" object:
## 201 202 203 204 205 206
## 0.496016 0.505804 0.501098 0.529028 0.547592 0.599433 ....
##
## Lower & upper 95% CIs:
## lower upper
## 201 0.245516 0.748515
## 202 0.26031 0.763309
## 203 0.26031 0.748515
## 204 0.289898 0.778104
## 205 0.275104 0.822486
## 206 0.289898 0.91125
## .... ....
Now, the actual procedure that I’m employing inside the ancr method uses the discretized bin midpoints and the marginal scaled likelihoods to compute a weighted mean estimate of the trait at each node. This strategy, however, produces virtually identical estimates to those we would’ve obtained using ML on the original scale of our data, which we can see by comparing these estimates to those from phytools::fastAnc.
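To make that a bit more concrete, here’s a minimal sketch of the weighted-mean calculation. This is just an illustration of the idea, not the actual internals of ancr: the matrix L (marginal scaled likelihoods, one row per node & one column per bin, rows summing to 1) and the vector mids (the k bin midpoints) are hypothetical objects I’ve made up for the example, not part of the phytools API.
## illustrative sketch only: L & mids are assumed objects, not phytools output
weighted_mean_states<-function(L,mids){
  ## each estimate is the likelihood-weighted average of the bin midpoints
  as.vector(L%*%mids)
}
## e.g., for k = 200 equal-width bins on [0,1]:
## mids<-seq(1/400,1-1/400,length.out=200)
## est<-weighted_mean_states(L,mids)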
par(mar=c(5.1,4.1,2.1,2.1))
plot(fastAnc(tree,x),anc_unbounded$ace,bty="n",xlim=c(0,1),ylim=c(0,1),
pch=21,bg=make.transparent("grey",0.5),cex=1.2,xlab="MLEs from fastAnc",
ylab="estimates from discretized space using ancr")
grid()
I guess the only point here is that we’re getting the same values! So far so good.
The more important question, though, is how close our estimates are to the true generating states.
There, we’re not doing quite as well. To see that, I’m going to graph not only the estimates, but also their 95% confidence intervals. I’ll then use grey & red dots to highlight true values falling within (grey) or outside (red) the 95% CI around each estimate. Note that, on average, we’d expect our CI to include the true value 19 times out of 20.
par(mar=c(5.1,4.1,2.1,2.1))
plot(NA,bty="n",xlim=c(0,1),
ylim=range(anc_unbounded$CI95),xlab="MLE estimates & CIs",
ylab="true values")
for(i in 1:nrow(anc_unbounded$CI95)) lines(rep(anc_unbounded$ace[i],2),
anc_unbounded$CI95[i,],lwd=2,col=make.transparent("blue",0.25))
outCI<-union(which(a<anc_unbounded$CI95[,1]),which(a>anc_unbounded$CI95[,2]))
cols<-rep("grey",length(a))
cols[outCI]<-"red"
points(anc_unbounded$ace,a,pch=20,col=cols)
grid()
legend("topleft",c("95% CI","true value on CI","true value outside CI"),
lty=c("solid",NA,NA),pch=c(NA,20,20),
col=c(make.transparent("blue",0.25),"grey","red"),
cex=0.6,bty="n",pt.cex=1,lwd=2)
Typically, here we’ll see a substantially greater fraction than 5% of the true values falling outside their CIs.
length(outCI)/tree$Nnode
## [1] 0.1256281
OK, now let’s do the same using bounded Brownian motion. I’ll use the same function as before, but this time setting lims = c(0,1), the true limits of our trait.
fit_bounded<-bounded_bm(tree,x,lims=c(0,1),lik.func="eigen",parallel=TRUE)
fit_bounded
## Object of class "bounded_bm" based on
## a discretization with k = 200 levels.
##
## Unwrapped (i.e., bounded) model
##
## Set or estimated bounds: [ 0 , 1 ]
##
## Fitted model parameters:
## sigsq: 0.442651
## x0: 0.545151
##
## Log-likelihood: 55.680628
##
## R thinks it has found the ML solution.
anc_bounded<-ancr(fit_bounded)
anc_bounded
## Ancestral character estimates from "bounded_bm" object:
## 201 202 203 204 205 206
## 0.545151 0.567683 0.57462 0.638257 0.664605 0.714024 ....
##
## Lower & upper 95% CIs:
## lower upper
## 201 0.0575 0.9675
## 202 0.0675 0.9725
## 203 0.0775 0.9725
## 204 0.1225 0.9825
## 205 0.1275 0.9875
## 206 0.1925 0.9875
## .... ....
We’re now ready to make the same comparison as before.
par(mar=c(5.1,4.1,2.1,2.1))
plot(NA,bty="n",xlim=c(0,1),
ylim=range(anc_bounded$CI95),xlab="MLE estimates & CIs",
ylab="true values")
for(i in 1:nrow(anc_bounded$CI95)) lines(rep(anc_bounded$ace[i],2),
anc_bounded$CI95[i,],lwd=2,col=make.transparent("blue",0.25))
outCI<-union(which(a<anc_bounded$CI95[,1]),which(a>anc_bounded$CI95[,2]))
cols<-rep("grey",length(a))
cols[outCI]<-"red"
points(anc_bounded$ace,a,pch=20,col=cols)
grid()
legend("topleft",c("95% CI","true value on CI","true value outside CI"),
lty=c("solid",NA,NA),pch=c(NA,20,20),
col=c(make.transparent("blue",0.25),"grey","red"),
cex=0.6,bty="n",pt.cex=1,lwd=2)
We should see that our CIs are broader, but that they now also capture much closer to 95% of the true, generating values of the trait at the nodes. If we repeated this simulation many times, we’d find this to be true in general (a rough way to check is sketched below).
length(outCI)/tree$Nnode
## [1] 0.04522613
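In case you’d like to verify that claim for yourself, here’s a rough sketch of how such a repeated-simulation check might look. It simply wraps the steps above in a loop; the number of simulations and all of the specific settings are arbitrary choices of mine for illustration, not anything prescribed by phytools.
## rough sketch of a small coverage simulation under the bounded model;
## nsim & the settings below are arbitrary illustrative choices
nsim<-20
coverage<-vector()
for(i in 1:nsim){
  tt<-pbtree(n=200,scale=1)
  y<-fastBM(tt,sig2=0.4,a=0.5,bounds=c(0,1),internal=TRUE)
  aa<-y[1:tt$Nnode+Ntip(tt)] ## true states at internal nodes
  y<-y[tt$tip.label] ## tip values only
  fit<-bounded_bm(tt,y,lims=c(0,1),lik.func="eigen",parallel=TRUE)
  anc<-ancr(fit)
  out<-union(which(aa<anc$CI95[,1]),which(aa>anc$CI95[,2]))
  coverage[i]<-1-length(out)/tt$Nnode
}
mean(coverage) ## should come out close to 0.95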
That’s all I’ve got, folks.