Sunday, June 9, 2024

A phytools function to fit the discrete-state-dependent multi-rate continuous character evolution model using the discrete approximation

Just the day before yesterday, I posted a script showing how to fit a Brownian multi-rate discrete-state-dependent continuous trait evolution model using the discrete approximation of Boucher & Démery (2016) from the most important phylogenetic comparative methods paper you’ve probably never read.

This is an analysis of the style of O’Meara et al. (2006), but in which we’re going to jointly model the evolution of our discrete character (via a standard Mk stochastic process) and our continuous trait (by a multi-rate Brownian process, in with the rate, \(\sigma^2\), depends on the state of our discrete trait).

Today, I have preliminary R code for a new phytools function to implement this model.

This code will probably be available on GitHub by the time this post goes live, but for those people interested in such things, I’ve also reproduced the full function code below.

x_by_y<-function(x,y){
  nn<-vector()
  for(i in 1:length(x)) for(j in 1:length(y))
    nn<-c(nn,paste(x[i],y[j],sep=","))
  nn
}

expand.range<-phytools:::expand.range
to_binned<-phytools:::to_binned

fitmultiBM<-function(tree,x,y=NULL,model="ER",ncat=1,...){
  levs<-if(hasArg(levs)) list(...)$levs else 100
  parallel<-if(hasArg(parallel)) list(...)$parallel else 
    FALSE
  ncores<-if(hasArg(ncores)) list(...)$ncores else 
    detectCores()-1
  ## continuous character
  x<-x[tree$tip.label]
  lims<-expand.range(x)
  dd<-diff(lims)
  tol<-1e-8*dd/levs
  bins<-cbind(seq(from=lims[1]-tol,by=(dd+2*tol)/levs,
    length.out=levs),seq(to=lims[2]+tol,by=(dd+2*tol)/levs,
      length.out=levs))
  X<-to_binned(x,bins)
  ## discrete character
  if(!is.null(y)){
    if(is.matrix(y)){
      y<-y[tree$tip.label,]
      m<-ncol(y)
      states<-colnames(y)
    } else {
      y<-to.matrix(y,sort(unique(y)))
      y<-y[tree$tip.label,]
      m<-ncol(y)
      states<-colnames(y)
    }
  } else {
    y<-matrix(1,nrow(X),1,dimnames=list(rownames(X),"0"))
  }
  if(ncat>1){
    y_tmp<-y
    for(i in 2:ncat){
      colnames(y_tmp)<-paste(colnames(y_tmp),"*",sep="")
      y<-cbind(y,y_tmp)
    }
   y<-y[,order(colnames(y))]
  }
  ## combine
  nn<-x_by_y(colnames(y),colnames(X))
  XX<-matrix(0,nrow=nrow(X),ncol=ncol(y)*ncol(X),
    dimnames=list(rownames(X),nn))
  for(i in 1:ncol(y)){
    for(j in 1:nrow(X)){
      XX[j,1:levs+(i-1)*levs]<-X[j,]*y[j,i]
    }
  }
  ## build continuous model
  cmodel<-matrix(0,nrow=ncol(XX),ncol=ncol(XX),
    dimnames=list(nn,nn))
  for(i in 1:(levs-1)){
    for(j in 0:(ncol(y)-1)){
      cmodel[i+j*levs,i+1+j*levs]<-
        cmodel[i+1+j*levs,i+j*levs]<-(j+1)
    }
  }
  ## build discrete model
  dmodel<-matrix(0,nrow=ncol(XX),ncol=ncol(XX),
    dimnames=list(nn,nn))
  qmodel<-matrix(0,nrow=ncol(y),ncol=ncol(y),
    dimnames=list(colnames(y),colnames(y)))
  dn<-strsplit(colnames(qmodel),"")
  foo<-function(x){
    a<-paste(x[x!="*"],collapse="")
    b<-paste(x[x=="*"],collapse="")
    return(c(a,b))
  }
  dn<-sapply(dn,foo)
  q1<-matrix(0,length(unique(dn[1,])),
    length(unique(dn[1,])),
    dimnames=list(sort(unique(dn[1,])),
      sort(unique(dn[1,]))))
  if(model=="ER"){ 
    q1[]<-1
    diag(q1)<-0
  } else if(model=="SYM"){
    k<-1
    for(i in 1:(nrow(q1)-1)){
      for(j in (i+1):ncol(q1)){
        q1[i,j]<-q1[j,i]<-k
        k<-k+1
      }
    }
  } else if(model=="ARD"){
    k<-1
    for(i in 1:nrow(q1)){
      for(j in 1:ncol(q1)){
        if(i!=j){
          q1[i,j]<-k
          k<-k+1
        }
      }
    }
  }
  q2<-matrix(0,length(unique(dn[2,])),
    length(unique(dn[2,])),
    dimnames=list(sort(unique(dn[2,])),
      sort(unique(dn[2,]))))
  if(model=="ER"){ 
    q2[]<-max(q1)+1
    diag(q2)<-0
  } else if(model=="SYM"){
    k<-max(q1)+1
    for(i in 1:(nrow(q2)-1)){
      for(j in (i+1):ncol(q2)){
        q2[i,j]<-q2[j,i]<-k
        k<-k+1
      }
    }
  } else if(model=="ARD"){
    k<-max(q1)+1
    for(i in 1:nrow(q2)){
      for(j in 1:ncol(q2)){
        if(i!=j){
          q2[i,j]<-k
          k<-k+1
        }
      }
    }
  }
  for(i in 1:nrow(qmodel)){
    for(j in 1:ncol(qmodel)){
      if(dn[1,i]!=dn[1,j]&&dn[2,i]==dn[2,j]) 
        qmodel[i,j]<-q1[dn[1,i],dn[1,j]]
      else if(dn[1,i]==dn[1,j]&&dn[2,i]!=dn[2,j])
        qmodel[i,j]<-q2[which(rownames(q2)==dn[2,i]),
          which(colnames(q2)==dn[2,j])]
    }
  }
  dn<-strsplit(nn,",")
  for(i in 1:ncol(dmodel)){
    for(j in 1:ncol(dmodel)){
      if(dn[[i]][2]==dn[[j]][2]){
        dmodel[i,j]<-qmodel[dn[[i]][1],dn[[j]][1]]
      }
    }
  }
  dmodel[dmodel>0]<-dmodel[dmodel>0]+max(cmodel)
  model<-cmodel+dmodel
  plot_model<-if(hasArg(plot_model)) 
    list(...)$plot_model else FALSE
  if(plot_model){
    fg_col<-par()$fg
    par(fg="transparent")
    plot(model,las=2,cex.axis=0.4,
      col=c("white",hcl.colors(max(model))),
      breaks=seq(0,max(model)+1,by=1),
      main="structure of model")
    par(fg=fg_col)
  }
  q.init<-runif(n=ncol(y)+max(qmodel),0,2)*c(
    rep((1/2)*mean(pic(x,multi2di(tree))^2)*(levs/dd)^2,
      ncol(y)),
    rep(fitMk(tree,y,model="ER")$rates,max(qmodel)))
  fit<-fitMk(tree,XX,model=model,
    lik.func=if(parallel) "parallel" else "pruning",
    expm.method=if(isSymmetric(model)) "R_Eigen" else 
      "Higham08.b",
    pi="mle",logscale=TRUE,q.init=q.init)
  sig2<-2*fit$rates[1:max(cmodel)]*(dd/levs)^2
  q<-fit$rates[1:max(qmodel)+max(cmodel)]
  index.matrix<-qmodel
  lnL<-logLik(fit)-Ntip(tree)*log(dd/levs)
  attr(lnL,"df")<-max(model)+1

  lfunc<-function(sig2,q,x0="mle",...){
    if(hasArg(lik.func)) lik.func<-list(...)$lik.func
    else lik.func<-"pruning"
    if(hasArg(parallel)) parallel<-list(...)$parallel
    else parallel<-FALSE
    q<-c((sig2/2)*(levs/dd)^2,q)
    if(x0=="mle") pi<-"mle"
    else if(x0=="nuisance") pi<-"fitzjohn"
    else if(is.numeric(x0)) pi<-to_binned(x0,bins)[1,]
    if(lik.func=="pruning"){
      lnL<-pruning(q,tree,XX,model,pi=pi,
        expm.method=if(isSymmetric(model)) "R_Eigen" else 
          "Higham08.b")-
        Ntip(tree)*log(dd/levs)
    } else if(lik.func=="parallel"){
      if(!exists("ncores")) ncores<-min(nrow(tree$edge),
        detectCores()-1)
      mc<-makeCluster(ncores,type="PSOCK")
      registerDoParallel(cl=mc)
      lnL<-parallel_pruning(q,tree,XX,model,pi=pi,
        expm.method=if(isSymmetric(model)) "R_Eigen" else 
          "Higham08.b")-Ntip(tree)*log(dd/levs)
      stopCluster(cl=mc)
    }
    lnL
  }
  object<-list(
    sigsq=sig2,
    x0=sum(fit$pi*rowMeans(bins)),
    rates=q,
    index.matrix=qmodel,
    states=colnames(y),
    ncat=levs,
    logLik=lnL,
    opt_results=fit$opt_results,
    mk_fit=fit,
    lik=lfunc)
  class(object)<-c("fitmultiBM","fitMk")
  object
}

print.fitmultiBM<-function(x,digits=4,...){
  cat(paste("Object of class \"fitmultiBM\" based on\n",
    "	a discretization with k =",
    x$ncat,"levels.\n\n"))
  cat("Fitted multi-rate BM model parameters:\n")
  cat(paste(" levels: [",
    paste(colnames(x$index.matrix),collapse=", "),
    "]\n"))
  cat(paste("  sigsq: [",
    paste(round(x$sigsq,digits),collapse=", "),
    "]\n"))
  cat(paste("     x0:",round(x$x0,digits),"\n\n"))
  print(as.Qmatrix(x))
  cat(paste("\nLog-likelihood:",round(x$logLik,digits),
    "\n\n"))
  if(x$opt_results$convergence == 0) 
    cat("R thinks it has found the ML solution.\n\n")
  else cat(
    "R thinks optimization may not have converged.\n\n")
}

logLik.fitmultiBM<-function(object) object$logLik

The hardest part (by far) of the code above was figuring out how to create the design matrices combining our continuous and discrete character evolution models. I’m still not 100% I’ve got everything right for all models, but I’ll continue to work out any kinks.

Of course, we’ve got to test it out! First, let’s load phytools.

library(phytools)

Here, I’m going to keep thing simple by simulating an equal-rates 3-state discrete character history on the tree with levels "a", "b", and "c" (and transition rate between states, \(q = 0.05\)); and then a continuous trait evolving with \(\sigma_{a}^2 = 0.5\), \(\sigma_{b}^2 = 5.0\), and \(\sigma_{c}^2 = 20\).

## simulate a tree
tree<-pbtree(n=300,scale=10)
## discrete character matrix
Q<-0.05*matrix(c(
  -2,1,1,
  1,-2,1,
  1,1,-2),3,3,
  dimnames=list(c("a","b","c"),
    c("a","b","c")))
Q
##       a     b     c
## a -0.10  0.05  0.05
## b  0.05 -0.10  0.05
## c  0.05  0.05 -0.10
tt<-sim.history(tree,Q,anc="b")
## Done simulation(s).
y<-as.factor(getStates(tt,"tips"))
sig2<-setNames(c(0.5,5,20),c("a","b","c"))
sig2
##    a    b    c 
##  0.5  5.0 20.0
x<-sim.rates(tt,sig2)

We can plot it much like we did last time to see what this evolution looks like.

layout(matrix(c(1,2),1,2),widths=c(0.45,0.55))
cols<-setNames(hcl.colors(n=3),c("a","b","c"))
plot(tt,cols,ftype="off",mar=c(4.1,1.1,1.1,1.1))
par(mar=c(4.1,5.1,1.1,1.1))
phenogram(tt,x,colors=cols,ftype="off",las=1,
  cex.axis=0.8)
legend("bottomleft",c(
  expression(paste(sigma^2,"[a] = 0.5")),
  expression(paste(sigma^2,"[b] = 5.0")),
  expression(paste(sigma^2,"[c] = 20"))),
  col=cols,lwd=3,bty="n",cex=0.8)

plot of chunk unnamed-chunk-19

(Remember – even though I’ve graphed the generating regimes here, our method will only take the observed data for our continuous and discrete traits at the tips of the tree!)

Now let’s fit our state-dependent model with our new phytools function from above: fitmultiBM. Before we do, we need to load the parallel package to support parallelization. Don’t worry – you already have this installed.

library(parallel)

In our function call, below, model="ER" refers to the model for the evolution of our discrete trait – i.e., that the transitions rates between all states are equal to one another.

parallel=TRUE tells the function to parallelize matrix exponentiation across cores during pruning, which can be very handy.

Finally, levs refers to the number of “bins” used for our discrete approximation of the likelihood following Boucher & Démery (2016). In general, the more than better – but be forewarned that computation gets a lot slower the higher you set levs! (I initially set levs = 200 but then gave up because this post was taking too long to build.)

sd_fit<-fitmultiBM(tree,x,y,model="ER",
  parallel=TRUE,levs=100)
sd_fit
## Object of class "fitmultiBM" based on
##  	a discretization with k = 100 levels.
## 
## Fitted multi-rate BM model parameters:
##  levels: [ a, b, c ]
##   sigsq: [ 0.5106, 3.8966, 19.5789 ]
##      x0: 0.4704 
## 
## Estimated Q matrix:
##            a          b          c
## a -0.1091734  0.0545867  0.0545867
## b  0.0545867 -0.1091734  0.0545867
## c  0.0545867  0.0545867 -0.1091734
## 
## Log-likelihood: -712.6662 
## 
## R thinks it has found the ML solution.

As I showed yesterday, a cool feature of this model is that we can even fit it when the discrete character hasn’t been observed. Then our Mk model simply takes the form of a hidden Markov model (i.e., hidden-states model á la Beaulieu et al. 2013).

Let’s try it. (We should not forget that the likelihood of this model cannot be compared to our previous model as we are not accounting for the probability of observing the discrete character data!)

Here, by just leaving y out of the model he function automatically creates a hidden trait with ncat levels. I’ll set ncat=3 to match our previous model and the generating conditions.

hrm_fit<-fitmultiBM(tree,x,ncat=3,parallel=TRUE,
  levs=100)
hrm_fit
## Object of class "fitmultiBM" based on
##  	a discretization with k = 100 levels.
## 
## Fitted multi-rate BM model parameters:
##  levels: [ 0, 0*, 0** ]
##   sigsq: [ 17.3349, 2.7489, 0.5877 ]
##      x0: 0.4704 
## 
## Estimated Q matrix:
##               0          0*         0**
## 0   -0.10935991  0.05467995  0.05467995
## 0*   0.05467995 -0.10935991  0.05467995
## 0**  0.05467995  0.05467995 -0.10935991
## 
## Log-likelihood: -593.0114 
## 
## R thinks it has found the ML solution.

Of course, we should not expect the rates to be in the same order as our generating rates; however, it’s still pretty amazing how close we probably got to the three underlying rates of \(\sigma^2 = [ 0.5, 5, 20 ]\).

Likewise, note the resemblance between our estimated transition rate in the unobserved hidden character and our true generating transition rate (which was \(q = 0.05\)) between continuous character rate regimes.

I think this is pretty neat!

No comments:

Post a Comment

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.