## Monday, May 10, 2021

### Fitting a single Mk discrete character evolution model to a set of different discrete traits using likelihood

Recently on the Twitterverse I came across a thread in which the poster described using revBayes to fit an Mk model to multiple discrete traits at the same time, assuming the same value of the transition matrix (Q) for all traits.

This is logical for some traits because fitting the Mk model to a single discrete trait can often be data-limited, particularly if the trait changes infrequently (which will thus leave very little information about the transition process for any particular trait).

This would not make a lot of sense for situations in which, for example, we expected the rate of change to be widely different from trait to trait. On the other hand, it make very good sense if changes are rare, and/or if there are biological reasons to presume that the rate of change in the trait is similar for different traits, and particularly if we intend to proceed to use our estimated value for Q in other analyses, such as in ancestral state reconstruction or stochastic mappping. (More on this in a separate episode!)

Of course, this can also be done in R!

Firstly, it can be done using my phytools function `ratebytree` because this function (by default) fits two models – one in which the transition rates are different across all the trees and a second in which they are all the same.

However, it can also be fit just by making your own custom likelihood function and then optimizing it. I'm going to demonstrate that option here since it will run faster than using `ratebytree` – simply because we're not fitting the multi-rates model too.

First, let me load phytools and then simulate a tree & 20 binary characters on the phylogeny with a constant value of the transition matrix Q as follows.

``````library(phytools)
Q<-matrix(c(-2,1,2,-1),2,2,dimnames=list(0:1,0:1))
Q
``````
``````##    0  1
## 0 -2  2
## 1  1 -1
``````
``````X<-sim.Mk(tree<-pbtree(n=100),Q,nsim=20)
``````
``````##     V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20
## t2   1  1  0  1  1  1  1  0  1   0   0   0   0   1   1   0   0   1   1   1
## t64  0  0  0  1  1  1  0  0  1   1   1   1   1   1   1   0   1   0   1   0
## t65  0  1  1  0  0  1  1  1  1   1   1   0   0   0   1   1   1   0   1   1
## t16  0  0  0  0  0  0  0  0  1   1   1   0   0   1   0   1   0   1   0   1
## t17  1  0  1  0  0  1  1  1  1   1   1   1   1   0   1   1   1   0   0   1
## t18  0  1  1  1  1  0  0  1  1   1   1   1   0   1   1   1   0   1   0   1
``````
``````cols<-replicate(20,setNames(c("white","grey"),0:1),
simplify=FALSE)
plotTree.datamatrix(tree,X,sep=0,colors=cols)
``````

Readers can see that I simulated data for a binary trait with two levels, 0 and 1, in which the transition rate from 0 → 1 is 2.0; while the transition rate from 1 → 0 is 1.0.

Great, now let's make our likelihood function.

The way I'm going to do this is by using the phytools function `fitMk` but in which I fix the value of Q to a value that I will pre-specify within our likelihood function.

There are other ways we could have done this. For instance, both `fitMk` and `fitDiscrete` in the geiger package also export the likelihood function as part of the model object.

Here's what our likelihood function is going to look like. Note that to use `mapply` to add the log-likelihoods across all the traits, I first recompose my trait data array (`X`) and my Q matrix (`Q`) into lists. In theory this code does not require that I have the same character coding for each trait – merely that I'm willing to assume the same value of Q, given the row/colum order matches the order of the levels of each trait. (It also assumes that each trait has more than one state. It would be a simple fix to relax this assumption.)

``````lik<-function(q,X,tree,trace=0){
X<-lapply(as.list(X),setNames,rownames(X))
Q<-lapply(X,function(x,q)
matrix(c(-q[1],q[2],q[1],-q[2]),2,2,
dimnames=list(levels(x),levels(x))),
q=q)
LIK<-function(x,Q,tree)
-logLik(fitMk(tree,x,fixedQ=Q))
logL<-sum(mapply(LIK,x=X,Q=Q,MoreArgs=list(tree=tree)))
if(trace>0){
cat(paste("q[1] = ",signif(q[1],5),
"; q[2] = ",signif(q[2],5),"; logL = ",
signif(-logL,8),"\n",sep=""))
flush.console()
}
logL
}
``````

Now that I have my likelihood function I can go ahead and optimize it using one of R's built-in numerical optimization routines. I will use `optim` – but there are others I could have tried (e.g., `nlminb`).

In `optim` I'll use the method `"L-BFGS-B"` which is a 'box-constrained' routine. Values for Qi,j (for ij) are logically constrained to be ≥ 0, but to avoid problems, I'll set the lower bounds to 10-8. I'll set my upper bound to 100, but this could be adjusted based on the total depth of the tree.

``````fit<-optim(c(1,1),lik,tree=tree,X=X,trace=1,method="L-BFGS-B",
lower=1e-8,upper=100)
``````
``````## q[1] = 1; q[2] = 1; logL = -1278.2533
## q[1] = 1.001; q[2] = 1; logL = -1278.0522
## q[1] = 0.999; q[2] = 1; logL = -1278.4548
## q[1] = 1; q[2] = 1.001; logL = -1278.4422
## q[1] = 1; q[2] = 0.999; logL = -1278.0647
## q[1] = 100; q[2] = 1e-08; logL = -15397.215
## q[1] = 100; q[2] = 1e-08; logL = -15397.215
## q[1] = 99.999; q[2] = 1e-08; logL = -15397.203
## q[1] = 100; q[2] = 0.001; logL = -7820.664
## q[1] = 100; q[2] = 1e-08; logL = -15397.215
## q[1] = 48.459; q[2] = 0.52061; logL = -3109.0536
## q[1] = 48.46; q[2] = 0.52061; logL = -3109.0682
## q[1] = 48.458; q[2] = 0.52061; logL = -3109.039
## q[1] = 48.459; q[2] = 0.52161; logL = -3107.8083
## q[1] = 48.459; q[2] = 0.51961; logL = -3110.3014
## q[1] = 13.116; q[2] = 0.87762; logL = -1927.0841
## q[1] = 13.117; q[2] = 0.87762; logL = -1927.1295
## q[1] = 13.115; q[2] = 0.87762; logL = -1927.0387
## q[1] = 13.116; q[2] = 0.87862; logL = -1926.4874
## q[1] = 13.116; q[2] = 0.87662; logL = -1927.6816
## q[1] = 3.9157; q[2] = 0.97055; logL = -1323.1261
## q[1] = 3.9167; q[2] = 0.97055; logL = -1323.1988
## q[1] = 3.9147; q[2] = 0.97055; logL = -1323.0533
## q[1] = 3.9157; q[2] = 0.97155; logL = -1322.8752
## q[1] = 3.9157; q[2] = 0.96955; logL = -1323.3775
## q[1] = 2.0027; q[2] = 0.98987; logL = -1213.548
## q[1] = 2.0037; q[2] = 0.98987; logL = -1213.5633
## q[1] = 2.0017; q[2] = 0.98987; logL = -1213.5328
## q[1] = 2.0027; q[2] = 0.99087; logL = -1213.5138
## q[1] = 2.0027; q[2] = 0.98887; logL = -1213.5826
## q[1] = 1.9741; q[2] = 1.0321; logL = -1212.2328
## q[1] = 1.9751; q[2] = 1.0321; logL = -1212.2387
## q[1] = 1.9731; q[2] = 1.0321; logL = -1212.2269
## q[1] = 1.9741; q[2] = 1.0331; logL = -1212.2191
## q[1] = 1.9741; q[2] = 1.0311; logL = -1212.2467
## q[1] = 1.9571; q[2] = 1.0611; logL = -1211.9706
## q[1] = 1.9581; q[2] = 1.0611; logL = -1211.9704
## q[1] = 1.9561; q[2] = 1.0611; logL = -1211.9709
## q[1] = 1.9571; q[2] = 1.0621; logL = -1211.9697
## q[1] = 1.9571; q[2] = 1.0601; logL = -1211.9719
## q[1] = 1.9613; q[2] = 1.0652; logL = -1211.9659
## q[1] = 1.9623; q[2] = 1.0652; logL = -1211.9654
## q[1] = 1.9603; q[2] = 1.0652; logL = -1211.9665
## q[1] = 1.9613; q[2] = 1.0662; logL = -1211.9657
## q[1] = 1.9613; q[2] = 1.0642; logL = -1211.9664
## q[1] = 1.9791; q[2] = 1.0764; logL = -1211.9573
## q[1] = 1.9801; q[2] = 1.0764; logL = -1211.9568
## q[1] = 1.9781; q[2] = 1.0764; logL = -1211.9579
## q[1] = 1.9791; q[2] = 1.0774; logL = -1211.958
## q[1] = 1.9791; q[2] = 1.0754; logL = -1211.9569
## q[1] = 1.991; q[2] = 1.0812; logL = -1211.9549
## q[1] = 1.992; q[2] = 1.0812; logL = -1211.9548
## q[1] = 1.99; q[2] = 1.0812; logL = -1211.9551
## q[1] = 1.991; q[2] = 1.0822; logL = -1211.9553
## q[1] = 1.991; q[2] = 1.0802; logL = -1211.9548
## q[1] = 1.993; q[2] = 1.0815; logL = -1211.9548
## q[1] = 1.994; q[2] = 1.0815; logL = -1211.9548
## q[1] = 1.992; q[2] = 1.0815; logL = -1211.9548
## q[1] = 1.993; q[2] = 1.0825; logL = -1211.955
## q[1] = 1.993; q[2] = 1.0805; logL = -1211.9549
## q[1] = 1.993; q[2] = 1.0814; logL = -1211.9548
## q[1] = 1.994; q[2] = 1.0814; logL = -1211.9548
## q[1] = 1.992; q[2] = 1.0814; logL = -1211.9548
## q[1] = 1.993; q[2] = 1.0824; logL = -1211.9549
## q[1] = 1.993; q[2] = 1.0804; logL = -1211.9549
``````
``````fit
``````
``````## \$par
## [1] 1.992985 1.081415
##
## \$value
## [1] 1211.955
##
## \$counts
##       13       13
##
## \$convergence
## [1] 0
##
## \$message
## [1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"
``````

Remembering that to simulate our data, we set Q1,2 to be 2.0 and Q2,1 to be 1.0. Our estimates turn out to be really very close.

Now let's compare to `ratebytree` as follows:

``````ratebytree(rep(tree,20),lapply(as.list(X),setNames,rownames(X)),
model="ARD")
``````
``````## ML common-rate model:
##  1->0    0->1    k   logL
## value    1.0812  1.9929  2   -1211.9548
##
## Model fitting method was "optim".
##
## ML multi-rate model:
##  1->0    0->1    k   logL
## tree1    0.6997  1.4017  40  -1200.0165
## tree2    1.007   1.431
## tree3    0.9505  1.6508
## tree4    1.0151  2.0233
## tree5    1.0356  2.6362
## tree6    1.1637  1.7085
## tree7    1.2568  3.7405
## tree8    1.536   2.5116
## tree9    1.3073  2.483
## tree10   0.5318  1.0522
## tree11   1.0557  2.4955
## tree12   0.9176  1.4706
## tree13   1.777   3.7395
## tree14   0.9998  1.6659
## tree15   1.2626  2.1373
## tree16   1.4788  1.9684
## tree17   0.588   1.1634
## tree18   2.4303  4.0818
## tree19   1.0434  2.1037
## tree20   2.4274  4.146
##
## Model fitting method was "nlminb".
##
## Likelihood ratio: 23.8765
## P-value (based on X^2): 0.9641
``````

Neat. This also tells us how varied our fitted model parameter estimated would have been if we fit our Mk model to any of these 20 trait vectors individually!