Recently on the Twitterverse I came across a thread in which the poster described using revBayes to fit an Mk model to multiple discrete traits at the same time, assuming the same value of the transition matrix (Q) for all traits.
This is logical for some traits because fitting the Mk model to a single discrete trait can often be data-limited, particularly if the trait changes infrequently (which will thus leave very little information about the transition process for any particular trait).
This would not make a lot of sense for situations in which, for example, we expected the rate of change to be widely different from trait to trait. On the other hand, it make very good sense if changes are rare, and/or if there are biological reasons to presume that the rate of change in the trait is similar for different traits, and particularly if we intend to proceed to use our estimated value for Q in other analyses, such as in ancestral state reconstruction or stochastic mappping. (More on this in a separate episode!)
Of course, this can also be done in R!
Firstly, it can be done using my phytools function ratebytree
because this function (by default) fits two
models – one in which the transition rates are different across all the trees and a second in which they are
all the same.
However, it can also be fit just by making your own custom likelihood function and then optimizing it. I'm going
to demonstrate that option here since it will run faster than using ratebytree
– simply because we're not
fitting the multi-rates model too.
First, let me load phytools and then simulate a tree & 20 binary characters on the phylogeny with a constant value of the transition matrix Q as follows.
library(phytools)
Q<-matrix(c(-2,1,2,-1),2,2,dimnames=list(0:1,0:1))
Q
## 0 1
## 0 -2 2
## 1 1 -1
X<-sim.Mk(tree<-pbtree(n=100),Q,nsim=20)
head(X)
## V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20
## t2 1 1 0 1 1 1 1 0 1 0 0 0 0 1 1 0 0 1 1 1
## t64 0 0 0 1 1 1 0 0 1 1 1 1 1 1 1 0 1 0 1 0
## t65 0 1 1 0 0 1 1 1 1 1 1 0 0 0 1 1 1 0 1 1
## t16 0 0 0 0 0 0 0 0 1 1 1 0 0 1 0 1 0 1 0 1
## t17 1 0 1 0 0 1 1 1 1 1 1 1 1 0 1 1 1 0 0 1
## t18 0 1 1 1 1 0 0 1 1 1 1 1 0 1 1 1 0 1 0 1
cols<-replicate(20,setNames(c("white","grey"),0:1),
simplify=FALSE)
plotTree.datamatrix(tree,X,sep=0,colors=cols)
Readers can see that I simulated data for a binary trait with two levels, 0 and 1, in which the transition rate from 0 → 1 is 2.0; while the transition rate from 1 → 0 is 1.0.
Great, now let's make our likelihood function.
The way I'm going to do this is by using the phytools function fitMk
but in which I fix the value of Q to
a value that I will pre-specify within our likelihood function.
There are other ways we could have done this. For instance, both fitMk
and fitDiscrete
in the geiger package
also export the likelihood function as part of the model object.
Here's what our likelihood function is going to look like. Note that to use mapply
to add the log-likelihoods
across all the traits, I first recompose my trait data array (X
) and my Q matrix (Q
) into lists. In theory
this code does not require that I have the same character coding for each trait – merely that I'm willing to
assume the same value of Q, given the row/colum order matches the order of the levels of each trait. (It also
assumes that each trait has more than one state. It would be a simple fix to relax this assumption.)
lik<-function(q,X,tree,trace=0){
X<-lapply(as.list(X),setNames,rownames(X))
Q<-lapply(X,function(x,q)
matrix(c(-q[1],q[2],q[1],-q[2]),2,2,
dimnames=list(levels(x),levels(x))),
q=q)
LIK<-function(x,Q,tree)
-logLik(fitMk(tree,x,fixedQ=Q))
logL<-sum(mapply(LIK,x=X,Q=Q,MoreArgs=list(tree=tree)))
if(trace>0){
cat(paste("q[1] = ",signif(q[1],5),
"; q[2] = ",signif(q[2],5),"; logL = ",
signif(-logL,8),"\n",sep=""))
flush.console()
}
logL
}
Now that I have my likelihood function I can go ahead and optimize it using one of R's built-in numerical
optimization routines. I will use optim
– but there are others I could have tried (e.g., nlminb
).
In optim
I'll use the method "L-BFGS-B"
which is a 'box-constrained' routine. Values for Qi,j
(for i ≠ j) are logically constrained to be ≥ 0, but to avoid problems, I'll set the lower bounds to
10-8. I'll set my upper bound to 100, but this could be adjusted based on the total depth of the tree.
fit<-optim(c(1,1),lik,tree=tree,X=X,trace=1,method="L-BFGS-B",
lower=1e-8,upper=100)
## q[1] = 1; q[2] = 1; logL = -1278.2533
## q[1] = 1.001; q[2] = 1; logL = -1278.0522
## q[1] = 0.999; q[2] = 1; logL = -1278.4548
## q[1] = 1; q[2] = 1.001; logL = -1278.4422
## q[1] = 1; q[2] = 0.999; logL = -1278.0647
## q[1] = 100; q[2] = 1e-08; logL = -15397.215
## q[1] = 100; q[2] = 1e-08; logL = -15397.215
## q[1] = 99.999; q[2] = 1e-08; logL = -15397.203
## q[1] = 100; q[2] = 0.001; logL = -7820.664
## q[1] = 100; q[2] = 1e-08; logL = -15397.215
## q[1] = 48.459; q[2] = 0.52061; logL = -3109.0536
## q[1] = 48.46; q[2] = 0.52061; logL = -3109.0682
## q[1] = 48.458; q[2] = 0.52061; logL = -3109.039
## q[1] = 48.459; q[2] = 0.52161; logL = -3107.8083
## q[1] = 48.459; q[2] = 0.51961; logL = -3110.3014
## q[1] = 13.116; q[2] = 0.87762; logL = -1927.0841
## q[1] = 13.117; q[2] = 0.87762; logL = -1927.1295
## q[1] = 13.115; q[2] = 0.87762; logL = -1927.0387
## q[1] = 13.116; q[2] = 0.87862; logL = -1926.4874
## q[1] = 13.116; q[2] = 0.87662; logL = -1927.6816
## q[1] = 3.9157; q[2] = 0.97055; logL = -1323.1261
## q[1] = 3.9167; q[2] = 0.97055; logL = -1323.1988
## q[1] = 3.9147; q[2] = 0.97055; logL = -1323.0533
## q[1] = 3.9157; q[2] = 0.97155; logL = -1322.8752
## q[1] = 3.9157; q[2] = 0.96955; logL = -1323.3775
## q[1] = 2.0027; q[2] = 0.98987; logL = -1213.548
## q[1] = 2.0037; q[2] = 0.98987; logL = -1213.5633
## q[1] = 2.0017; q[2] = 0.98987; logL = -1213.5328
## q[1] = 2.0027; q[2] = 0.99087; logL = -1213.5138
## q[1] = 2.0027; q[2] = 0.98887; logL = -1213.5826
## q[1] = 1.9741; q[2] = 1.0321; logL = -1212.2328
## q[1] = 1.9751; q[2] = 1.0321; logL = -1212.2387
## q[1] = 1.9731; q[2] = 1.0321; logL = -1212.2269
## q[1] = 1.9741; q[2] = 1.0331; logL = -1212.2191
## q[1] = 1.9741; q[2] = 1.0311; logL = -1212.2467
## q[1] = 1.9571; q[2] = 1.0611; logL = -1211.9706
## q[1] = 1.9581; q[2] = 1.0611; logL = -1211.9704
## q[1] = 1.9561; q[2] = 1.0611; logL = -1211.9709
## q[1] = 1.9571; q[2] = 1.0621; logL = -1211.9697
## q[1] = 1.9571; q[2] = 1.0601; logL = -1211.9719
## q[1] = 1.9613; q[2] = 1.0652; logL = -1211.9659
## q[1] = 1.9623; q[2] = 1.0652; logL = -1211.9654
## q[1] = 1.9603; q[2] = 1.0652; logL = -1211.9665
## q[1] = 1.9613; q[2] = 1.0662; logL = -1211.9657
## q[1] = 1.9613; q[2] = 1.0642; logL = -1211.9664
## q[1] = 1.9791; q[2] = 1.0764; logL = -1211.9573
## q[1] = 1.9801; q[2] = 1.0764; logL = -1211.9568
## q[1] = 1.9781; q[2] = 1.0764; logL = -1211.9579
## q[1] = 1.9791; q[2] = 1.0774; logL = -1211.958
## q[1] = 1.9791; q[2] = 1.0754; logL = -1211.9569
## q[1] = 1.991; q[2] = 1.0812; logL = -1211.9549
## q[1] = 1.992; q[2] = 1.0812; logL = -1211.9548
## q[1] = 1.99; q[2] = 1.0812; logL = -1211.9551
## q[1] = 1.991; q[2] = 1.0822; logL = -1211.9553
## q[1] = 1.991; q[2] = 1.0802; logL = -1211.9548
## q[1] = 1.993; q[2] = 1.0815; logL = -1211.9548
## q[1] = 1.994; q[2] = 1.0815; logL = -1211.9548
## q[1] = 1.992; q[2] = 1.0815; logL = -1211.9548
## q[1] = 1.993; q[2] = 1.0825; logL = -1211.955
## q[1] = 1.993; q[2] = 1.0805; logL = -1211.9549
## q[1] = 1.993; q[2] = 1.0814; logL = -1211.9548
## q[1] = 1.994; q[2] = 1.0814; logL = -1211.9548
## q[1] = 1.992; q[2] = 1.0814; logL = -1211.9548
## q[1] = 1.993; q[2] = 1.0824; logL = -1211.9549
## q[1] = 1.993; q[2] = 1.0804; logL = -1211.9549
fit
## $par
## [1] 1.992985 1.081415
##
## $value
## [1] 1211.955
##
## $counts
## function gradient
## 13 13
##
## $convergence
## [1] 0
##
## $message
## [1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"
Remembering that to simulate our data, we set Q1,2 to be 2.0 and Q2,1 to be 1.0. Our estimates turn out to be really very close.
Now let's compare to ratebytree
as follows:
ratebytree(rep(tree,20),lapply(as.list(X),setNames,rownames(X)),
model="ARD")
## ML common-rate model:
## 1->0 0->1 k logL
## value 1.0812 1.9929 2 -1211.9548
##
## Model fitting method was "optim".
##
## ML multi-rate model:
## 1->0 0->1 k logL
## tree1 0.6997 1.4017 40 -1200.0165
## tree2 1.007 1.431
## tree3 0.9505 1.6508
## tree4 1.0151 2.0233
## tree5 1.0356 2.6362
## tree6 1.1637 1.7085
## tree7 1.2568 3.7405
## tree8 1.536 2.5116
## tree9 1.3073 2.483
## tree10 0.5318 1.0522
## tree11 1.0557 2.4955
## tree12 0.9176 1.4706
## tree13 1.777 3.7395
## tree14 0.9998 1.6659
## tree15 1.2626 2.1373
## tree16 1.4788 1.9684
## tree17 0.588 1.1634
## tree18 2.4303 4.0818
## tree19 1.0434 2.1037
## tree20 2.4274 4.146
##
## Model fitting method was "nlminb".
##
## Likelihood ratio: 23.8765
## P-value (based on X^2): 0.9641
Neat. This also tells us how varied our fitted model parameter estimated would have been if we fit our Mk model to any of these 20 trait vectors individually!