I just added a new method & function to the phytools package in the form of
fitpolyMk
.
This function fits an (extended) Mk model to character data with intraspecific polymorphism.
The general idea here is very simple. We just imagine that every character changes between state by going through an intermediate polymorphic condition. So, for instance, in a set of taxa with states A, B, and the polymorphic condition, A + B, we suppose that evolutionary transitions between these three possibilities much occur as follows: A ↔ A + B ↔ B. For more than two states we might further suppose that (for example) a transition from A + B to B + C would have to go through the intermediate condition of only B (i.e., A + B ↔ B ↔ B + C), and that transitions to (say) A + B + C could only occur in one step from A + B, A + C, or B + C.
In the function I have implemented the standard suite of models ("ER"
, "SYM"
, and "ARD"
), which
assume that all permitted transitions occur at the same rate ("ER"
), that backward & forward transitions occur
at the same rate ("SYM"
e.g., qA → A + B =
qA + B → A, and so on), or that all allowed transitions can occur at different rates ("ARD"
).
I also added a new model called "transient"
in which I assumed that the acquisition of polymorphism (e.g., A → A + B)
occurrs at one rate, q1, and its loss (e.g., A + B → A) at
another (q2). I called it the transient model because I imagined that under some circumstances polymorphism might
be evolutionarily more fleeting (or transient) than monomorphism. Finally, I also permit the user to specify that the model be 'ordered' (e.g., A ↔ A +
B ↔ B ↔ B + C ↔ C) or unordered. Only in the latter should multistate
polymorphism (e.g., A + B + C) occur.
The user is required to code the data in the normal way, merely separating each state in a polymorphic taxon by the
+
character.
Here's a quick example:
library(phytools)
packageVersion("phytools")
## [1] '0.6.67'
tree
##
## Phylogenetic tree with 200 tips and 199 internal nodes.
##
## Tip labels:
## t142, t143, t27, t28, t160, t161, ...
##
## Rooted; includes branch lengths.
x
## t142 t143 t27 t28 t160 t161 t141 t37 t59 t136 t137 t24 t18 t19 t102
## C C C C C C C B+C B+C C C B+C C B+C C
## t103 t30 t113 t114 t38 t7 t51 t52 t70 t71 t35 t149 t150 t83 t34
## C B B+C B+C B C B+C B+C B B B A+B A B A+B
## t21 t81 t82 t138 t171 t172 t179 t180 t156 t157 t151 t144 t127 t128 t3
## A B A+B B B B B B B+C B+C B+C B+C B+C B+C C
## t6 t31 t84 t85 t39 t25 t11 t29 t42 t183 t184 t124 t9 t68 t69
## C C C C C B+C C C C C C C B+C B B
## t129 t130 t121 t53 t109 t110 t89 t13 t44 t195 t196 t95 t10 t99 t105
## B B B B B B B A+B B B B B B B+C B
## t106 t60 t187 t188 t177 t178 t57 t12 t169 t170 t94 t131 t132 t40 t41
## B+C B B+C B+C B+C B+C C B B B B B B C C
## t8 t49 t50 t14 t162 t163 t93 t117 t118 t133 t134 t76 t17 t2 t165
## C C B+C C B B B B B B B B B B B
## t166 t139 t140 t173 t174 t152 t153 t4 t45 t46 t191 t192 t122 t123 t22
## B B B B B B B A+B B B C C B+C B B
## t23 t5 t20 t79 t80 t47 t48 t98 t189 t190 t185 t186 t63 t125 t126
## B B C C C B+C B+C B+C B+C B+C C C C C C
## t112 t36 t16 t87 t88 t1 t193 t194 t92 t62 t158 t159 t154 t155 t15
## C C C C B+C C B+C B+C B+C B B+C B+C B+C B+C C
## t54 t64 t65 t77 t78 t199 t200 t164 t32 t33 t43 t90 t91 t74 t75
## B+C B+C B+C B+C B+C B B B B+C B B+C B+C B+C B+C B+C
## t72 t73 t66 t58 t100 t101 t55 t56 t86 t181 t182 t135 t119 t120 t67
## B+C B+C B B+C B+C C C B+C B+C B+C B+C B+C B+C B+C B+C
## t175 t176 t107 t108 t197 t198 t167 t168 t111 t145 t146 t26 t104 t147 t148
## B+C B+C B A+B A+B A+B B B B+C B B+C B B B B
## t97 t115 t116 t96 t61
## B B+C B+C B B
## Levels: A A+B B B+C C
## plot our data:
plotTree(tree,ftype="off",lwd=1,type="fan")
X<-strsplit(setNames(as.character(x),names(x)),"+",fixed=TRUE)
pies<-matrix(0,Ntip(tree),3,dimnames=list(tree$tip.label,c("A","B","C")))
for(i in 1:Ntip(tree))
pies[tree$tip.label[i],X[[tree$tip.label[i]]]]<-rep(1/length(X[[tree$tip.label[i]]]),
length(X[[tree$tip.label[i]]]))
tiplabels(pie=pies,piecol=c("black","yellow","red"),cex=0.35)
legend(x="topleft",legend=c("A","B","C"),pt.cex=2,pch=21,
pt.bg=c("black","yellow","red"))
Now, let's fit a few different models:
ard.unordered<-fitpolyMk(tree,x,model="ARD")
##
## This is the design matrix of the fitted model. Does it make sense?
##
## A B C A+B A+C B+C A+B+C
## A 0 0 0 1 3 0 0
## B 0 0 0 5 0 7 0
## C 0 0 0 0 9 11 0
## A+B 2 6 0 0 0 0 13
## A+C 4 0 10 0 0 0 15
## B+C 0 8 12 0 0 0 17
## A+B+C 0 0 0 14 16 18 0
ard.unordered
## Object of class "fitpolyMk".
##
## Evolution was modeled as 'unordered' using the "ARD" model.
##
## Fitted (or set) value of Q:
## A B C A+B A+C B+C
## A -71.33397 0.000000 0.000000 71.333965 0.000000 0.000000
## B 0.00000 -1.872671 0.000000 1.006799 0.000000 0.865872
## C 0.00000 0.000000 -1.322289 0.000000 0.000000 1.322289
## A+B 20.05692 7.623430 0.000000 -27.680355 0.000000 0.000000
## A+C 0.00000 0.000000 0.000000 0.000000 -8.847803 0.000000
## B+C 0.00000 1.723192 1.658633 0.000000 0.000000 -3.381825
## A+B+C 0.00000 0.000000 0.000000 0.000000 0.000000 8.281669
## A+B+C
## A 0.000000
## B 0.000000
## C 0.000000
## A+B 0.000000
## A+C 8.847803
## B+C 0.000000
## A+B+C -8.281669
##
## Fitted (or set) value of pi:
## A B C A+B A+C B+C A+B+C
## 0.1428571 0.1428571 0.1428571 0.1428571 0.1428571 0.1428571 0.1428571
##
## Log-likelihood: -154.755089
##
## Optimization method used was "nlminb"
ard.ordered<-fitpolyMk(tree,x,model="ARD",ordered=TRUE)
##
## This is the design matrix of the fitted model. Does it make sense?
##
## A A+B B B+C C
## A 0 1 0 0 0
## A+B 2 0 3 0 0
## B 0 4 0 5 0
## B+C 0 0 6 0 7
## C 0 0 0 8 0
ard.ordered
## Object of class "fitpolyMk".
##
## Evolution was modeled as 'ordered' (i.e., transitions are assumed
## to occur A <-> A+B <-> B and so on) using the "ARD" model.
##
## Fitted (or set) value of Q:
## A A+B B B+C C
## A -71.85153 71.851529 0.000000 0.000000 0.000000
## A+B 19.64117 -27.727843 8.086678 0.000000 0.000000
## B 0.00000 0.771046 -2.385605 1.614559 0.000000
## B+C 0.00000 0.000000 1.280922 -3.389113 2.108191
## C 0.00000 0.000000 0.000000 0.723070 -0.723070
##
## Fitted (or set) value of pi:
## A A+B B B+C C
## 0.2 0.2 0.2 0.2 0.2
##
## Log-likelihood: -154.966753
##
## Optimization method used was "nlminb"
sym.unordered<-fitpolyMk(tree,x,model="SYM")
##
## This is the design matrix of the fitted model. Does it make sense?
##
## A B C A+B A+C B+C A+B+C
## A 0 0 0 1 2 0 0
## B 0 0 0 3 0 4 0
## C 0 0 0 0 5 6 0
## A+B 1 3 0 0 0 0 7
## A+C 2 0 5 0 0 0 8
## B+C 0 4 6 0 0 0 9
## A+B+C 0 0 0 7 8 9 0
## Warning in log(comp[1:M + N]): NaNs produced
sym.unordered
## Object of class "fitpolyMk".
##
## Evolution was modeled as 'unordered' using the "SYM" model.
##
## Fitted (or set) value of Q:
## A B C A+B A+C B+C
## A -5.969063 0.000000 0.000000 5.969063 0.000000 0.000000
## B 0.000000 -2.081912 0.000000 0.713146 0.000000 1.368766
## C 0.000000 0.000000 -1.628471 0.000000 0.000000 1.628471
## A+B 5.969063 0.713146 0.000000 -6.682209 0.000000 0.000000
## A+C 0.000000 0.000000 0.000000 0.000000 -0.427551 0.000000
## B+C 0.000000 1.368766 1.628471 0.000000 0.000000 -2.997237
## A+B+C 0.000000 0.000000 0.000000 0.000000 0.427551 0.000000
## A+B+C
## A 0.000000
## B 0.000000
## C 0.000000
## A+B 0.000000
## A+C 0.427551
## B+C 0.000000
## A+B+C -0.427551
##
## Fitted (or set) value of pi:
## A B C A+B A+C B+C A+B+C
## 0.1428571 0.1428571 0.1428571 0.1428571 0.1428571 0.1428571 0.1428571
##
## Log-likelihood: -160.029691
##
## Optimization method used was "nlminb"
sym.ordered<-fitpolyMk(tree,x,model="SYM",ordered=TRUE)
##
## This is the design matrix of the fitted model. Does it make sense?
##
## A A+B B B+C C
## A 0 1 0 0 0
## A+B 1 0 2 0 0
## B 0 2 0 3 0
## B+C 0 0 3 0 4
## C 0 0 0 4 0
sym.ordered
## Object of class "fitpolyMk".
##
## Evolution was modeled as 'ordered' (i.e., transitions are assumed
## to occur A <-> A+B <-> B and so on) using the "SYM" model.
##
## Fitted (or set) value of Q:
## A A+B B B+C C
## A -5.969319 5.969319 0.000000 0.000000 0.000000
## A+B 5.969319 -6.682465 0.713146 0.000000 0.000000
## B 0.000000 0.713146 -2.081906 1.368761 0.000000
## B+C 0.000000 0.000000 1.368761 -2.997227 1.628467
## C 0.000000 0.000000 0.000000 1.628467 -1.628467
##
## Fitted (or set) value of pi:
## A A+B B B+C C
## 0.2 0.2 0.2 0.2 0.2
##
## Log-likelihood: -159.693219
##
## Optimization method used was "nlminb"
transient.unordered<-fitpolyMk(tree,x,model="transient")
##
## This is the design matrix of the fitted model. Does it make sense?
##
## A B C A+B A+C B+C A+B+C
## A 0 0 0 2 2 0 0
## B 0 0 0 2 0 2 0
## C 0 0 0 0 2 2 0
## A+B 1 1 0 0 0 0 2
## A+C 1 0 1 0 0 0 2
## B+C 0 1 1 0 0 0 2
## A+B+C 0 0 0 1 1 1 0
transient.unordered
## Object of class "fitpolyMk".
##
## Evolution was modeled as 'unordered' using the "transient" model.
##
## Fitted (or set) value of Q:
## A B C A+B A+C B+C
## A -0.696158 0.000000 0.000000 0.348079 0.348079 0.000000
## B 0.000000 -0.696158 0.000000 0.348079 0.000000 0.348079
## C 0.000000 0.000000 -0.696158 0.000000 0.348079 0.348079
## A+B 1.572038 1.572038 0.000000 -3.492154 0.000000 0.000000
## A+C 1.572038 0.000000 1.572038 0.000000 -3.492154 0.000000
## B+C 0.000000 1.572038 1.572038 0.000000 0.000000 -3.492154
## A+B+C 0.000000 0.000000 0.000000 1.572038 1.572038 1.572038
## A+B+C
## A 0.000000
## B 0.000000
## C 0.000000
## A+B 0.348079
## A+C 0.348079
## B+C 0.348079
## A+B+C -4.716113
##
## Fitted (or set) value of pi:
## A B C A+B A+C B+C A+B+C
## 0.1428571 0.1428571 0.1428571 0.1428571 0.1428571 0.1428571 0.1428571
##
## Log-likelihood: -169.815739
##
## Optimization method used was "nlminb"
transient.ordered<-fitpolyMk(tree,x,model="transient",ordered=TRUE)
##
## This is the design matrix of the fitted model. Does it make sense?
##
## A A+B B B+C C
## A 0 2 0 0 0
## A+B 1 0 1 0 0
## B 0 2 0 2 0
## B+C 0 0 1 0 1
## C 0 0 0 2 0
transient.ordered
## Object of class "fitpolyMk".
##
## Evolution was modeled as 'ordered' (i.e., transitions are assumed
## to occur A <-> A+B <-> B and so on) using the "transient" model.
##
## Fitted (or set) value of Q:
## A A+B B B+C C
## A -0.865778 0.865778 0.000000 0.000000 0.000000
## A+B 1.663303 -3.326606 1.663303 0.000000 0.000000
## B 0.000000 0.865778 -1.731555 0.865778 0.000000
## B+C 0.000000 0.000000 1.663303 -3.326606 1.663303
## C 0.000000 0.000000 0.000000 0.865778 -0.865778
##
## Fitted (or set) value of pi:
## A A+B B B+C C
## 0.2 0.2 0.2 0.2 0.2
##
## Log-likelihood: -158.782444
##
## Optimization method used was "nlminb"
er.unordered<-fitpolyMk(tree,x,model="ER")
##
## This is the design matrix of the fitted model. Does it make sense?
##
## A B C A+B A+C B+C A+B+C
## A 0 0 0 1 1 0 0
## B 0 0 0 1 0 1 0
## C 0 0 0 0 1 1 0
## A+B 1 1 0 0 0 0 1
## A+C 1 0 1 0 0 0 1
## B+C 0 1 1 0 0 0 1
## A+B+C 0 0 0 1 1 1 0
er.unordered
## Object of class "fitpolyMk".
##
## Evolution was modeled as 'unordered' using the "ER" model.
##
## Fitted (or set) value of Q:
## A B C A+B A+C B+C
## A -1.708008 0.000000 0.000000 0.854004 0.854004 0.000000
## B 0.000000 -1.708008 0.000000 0.854004 0.000000 0.854004
## C 0.000000 0.000000 -1.708008 0.000000 0.854004 0.854004
## A+B 0.854004 0.854004 0.000000 -2.562012 0.000000 0.000000
## A+C 0.854004 0.000000 0.854004 0.000000 -2.562012 0.000000
## B+C 0.000000 0.854004 0.854004 0.000000 0.000000 -2.562012
## A+B+C 0.000000 0.000000 0.000000 0.854004 0.854004 0.854004
## A+B+C
## A 0.000000
## B 0.000000
## C 0.000000
## A+B 0.854004
## A+C 0.854004
## B+C 0.854004
## A+B+C -2.562012
##
## Fitted (or set) value of pi:
## A B C A+B A+C B+C A+B+C
## 0.1428571 0.1428571 0.1428571 0.1428571 0.1428571 0.1428571 0.1428571
##
## Log-likelihood: -183.822936
##
## Optimization method used was "nlminb"
er.ordered<-fitpolyMk(tree,x,model="ER",ordered=TRUE)
##
## This is the design matrix of the fitted model. Does it make sense?
##
## A A+B B B+C C
## A 0 1 0 0 0
## A+B 1 0 1 0 0
## B 0 1 0 1 0
## B+C 0 0 1 0 1
## C 0 0 0 1 0
er.ordered
## Object of class "fitpolyMk".
##
## Evolution was modeled as 'ordered' (i.e., transitions are assumed
## to occur A <-> A+B <-> B and so on) using the "ER" model.
##
## Fitted (or set) value of Q:
## A A+B B B+C C
## A -1.317534 1.317534 0.000000 0.000000 0.000000
## A+B 1.317534 -2.635067 1.317534 0.000000 0.000000
## B 0.000000 1.317534 -2.635067 1.317534 0.000000
## B+C 0.000000 0.000000 1.317534 -2.635067 1.317534
## C 0.000000 0.000000 0.000000 1.317534 -1.317534
##
## Fitted (or set) value of pi:
## A A+B B B+C C
## 0.2 0.2 0.2 0.2 0.2
##
## Log-likelihood: -161.927196
##
## Optimization method used was "nlminb"
We can compare the AIC values of our different models in a table:
data.frame(transition_model=c("ARD","ARD","SYM","SYM","transient","transient","ER","ER"),
ordered=c("no","yes","no","yes","no","yes","no","yes"),
logLik=c(logLik(ard.unordered),logLik(ard.ordered),logLik(sym.unordered),
logLik(sym.ordered),logLik(transient.unordered),logLik(transient.ordered),
logLik(er.unordered),logLik(er.ordered)),
k=c(attr(AIC(ard.unordered),"df"),attr(AIC(ard.ordered),"df"),
attr(AIC(sym.unordered),"df"),attr(AIC(sym.ordered),"df"),
attr(AIC(transient.unordered),"df"),attr(AIC(transient.ordered),"df"),
attr(AIC(er.unordered),"df"),attr(AIC(er.ordered),"df")),
AIC=c(AIC(ard.unordered),AIC(ard.ordered),AIC(sym.unordered),
AIC(sym.ordered),AIC(transient.unordered),AIC(transient.ordered),
AIC(er.unordered),AIC(er.ordered)))
## transition_model ordered logLik k AIC
## 1 ARD no -154.7551 18 345.5102
## 2 ARD yes -154.9668 8 325.9335
## 3 SYM no -160.0297 9 338.0594
## 4 SYM yes -159.6932 4 327.3864
## 5 transient no -169.8157 2 343.6315
## 6 transient yes -158.7824 2 321.5649
## 7 ER no -183.8229 1 369.6459
## 8 ER yes -161.9272 1 325.8544
This should tell us that the transient-ordered model is best-justified by the data, which is good - because this is the model I used to produce them:
tree<-pbtree(n=200,scale=1)
Q<-matrix(c(-1,1,0,0,0,
2,-4,2,0,0,
0,1,-2,1,0,
0,0,2,-4,2,
0,0,0,1,-1),5,5,byrow=TRUE)
rownames(Q)<-colnames(Q)<-c("A","A+B","B","B+C","C")
x<-sim.Mk(tree,Q)
That's it.
Hi Liam
ReplyDeleteHow can I use this function within make.simmap and can I use it with a probability matrix to include tips that are NA?
I have tried it with replacing fitMK in make.simmmap.R with fitpolyMK however, I either get "Error in getPars(bt, xx, model, Q = NULL, tree, tol, m, pi = pi, args = list(...)) :
could not find function "getPars""
if I only change the make.saimmap fucntion or, if I change all functions with make.simmap.R from thegithub repository to have at every occurence of fitMk fitpolyMK, I get the error " Error in strsplit(x, "+", fixed = TRUE) : non-character argument
5.
strsplit(x, "+", fixed = TRUE)
4.
(function (tree, x, model = "SYM", ordered = FALSE, ...)
{
if (hasArg(quiet))
quiet <- list(...)$quiet ...
3.
do.call(fitpolyMk, args)
2.
getPars(bt, xx, model, Q = NULL, tree, tol, m, pi = pi, args = list(...))
1.
make.simmap(pruned.tree, mymatrix_2, model = "ARD", nsim = 10,
tips = T) "
Thanks for any advice on how to implement it into make.simmap, this is exactly what I was looking for as a function