Wednesday, July 31, 2019

Stochastic character mapping with polymorphic characters using make.simmap and fitpolyMk in phytools

I recently received an email with the following inquiry:

“I wondered if you had any insight or advice on how the make.simmap function in phytools works when the input data have multiple rows per species? It certainly does run but I am unclear on how example it's dealing with this. Specifically, I'm working with a dataset of a 3-state categorical trait but for which I have between 1 and 42 (mean 3) datapoints per species …. I initially fed in the trait data directly to make.simmap, and it did run fine and give reasonable looking estimates for internal nodes. However, it also allocated a single state (without exception) to the tips, so I'm not sure if this is choosing the modal value, the first value, or something else here?

As an alternative, I converted the raw data to a matrix of proportions of each state per species (in a similar manner as incorporating uncertainty in tip state as you described here) based on the observed proportions for each species. This gives very similar estimates for ancestral states, perhaps within the bounds of the inherent stochasticity of the method, though different enough in places that I was suspicious that it's doing something different from my first approach. It also (of course) gives tip states that are as variable as the data it derived from, as you'd expect. However I wonder whether this approach, which essentially equates uncertainty in tip state with intraspecific variability, is a valid way of treating this data.

In summary, do you have anything you can add concerning the appropriateness of either of these approaches (or any other suggestions) for dealing with N-state categorical traits which are variable within species?”

First - in the former case make.simmap is most definitely not taking intraspecific polymorphism into account. Rather, as the user suspected, it is most-likely just using the first observation for each species.

The second alternative is more interesting. Although this may seem reasonable it is also not correct. Saying that a particular tip taxa has a prior probability of being in state A that is P(A)=0.50 and P(B)=0.50 implies that the taxon could be either state A or B with equal probability - not that (say) 50% of the population is in state A & the other 50% in state B.

Though this may be somewhat unsatisfying, the simplest way to deal with intraspecific polymorphism is to re-code our character as an additional state. That is, instead of having character states A and B, we instead have A, A+B, and B.

Unfortunately, phytools::make.simmap presently doesn't automatically know how to handle data in this form. The function fitpolyMk does, however, and it is relatively straightforward to integrate it into make.simmap. I'll next demonstrate how.

Note that I just pushed a very small update to fitpolyMk to make this easier - thus to follow this demo exactly it is necessary to update phytools from GitHub.

Let's get some data & a tree for our demo. Here are mine:

library(phytools)
packageVersion("phytools")
## [1] '0.7.2'
poly.data<-read.csv(file="http://www.phytools.org/TS2019/ex/4c/polymorphic-data.csv",
    row.names=1)
y<-setNames(poly.data[,1],rownames(poly.data))
y
##  t70  t71  t47 t169 t170  t67  t68  t62  t59  t60 t173 t174  t19  t20  t21 
##  C+D  C+D  C+D  C+D  C+D  C+D  B+C  C+D    C    C    B    B    B  B+C    B 
##  t15  t50 t109 t123 t124 t104 t105  t52  t53 t167 t168 t141  t74  t66  t48 
##    B  B+C  B+C  B+C  B+C  B+C  B+C  B+C  B+C    C    C    C    C  C+D    C 
## t181 t182   t2  t39  t40   t4  t27 t191 t192  t28 t145 t146  t77   t8 t193 
##  C+D  C+D    C    C    C    D    C  B+C    C    D  C+D  C+D  C+D    D    D 
## t194  t96  t45  t86  t87  t13   t6 t125 t183 t184  t16  t17 t135 t136 t147 
##    D    D    D    D    D  C+D  C+D  C+D    C    C    C    C    C    C    C 
## t148  t83  t88  t89  t78  t79  t24 t187 t188  t36  t29  t99 t189 t190 t115 
##    C    C  B+C    C    C    C    C    C    C    C    D  C+D    C    C    C 
## t119 t120 t121 t122  t46  t49 t165 t166 t112   t5 t106 t126 t142 t143  t98 
##    C    C    C    C    C    C  C+D  C+D  C+D  B+C  B+C  B+C  B+C  B+C    C 
## t179 t180 t144 t113 t114  t18  t30 t155 t156  t11 t128 t129 t127  t64  t65 
##  B+C  B+C  B+C  B+C  B+C  B+C    C    C    C  B+C    C  C+D    C  C+D  C+D 
## t153 t154 t185 t186  t80  t81  t33  t43 t151 t152  t82  t97 t132 t133  t61 
##    C    C    C    C    B    B    C    D  C+D  C+D  C+D    D    D    D    D 
##   t3 t102 t103  t35  t69 t149 t150   t7  t92  t93 t110 t111 t197 t198 t161 
##  A+B  B+C  B+C  B+C  A+B    B    B  C+D  C+D  C+D  C+D  C+D    C    C    C 
## t157 t158  t12   t9  t41  t42  t25  t37  t38  t26  t10  t90  t94  t95  t54 
##    C    C  B+C    C    C    C    C    C    C    C  B+C  C+D  C+D  C+D    C 
##  t55  t91 t171 t172 t162 t163  t51  t32 t177 t178  t63  t84  t85  t44  t56 
##    C    D    C    C  C+D  C+D  C+D    C    C    C  B+C    C    C    C    C 
## t199 t200 t107 t108 t130 t131  t31 t137 t138  t22 t139 t140 t118  t72  t73 
##    C    C    C    C    C    C    C  C+D    C    C  C+D  C+D  C+D  C+D    C 
##  t76 t159 t160  t14  t75 t134 t195 t196 t164  t34 t100 t101 t116 t117  t23 
##  C+D  C+D  C+D    D  C+D  C+D  C+D  C+D  C+D    D    D    D  C+D    C  C+D 
## t175 t176  t57  t58   t1 
##  C+D  C+D  B+C  B+C  C+D 
## Levels: A+B B B+C C C+D D
poly.tree<-read.tree(file="http://www.phytools.org/TS2019/ex/4c/polymorphic-tree.phy")

Just for fun - and to show ourselves that fitpolyMk can handle it - let's flip the order of some of our polymorphic conditions. In other words, we will make state A + B a mixture of taxa coded as A+B and B+A and so on.

y<-setNames(as.character(y),names(y))
ii<-which(y=="A+B")
y[sample(ii,round(0.5*length(ii)))]<-"B+A"
ii<-which(y=="C+D")
y[sample(ii,round(0.5*length(ii)))]<-"D+C"
y<-as.factor(y)
y
##  t70  t71  t47 t169 t170  t67  t68  t62  t59  t60 t173 t174  t19  t20  t21 
##  D+C  C+D  C+D  C+D  D+C  C+D  B+C  C+D    C    C    B    B    B  B+C    B 
##  t15  t50 t109 t123 t124 t104 t105  t52  t53 t167 t168 t141  t74  t66  t48 
##    B  B+C  B+C  B+C  B+C  B+C  B+C  B+C  B+C    C    C    C    C  C+D    C 
## t181 t182   t2  t39  t40   t4  t27 t191 t192  t28 t145 t146  t77   t8 t193 
##  C+D  D+C    C    C    C    D    C  B+C    C    D  D+C  C+D  D+C    D    D 
## t194  t96  t45  t86  t87  t13   t6 t125 t183 t184  t16  t17 t135 t136 t147 
##    D    D    D    D    D  C+D  C+D  C+D    C    C    C    C    C    C    C 
## t148  t83  t88  t89  t78  t79  t24 t187 t188  t36  t29  t99 t189 t190 t115 
##    C    C  B+C    C    C    C    C    C    C    C    D  C+D    C    C    C 
## t119 t120 t121 t122  t46  t49 t165 t166 t112   t5 t106 t126 t142 t143  t98 
##    C    C    C    C    C    C  D+C  C+D  C+D  B+C  B+C  B+C  B+C  B+C    C 
## t179 t180 t144 t113 t114  t18  t30 t155 t156  t11 t128 t129 t127  t64  t65 
##  B+C  B+C  B+C  B+C  B+C  B+C    C    C    C  B+C    C  D+C    C  C+D  C+D 
## t153 t154 t185 t186  t80  t81  t33  t43 t151 t152  t82  t97 t132 t133  t61 
##    C    C    C    C    B    B    C    D  C+D  D+C  D+C    D    D    D    D 
##   t3 t102 t103  t35  t69 t149 t150   t7  t92  t93 t110 t111 t197 t198 t161 
##  A+B  B+C  B+C  B+C  B+A    B    B  D+C  D+C  D+C  D+C  D+C    C    C    C 
## t157 t158  t12   t9  t41  t42  t25  t37  t38  t26  t10  t90  t94  t95  t54 
##    C    C  B+C    C    C    C    C    C    C    C  B+C  D+C  C+D  C+D    C 
##  t55  t91 t171 t172 t162 t163  t51  t32 t177 t178  t63  t84  t85  t44  t56 
##    C    D    C    C  D+C  C+D  D+C    C    C    C  B+C    C    C    C    C 
## t199 t200 t107 t108 t130 t131  t31 t137 t138  t22 t139 t140 t118  t72  t73 
##    C    C    C    C    C    C    C  D+C    C    C  D+C  C+D  C+D  D+C    C 
##  t76 t159 t160  t14  t75 t134 t195 t196 t164  t34 t100 t101 t116 t117  t23 
##  D+C  C+D  D+C    D  C+D  C+D  C+D  C+D  D+C    D    D    D  D+C    C  D+C 
## t175 t176  t57  t58   t1 
##  D+C  D+C  B+C  B+C  D+C 
## Levels: A+B B B+A B+C C C+D D D+C

Plot our data:

plotTree(poly.tree,ftype="off",lwd=1,type="fan")
X<-strsplit(setNames(as.character(y),names(y)),"+",fixed=TRUE)
pies<-matrix(0,Ntip(poly.tree),4,dimnames=list(poly.tree$tip.label,
    c("A","B","C","D")))
for(i in 1:Ntip(poly.tree)) 
    pies[poly.tree$tip.label[i],X[[poly.tree$tip.label[i]]]]<-
        rep(1/length(X[[poly.tree$tip.label[i]]]),
        length(X[[poly.tree$tip.label[i]]]))
tiplabels(pie=pies,piecol=c("black","yellow","red","blue"),
    cex=0.35)
legend(x="topleft",legend=c("A","B","C","D"),pt.cex=2,pch=21,
    pt.bg=c("black","yellow","red","blue"))

plot of chunk unnamed-chunk-3

Now we'll go ahead & fit our polymorphic evolution model. Here, I'm going to use an symmetric-rates ordered model - just because I happen to know that this model has good fit to the data:

fit<-fitpolyMk(poly.tree,y,model="SYM",ordered=TRUE)
## 
## This is the design matrix of the fitted model. Does it make sense?
## 
##     A A+B B B+C C C+D D
## A   0   1 0   0 0   0 0
## A+B 1   0 2   0 0   0 0
## B   0   2 0   3 0   0 0
## B+C 0   0 3   0 4   0 0
## C   0   0 0   4 0   5 0
## C+D 0   0 0   0 5   0 6
## D   0   0 0   0 0   6 0
plot(fit,lwd=2)

plot of chunk unnamed-chunk-4

Now, how can we apply this to stochastic character mapping? Well, this is easier than you'd think simply because the function fitpolyMk (now) returns both the design matrix of the model, and a data matrix of the input data to the user as components of the fitted model object: $index.matrix and $data, respectively. Let's examine these & then use these as follows:

fit$index.matrix
##     A A+B B B+C C C+D D
## A   0   1 0   0 0   0 0
## A+B 1   0 2   0 0   0 0
## B   0   2 0   3 0   0 0
## B+C 0   0 3   0 4   0 0
## C   0   0 0   4 0   5 0
## C+D 0   0 0   0 5   0 6
## D   0   0 0   0 0   6 0
fit$data
##      A A+B B B+C C C+D D
## t70  0   0 0   0 0   1 0
## t71  0   0 0   0 0   1 0
## t47  0   0 0   0 0   1 0
## t169 0   0 0   0 0   1 0
## t170 0   0 0   0 0   1 0
## t67  0   0 0   0 0   1 0
## t68  0   0 0   1 0   0 0
## t62  0   0 0   0 0   1 0
## t59  0   0 0   0 1   0 0
## t60  0   0 0   0 1   0 0
## t173 0   0 1   0 0   0 0
## t174 0   0 1   0 0   0 0
## t19  0   0 1   0 0   0 0
## t20  0   0 0   1 0   0 0
## t21  0   0 1   0 0   0 0
## t15  0   0 1   0 0   0 0
## t50  0   0 0   1 0   0 0
## t109 0   0 0   1 0   0 0
## t123 0   0 0   1 0   0 0
## t124 0   0 0   1 0   0 0
## t104 0   0 0   1 0   0 0
## t105 0   0 0   1 0   0 0
## t52  0   0 0   1 0   0 0
## t53  0   0 0   1 0   0 0
## t167 0   0 0   0 1   0 0
## t168 0   0 0   0 1   0 0
## t141 0   0 0   0 1   0 0
## t74  0   0 0   0 1   0 0
## t66  0   0 0   0 0   1 0
## t48  0   0 0   0 1   0 0
## t181 0   0 0   0 0   1 0
## t182 0   0 0   0 0   1 0
## t2   0   0 0   0 1   0 0
## t39  0   0 0   0 1   0 0
## t40  0   0 0   0 1   0 0
## t4   0   0 0   0 0   0 1
## t27  0   0 0   0 1   0 0
## t191 0   0 0   1 0   0 0
## t192 0   0 0   0 1   0 0
## t28  0   0 0   0 0   0 1
## t145 0   0 0   0 0   1 0
## t146 0   0 0   0 0   1 0
## t77  0   0 0   0 0   1 0
## t8   0   0 0   0 0   0 1
## t193 0   0 0   0 0   0 1
## t194 0   0 0   0 0   0 1
## t96  0   0 0   0 0   0 1
## t45  0   0 0   0 0   0 1
## t86  0   0 0   0 0   0 1
## t87  0   0 0   0 0   0 1
## t13  0   0 0   0 0   1 0
## t6   0   0 0   0 0   1 0
## t125 0   0 0   0 0   1 0
## t183 0   0 0   0 1   0 0
## t184 0   0 0   0 1   0 0
## t16  0   0 0   0 1   0 0
## t17  0   0 0   0 1   0 0
## t135 0   0 0   0 1   0 0
## t136 0   0 0   0 1   0 0
## t147 0   0 0   0 1   0 0
## t148 0   0 0   0 1   0 0
## t83  0   0 0   0 1   0 0
## t88  0   0 0   1 0   0 0
## t89  0   0 0   0 1   0 0
## t78  0   0 0   0 1   0 0
## t79  0   0 0   0 1   0 0
## t24  0   0 0   0 1   0 0
## t187 0   0 0   0 1   0 0
## t188 0   0 0   0 1   0 0
## t36  0   0 0   0 1   0 0
## t29  0   0 0   0 0   0 1
## t99  0   0 0   0 0   1 0
## t189 0   0 0   0 1   0 0
## t190 0   0 0   0 1   0 0
## t115 0   0 0   0 1   0 0
## t119 0   0 0   0 1   0 0
## t120 0   0 0   0 1   0 0
## t121 0   0 0   0 1   0 0
## t122 0   0 0   0 1   0 0
## t46  0   0 0   0 1   0 0
## t49  0   0 0   0 1   0 0
## t165 0   0 0   0 0   1 0
## t166 0   0 0   0 0   1 0
## t112 0   0 0   0 0   1 0
## t5   0   0 0   1 0   0 0
## t106 0   0 0   1 0   0 0
## t126 0   0 0   1 0   0 0
## t142 0   0 0   1 0   0 0
## t143 0   0 0   1 0   0 0
## t98  0   0 0   0 1   0 0
## t179 0   0 0   1 0   0 0
## t180 0   0 0   1 0   0 0
## t144 0   0 0   1 0   0 0
## t113 0   0 0   1 0   0 0
## t114 0   0 0   1 0   0 0
## t18  0   0 0   1 0   0 0
## t30  0   0 0   0 1   0 0
## t155 0   0 0   0 1   0 0
## t156 0   0 0   0 1   0 0
## t11  0   0 0   1 0   0 0
## t128 0   0 0   0 1   0 0
## t129 0   0 0   0 0   1 0
## t127 0   0 0   0 1   0 0
## t64  0   0 0   0 0   1 0
## t65  0   0 0   0 0   1 0
## t153 0   0 0   0 1   0 0
## t154 0   0 0   0 1   0 0
## t185 0   0 0   0 1   0 0
## t186 0   0 0   0 1   0 0
## t80  0   0 1   0 0   0 0
## t81  0   0 1   0 0   0 0
## t33  0   0 0   0 1   0 0
## t43  0   0 0   0 0   0 1
## t151 0   0 0   0 0   1 0
## t152 0   0 0   0 0   1 0
## t82  0   0 0   0 0   1 0
## t97  0   0 0   0 0   0 1
## t132 0   0 0   0 0   0 1
## t133 0   0 0   0 0   0 1
## t61  0   0 0   0 0   0 1
## t3   0   1 0   0 0   0 0
## t102 0   0 0   1 0   0 0
## t103 0   0 0   1 0   0 0
## t35  0   0 0   1 0   0 0
## t69  0   1 0   0 0   0 0
## t149 0   0 1   0 0   0 0
## t150 0   0 1   0 0   0 0
## t7   0   0 0   0 0   1 0
## t92  0   0 0   0 0   1 0
## t93  0   0 0   0 0   1 0
## t110 0   0 0   0 0   1 0
## t111 0   0 0   0 0   1 0
## t197 0   0 0   0 1   0 0
## t198 0   0 0   0 1   0 0
## t161 0   0 0   0 1   0 0
## t157 0   0 0   0 1   0 0
## t158 0   0 0   0 1   0 0
## t12  0   0 0   1 0   0 0
## t9   0   0 0   0 1   0 0
## t41  0   0 0   0 1   0 0
## t42  0   0 0   0 1   0 0
## t25  0   0 0   0 1   0 0
## t37  0   0 0   0 1   0 0
## t38  0   0 0   0 1   0 0
## t26  0   0 0   0 1   0 0
## t10  0   0 0   1 0   0 0
## t90  0   0 0   0 0   1 0
## t94  0   0 0   0 0   1 0
## t95  0   0 0   0 0   1 0
## t54  0   0 0   0 1   0 0
## t55  0   0 0   0 1   0 0
## t91  0   0 0   0 0   0 1
## t171 0   0 0   0 1   0 0
## t172 0   0 0   0 1   0 0
## t162 0   0 0   0 0   1 0
## t163 0   0 0   0 0   1 0
## t51  0   0 0   0 0   1 0
## t32  0   0 0   0 1   0 0
## t177 0   0 0   0 1   0 0
## t178 0   0 0   0 1   0 0
## t63  0   0 0   1 0   0 0
## t84  0   0 0   0 1   0 0
## t85  0   0 0   0 1   0 0
## t44  0   0 0   0 1   0 0
## t56  0   0 0   0 1   0 0
## t199 0   0 0   0 1   0 0
## t200 0   0 0   0 1   0 0
## t107 0   0 0   0 1   0 0
## t108 0   0 0   0 1   0 0
## t130 0   0 0   0 1   0 0
## t131 0   0 0   0 1   0 0
## t31  0   0 0   0 1   0 0
## t137 0   0 0   0 0   1 0
## t138 0   0 0   0 1   0 0
## t22  0   0 0   0 1   0 0
## t139 0   0 0   0 0   1 0
## t140 0   0 0   0 0   1 0
## t118 0   0 0   0 0   1 0
## t72  0   0 0   0 0   1 0
## t73  0   0 0   0 1   0 0
## t76  0   0 0   0 0   1 0
## t159 0   0 0   0 0   1 0
## t160 0   0 0   0 0   1 0
## t14  0   0 0   0 0   0 1
## t75  0   0 0   0 0   1 0
## t134 0   0 0   0 0   1 0
## t195 0   0 0   0 0   1 0
## t196 0   0 0   0 0   1 0
## t164 0   0 0   0 0   1 0
## t34  0   0 0   0 0   0 1
## t100 0   0 0   0 0   0 1
## t101 0   0 0   0 0   0 1
## t116 0   0 0   0 0   1 0
## t117 0   0 0   0 1   0 0
## t23  0   0 0   0 0   1 0
## t175 0   0 0   0 0   1 0
## t176 0   0 0   0 0   1 0
## t57  0   0 0   1 0   0 0
## t58  0   0 0   1 0   0 0
## t1   0   0 0   0 0   1 0
simmap.trees<-make.simmap(poly.tree,fit$data,model=fit$index.matrix,nsim=100)
## make.simmap is sampling character histories conditioned on the transition matrix
## 
## Q =
##     A       A+B         B       B+C         C       C+D        D
## A   0  0.000000  0.000000  0.000000  0.000000  0.000000  0.00000
## A+B 0 -1.271756  1.271756  0.000000  0.000000  0.000000  0.00000
## B   0  1.271756 -2.333135  1.061379  0.000000  0.000000  0.00000
## B+C 0  0.000000  1.061379 -1.768122  0.706743  0.000000  0.00000
## C   0  0.000000  0.000000  0.706743 -1.913028  1.206285  0.00000
## C+D 0  0.000000  0.000000  0.000000  1.206285 -2.252835  1.04655
## D   0  0.000000  0.000000  0.000000  0.000000  1.046550 -1.04655
## (estimated using likelihood);
## and (mean) root node prior probabilities
## pi =
##         A       A+B         B       B+C         C       C+D         D 
## 0.1428571 0.1428571 0.1428571 0.1428571 0.1428571 0.1428571 0.1428571
## Done.
simmap.trees
## 100 phylogenetic trees with mapped discrete characters

We can plot a summary of this analysis as follows:

cols<-setNames(colorRampPalette(c("black","yellow","red","blue"))(7),
    colnames(fit$data))
plot(summary(simmap.trees),colors=cols,type="fan",ftype="off")
legend(x="topleft",legend=colnames(fit$data),pt.cex=2.4,pch=21,
    pt.bg=cols)

plot of chunk unnamed-chunk-6

However, it might make more sense to do things like combine states before computing posterior probabilities at node. The following is just an example that combines A+B, B, and B+C and then plots them compared to a new state not-B:

b.trees<-mergeMappedStates(simmap.trees,c("A+B","B","B+C"),"B")
b.trees<-mergeMappedStates(b.trees,c("A","C","C+D","D"),"not-B")
cols<-setNames(c("black","yellow"),c("not-B","B"))
plot(summary(b.trees),colors=cols,type="fan",ftype="off")
legend(x="topleft",legend=names(cols),pt.cex=2.4,pch=21,
    pt.bg=cols)

plot of chunk unnamed-chunk-7

Or, similarly using densityMap:

object<-densityMap(b.trees,states=names(cols),plot=FALSE)
## sorry - this might take a while; please be patient
object<-setMap(object,c("black","yellow"))
plot(object,lwd=c(2,6),ftype="off")

plot of chunk unnamed-chunk-8

That's kind of the idea. We can similarly analyze our posterior sample of trees in all of the usual ways, e.g.:

object<-density(simmap.trees)
object
## 
## Distribution of changes from stochastic mapping:
## 
##  A+B->B      A+B->B+C
##  Min.   :0   Min.   :0
##  Median :0   Median :0
##  Mean   :0.34    Mean   :0
##  Max.   :2   Max.   :0
## 
##  A+B->C      A+B->C+D
##  Min.   :0   Min.   :0
##  Median :0   Median :0
##  Mean   :0   Mean   :0
##  Max.   :0   Max.   :0
## 
##  A+B->D      B->A+B
##  Min.   :0   Min.   :2
##  Median :0   Median :2
##  Mean   :0   Mean   :2.29
##  Max.   :0   Max.   :4
## 
##  B->B+C      B->C
##  Min.   :0   Min.   :0
##  Median :1.5 Median :0
##  Mean   :1.85    Mean   :0
##  Max.   :6   Max.   :0
## 
##  B->C+D      B->D
##  Min.   :0   Min.   :0
##  Median :0   Median :0
##  Mean   :0   Mean   :0
##  Max.   :0   Max.   :0
## 
##  B+C->A+B        B+C->B
##  Min.   :0   Min.   :4
##  Median :0   Median :7
##  Mean   :0   Mean   :6.67
##  Max.   :0   Max.   :10
## 
##  B+C->C      B+C->C+D
##  Min.   :1   Min.   :0
##  Median :3   Median :0
##  Mean   :3.54    Mean   :0
##  Max.   :9   Max.   :0
## 
##  B+C->D      C->A+B
##  Min.   :0   Min.   :0
##  Median :0   Median :0
##  Mean   :0   Mean   :0
##  Max.   :0   Max.   :0
## 
##  C->B        C->B+C
##  Min.   :0   Min.   :13
##  Median :0   Median :15
##  Mean   :0   Mean   :15.49
##  Max.   :0   Max.   :21
## 
##  C->C+D      C->D
##  Min.   :18  Min.   :0
##  Median :24  Median :0
##  Mean   :23.9    Mean   :0
##  Max.   :30  Max.   :0
## 
##  C+D->A+B        C+D->B
##  Min.   :0   Min.   :0
##  Median :0   Median :0
##  Mean   :0   Mean   :0
##  Max.   :0   Max.   :0
## 
##  C+D->B+C        C+D->C
##  Min.   :0   Min.   :8
##  Median :0   Median :14
##  Mean   :0   Mean   :13.68
##  Max.   :0   Max.   :26
## 
##  C+D->D      D->A+B
##  Min.   :7   Min.   :0
##  Median :11  Median :0
##  Mean   :10.64   Mean   :0
##  Max.   :15  Max.   :0
## 
##  D->B        D->B+C
##  Min.   :0   Min.   :0
##  Median :0   Median :0
##  Mean   :0   Mean   :0
##  Max.   :0   Max.   :0
## 
##  D->C        D->C+D
##  Min.   :0   Min.   :0
##  Median :0   Median :3
##  Mean   :0   Mean   :3.51
##  Max.   :0   Max.   :9
## 
## 95% HPD interval(A+B->B): [0, 2]
## 95% HPD interval(A+B->B+C): [0, 0]
## 95% HPD interval(A+B->C): [0, 0]
## 95% HPD interval(A+B->C+D): [0, 0]
## 95% HPD interval(A+B->D): [0, 0]
## 95% HPD interval(B->A+B): [2, 4]
## 95% HPD interval(B->B+C): [0, 5]
## 95% HPD interval(B->C): [0, 0]
## 95% HPD interval(B->C+D): [0, 0]
## 95% HPD interval(B->D): [0, 0]
## 95% HPD interval(B+C->A+B): [0, 0]
## 95% HPD interval(B+C->B): [5, 9]
## 95% HPD interval(B+C->C): [1, 7]
## 95% HPD interval(B+C->C+D): [0, 0]
## 95% HPD interval(B+C->D): [0, 0]
## 95% HPD interval(C->A+B): [0, 0]
## 95% HPD interval(C->B): [0, 0]
## 95% HPD interval(C->B+C): [13, 18]
## 95% HPD interval(C->C+D): [20, 29]
## 95% HPD interval(C->D): [0, 0]
## 95% HPD interval(C+D->A+B): [0, 0]
## 95% HPD interval(C+D->B): [0, 0]
## 95% HPD interval(C+D->B+C): [0, 0]
## 95% HPD interval(C+D->C): [8, 19]
## 95% HPD interval(C+D->D): [8, 14]
## 95% HPD interval(D->A+B): [0, 0]
## 95% HPD interval(D->B): [0, 0]
## 95% HPD interval(D->B+C): [0, 0]
## 95% HPD interval(D->C): [0, 0]
## 95% HPD interval(D->C+D): [0, 7]
plot(object)

plot of chunk unnamed-chunk-9

Or from our B or not-B analysis:

object<-density(b.trees)
object
## 
## Distribution of changes from stochastic mapping:
##  B->not-B        not-B->B
##  Min.   :1   Min.   :13
##  Median :3   Median :15
##  Mean   :3.54    Mean   :15.49
##  Max.   :9   Max.   :21
## 
## 95% HPD interval(B->not-B): [1, 7]
## 95% HPD interval(not-B->B): [13, 18]
par(family="mono")
plot(object)

plot of chunk unnamed-chunk-10

You get the idea.

The very heterogeneous sampling (“between 1 and 42 (mean 3) datapoints per species”) raises another interesting point. What if we only have one sample for a species? How can we genuinely say that it is monomorphic? The truth is - we can't; however, if we have a model for our uncertainty we can easily incorporate it in the same way that we specify uncertainty in the values of monomorphic tip states (as previously discussed).

That's all I have to say about this - for now.

11 comments:

  1. Hi Dr. Revell,

    Love all your blogs - very clear and informative! I appreciate it a lot.

    However, when I was learning this tutorial, I came across some problem with applying the fitted model to the data.
    After assigning "fit", you used "fit$data" to call out the data matrix of the input data. But when I run "fit$data" (I was using your provided inputs) in my RStudio, it returned "NULL" rather than a data frame which you show in the tutorial.

    I was wondering if this issue is caused by my phytools version (which is 0.6.99). If so, how can I update my package to 0.7.2 as my phytools shown to be up-to-date.

    Thank you so much for your great help!!

    Gratefully,
    Laura

    ReplyDelete
  2. Dear Liam J. Revell

    I follow your blog (http://blog.phytools.org) and learned a lot about Comparative Biology in the phytools R package.

    You provide a very detailed script. However, when I applied the fit using fit$data I received the message "Null" at the R prompt instead of a data frame that you show in the tutorial. From that point I could not perform the analysis.
    I was wondering if this issue is caused by my phytools version (which is 0.6.99). If so, how can I update my package to 0.7.2 as my phytools shown to be up-to-date.

    Thank you so much for your great help!!

    Gratefully,

    Washington Vieira

    ReplyDelete
  3. Dear Liam J. Revell

    I follow your blog (http://blog.phytools.org) and learned a lot about Comparative Biology in the phytools R package.

    You provide a very detailed script. However, when I applied the fit using fit$data I received the message "Null" at the R prompt instead of a data frame that you show in the tutorial. From that point I could not perform the analysis.
    I was wondering if this issue is caused by my phytools version (which is 0.6.99). If so, how can I update my package to 0.7.2 as my phytools shown to be up-to-date.

    Thank you so much for your great help!!

    Gratefully,

    Washington Vieira

    ReplyDelete
    Replies
    1. After installing the CRAN package 'devtools' this can be done in a fresh R session as follows:

      library(devtools)
      install_github("liamrevell/phytools")

      All the best, Liam

      Delete
  4. Dear Liam,

    Thank you for this very clear and useful post.
    I have a question regarding the incorporation of uncertainty for some tips with the fitpolyMk function. Although the conversion of the character states to a matrix with the prior probabilities of each tip works well with the fitMk function, I cannot run it with the fitpolyMk (I got an error message: 'Error in strsplit(x, "+", fixed = TRUE) : non-character argument'). I was therefore wondering whether it is possible to combine the intraspecific polymorphism with character state uncertainty.
    Best regards,
    Alex

    ReplyDelete
    Replies
    1. I also had this error and I removed it by transforming my data with the lines that are in the Liams code:
      y2<-setNames(data[,1],rownames(data))
      y2<-as.factor(y2)

      Delete
  5. Hi,
    the code is very useful, there are not many tools for polymorphic data. I have, however, a problem with my data: at the end I get an error '1:nrow(L)':argument of length 0.
    I have tried different models (ER, ARD, SYM) and it is still the same. I think this my be related to rates values, since in each model my rates are NaNs.
    Thank you so much for your great help.

    ReplyDelete
    Replies
    1. Update
      Problem solved after chagning "ordered=TRUE" to "ordered=FALSE".

      Delete
  6. Dear Sir,
    I'm a great admirer of your tutorial. However, I would like to know that, how can I do character mapping of outcome from BiSSE or MuSSE of real data in diveritree package. Is it possible to do in simmap or any other method kindly let me know.

    Sincerely,
    Debajyoti

    ReplyDelete
    Replies
    1. Dear Sir,

      I got some code edited version of make.simmap by Martha Liliana Serrano-Serrano https://github.com/phylolab/MarthaSerrano_scripts/blob/master/script_Serranoetal_2017.R
      Though I have not used yet, trying to understand.

      Thanks,
      Debajyoti

      Delete
  7. Dear Liam,

    thanks for the very cool tool phytools & all its functions! I recently used the function densityMap to visualize character changes for a binary trait, which worked fine. However, I was wondering if there is a way to also plot a timescale underneath the tree so that one can easily see when character changes happen? Unfortunately the usual commands to add a timescale (like axisPhylo) do not work since the outcome is not of class "Phylo". I would greatly appreciate your feedback! Thanks a lot!

    Best,
    Belinda

    ReplyDelete

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.