Stimulated by a user inquiry, here is a quick re-hash of ancestral character
estimation using the phytools function `make.simmap`

when tip
states are uncertain.

Firstly, let's simulate some data with the property of uncertainty in the tip
values - that is, we do not know the states for some extant taxa in the tree.
The way this is expressed is as a prior probability distribution on the states
for those tips that we do not know. For instance, if our character has two
states `"a"`

& `"b"`

, and we are completely ignorant of
the values of certain species in the tree, then we might say that the prior
probability of being in each of states `"a"`

or `"b"`

was `0.5`

.

```
## load phytools
library(phytools)
## simulate stochastic pure-birth tree
tree<-pbtree(n=26,tip.label=LETTERS,scale=1)
## generate character transition matrix
Q<-matrix(c(-1,1,1,-1),2,2)
rownames(Q)<-colnames(Q)<-letters[1:2]
Q
```

```
## a b
## a -1 1
## b 1 -1
```

```
## simulate character
x<-sim.history(tree,Q)$states
```

```
## Done simulation(s).
```

```
x
```

```
## A B C D E F G H I J K L M N O P Q R
## "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "a" "a" "a" "b" "b" "b" "b"
## S T U V W X Y Z
## "b" "b" "b" "a" "a" "a" "a" "a"
```

Next, we have to add some uncertainty. Arbitrarily, let's say that we do not know the states of five of the twenty-six taxa in our tree. For the simulation, let's choose the five taxa with missing data at random:

```
## first encode our original data as a matrix
x<-to.matrix(x,seq=letters[1:2])
x
```

```
## a b
## A 0 1
## B 0 1
## C 0 1
## D 0 1
## E 0 1
## F 0 1
## G 0 1
## H 0 1
## I 0 1
## J 0 1
## K 0 1
## L 1 0
## M 1 0
## N 1 0
## O 0 1
## P 0 1
## Q 0 1
## R 0 1
## S 0 1
## T 0 1
## U 0 1
## V 1 0
## W 1 0
## X 1 0
## Y 1 0
## Z 1 0
```

```
## now set some of these taxa to be uncertain
x[sample(1:26,5),]<-rep(0.5,2)
x
```

```
## a b
## A 0.0 1.0
## B 0.0 1.0
## C 0.0 1.0
## D 0.0 1.0
## E 0.0 1.0
## F 0.5 0.5
## G 0.0 1.0
## H 0.5 0.5
## I 0.0 1.0
## J 0.0 1.0
## K 0.0 1.0
## L 1.0 0.0
## M 1.0 0.0
## N 1.0 0.0
## O 0.0 1.0
## P 0.0 1.0
## Q 0.0 1.0
## R 0.5 0.5
## S 0.0 1.0
## T 0.5 0.5
## U 0.0 1.0
## V 1.0 0.0
## W 1.0 0.0
## X 0.5 0.5
## Y 1.0 0.0
## Z 1.0 0.0
```

Finally, let's generate some stochastic character maps using the phytools
function `make.simmap`

:

```
trees<-make.simmap(tree,x,nsim=100)
```

```
## make.simmap is sampling character histories conditioned on the transition matrix
## Q =
```

```
## a b
## a -0.9951156 0.9951156
## b 0.9951156 -0.9951156
```

```
## (estimated using likelihood);
## and (mean) root node prior probabilities
## pi =
```

```
## a b
## 0.5 0.5
```

```
## Done.
```

```
trees
```

```
## 100 phylogenetic trees
```

If we want to obtain the posterior probabilities at nodes or tips, we
can use the function `describe.simmap`

:

```
obj<-describe.simmap(trees)
obj
```

```
## 100 trees with a mapped discrete character with states:
## a, b
##
## trees have 7.6 changes between states on average
##
## changes are of the following types:
## a,b b,a
## x->y 3.55 4.05
##
## mean total time spent in each state is:
## a b total
## raw 2.4524900 5.0739916 7.526482
## prop 0.3258481 0.6741519 1.000000
```

```
plot(obj)
```

The matrices of posterior probabilities at nodes and tips are stored in
the two matrices `obj$ace`

and `obj$tips`

,
respectively. Check it out:

```
obj$ace
```

```
## a b
## 27 0.35 0.65
## 28 0.07 0.93
## 29 0.05 0.95
## 30 0.05 0.95
## 31 0.13 0.87
## 32 0.11 0.89
## 33 0.00 1.00
## 34 0.00 1.00
## 35 0.00 1.00
## 36 0.00 1.00
## 37 0.00 1.00
## 38 0.00 1.00
## 39 0.00 1.00
## 40 0.00 1.00
## 41 0.59 0.41
## 42 1.00 0.00
## 43 0.00 1.00
## 44 0.07 0.93
## 45 0.49 0.51
## 46 0.27 0.73
## 47 0.88 0.12
## 48 0.89 0.11
## 49 0.83 0.17
## 50 0.95 0.05
## 51 1.00 0.00
```

```
obj$tips
```

```
## a b
## A 0.00 1.00
## B 0.00 1.00
## C 0.00 1.00
## D 0.00 1.00
## E 0.00 1.00
## F 0.09 0.91
## G 0.00 1.00
## H 0.01 0.99
## I 0.00 1.00
## J 0.00 1.00
## K 0.00 1.00
## L 1.00 0.00
## M 1.00 0.00
## N 1.00 0.00
## O 0.00 1.00
## P 0.00 1.00
## Q 0.00 1.00
## R 0.20 0.80
## S 0.00 1.00
## T 0.42 0.58
## U 0.00 1.00
## V 1.00 0.00
## W 1.00 0.00
## X 0.86 0.14
## Y 1.00 0.00
## Z 1.00 0.00
```

Of course, the posterior probabilities at the tips will normally be different than the prior probabilities, as we would expect.

Another possibility, of course, for visualizing the variability across
maps at nodes & tips is the phytools function `densityMap`

.

```
densityMap(trees,outline=TRUE)
```

```
## sorry - this might take a while; please be patient
```

Finally, if we didn't know that the function `describe.simmap`

existed, it would not be too difficult to get these same values using
the function `getStates`

from phytools, as follows:

```
## first nodes
X<-getStates(trees)
levs<-sort(unique(as.vector(X)))
ace<-t(apply(X,1,function(x,l) summary(factor(x,levels=l)),l=levs))/ncol(X)
ace
```

```
## a b
## 27 0.35 0.65
## 28 0.07 0.93
## 29 0.05 0.95
## 30 0.05 0.95
## 31 0.13 0.87
## 32 0.11 0.89
## 33 0.00 1.00
## 34 0.00 1.00
## 35 0.00 1.00
## 36 0.00 1.00
## 37 0.00 1.00
## 38 0.00 1.00
## 39 0.00 1.00
## 40 0.00 1.00
## 41 0.59 0.41
## 42 1.00 0.00
## 43 0.00 1.00
## 44 0.07 0.93
## 45 0.49 0.51
## 46 0.27 0.73
## 47 0.88 0.12
## 48 0.89 0.11
## 49 0.83 0.17
## 50 0.95 0.05
## 51 1.00 0.00
```

```
## now tips
X<-getStates(trees,"tips")
levs<-sort(unique(as.vector(X)))
tips<-t(apply(X,1,function(x,l) summary(factor(x,levels=l)),l=levs))/ncol(X)
tips
```

```
## a b
## A 0.00 1.00
## B 0.00 1.00
## C 0.00 1.00
## D 0.00 1.00
## E 0.00 1.00
## F 0.09 0.91
## G 0.00 1.00
## H 0.01 0.99
## I 0.00 1.00
## J 0.00 1.00
## K 0.00 1.00
## L 1.00 0.00
## M 1.00 0.00
## N 1.00 0.00
## O 0.00 1.00
## P 0.00 1.00
## Q 0.00 1.00
## R 0.20 0.80
## S 0.00 1.00
## T 0.42 0.58
## U 0.00 1.00
## V 1.00 0.00
## W 1.00 0.00
## X 0.86 0.14
## Y 1.00 0.00
## Z 1.00 0.00
```

That's it.

Hey Liam,

ReplyDeleteCompletely off-topic!

Maybe you've posted about this, but how you make such nice posts with R code blocks in Blogger? I've been trying to do this lately with rmarkdown in Rstudio but I can't quite figure it out...

-Dave

Hi Dave.

DeleteI write the posts in R markdown & then I build the .html using knit2html in the knitr package. Then I simply open the .html file & copy and paste the HTML code into the blogger window. The images are embedded, so there is no need to link .png files or anything. Incredible, right?

I have an example of this workflow in the Dryad entry for my new paper (Revell et al. In press) here. I posted the .Rmd and the .html output.

All the best, Liam

Interesting; I got weird formatting bugs doing that with Rstudio's knit to HTML, I'll try knit2html and see if it makes a difference. Thanks for the advice, Liam!

Delete-Dave