Phylogenetic Tools for Comparative Biology: Stochastic mapping when tip states are uncertain

Stimulated by a user inquiry, here is a quick re-hash of ancestral character estimation using the phytools function make.simmap when tip states are uncertain.

Firstly, let's simulate some data with the property of uncertainty in the tip values - that is, we do not know the states for some extant taxa in the tree. The way this is expressed is as a prior probability distribution on the states for those tips that we do not know. For instance, if our character has two states "a" & "b", and we are completely ignorant of the values of certain species in the tree, then we might say that the prior probability of being in each of states "a" or "b" was 0.5.

## load phytools
library(phytools)
## simulate stochastic pure-birth tree
tree<-pbtree(n=26,tip.label=LETTERS,scale=1)
## generate character transition matrix
Q<-matrix(c(-1,1,1,-1),2,2)
rownames(Q)<-colnames(Q)<-letters[1:2]
Q

##    a  b
## a -1  1
## b  1 -1

## simulate character
x<-sim.history(tree,Q)$states

## Done simulation(s).

##   A   B   C   D   E   F   G   H   I   J   K   L   M   N   O   P   Q   R 
## "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "b" "a" "a" "a" "b" "b" "b" "b" 
##   S   T   U   V   W   X   Y   Z 
## "b" "b" "b" "a" "a" "a" "a" "a"

Next, we have to add some uncertainty. Arbitrarily, let's say that we do not know the states of five of the twenty-six taxa in our tree. For the simulation, let's choose the five taxa with missing data at random:

## first encode our original data as a matrix
x<-to.matrix(x,seq=letters[1:2])
x

##   a b
## A 0 1
## B 0 1
## C 0 1
## D 0 1
## E 0 1
## F 0 1
## G 0 1
## H 0 1
## I 0 1
## J 0 1
## K 0 1
## L 1 0
## M 1 0
## N 1 0
## O 0 1
## P 0 1
## Q 0 1
## R 0 1
## S 0 1
## T 0 1
## U 0 1
## V 1 0
## W 1 0
## X 1 0
## Y 1 0
## Z 1 0

## now set some of these taxa to be uncertain
x[sample(1:26,5),]<-rep(0.5,2)
x

##     a   b
## A 0.0 1.0
## B 0.0 1.0
## C 0.0 1.0
## D 0.0 1.0
## E 0.0 1.0
## F 0.5 0.5
## G 0.0 1.0
## H 0.5 0.5
## I 0.0 1.0
## J 0.0 1.0
## K 0.0 1.0
## L 1.0 0.0
## M 1.0 0.0
## N 1.0 0.0
## O 0.0 1.0
## P 0.0 1.0
## Q 0.0 1.0
## R 0.5 0.5
## S 0.0 1.0
## T 0.5 0.5
## U 0.0 1.0
## V 1.0 0.0
## W 1.0 0.0
## X 0.5 0.5
## Y 1.0 0.0
## Z 1.0 0.0

Finally, let's generate some stochastic character maps using the phytools function make.simmap:

trees<-make.simmap(tree,x,nsim=100)

## make.simmap is sampling character histories conditioned on the transition matrix
## Q =

##            a          b
## a -0.9951156  0.9951156
## b  0.9951156 -0.9951156

## (estimated using likelihood);
## and (mean) root node prior probabilities
## pi =

##   a   b 
## 0.5 0.5

## Done.

trees

## 100 phylogenetic trees

If we want to obtain the posterior probabilities at nodes or tips, we can use the function describe.simmap:

obj<-describe.simmap(trees)
obj

## 100 trees with a mapped discrete character with states:
##  a, b 
## 
## trees have 7.6 changes between states on average
## 
## changes are of the following types:
##       a,b  b,a
## x->y 3.55 4.05
## 
## mean total time spent in each state is:
##              a         b    total
## raw  2.4524900 5.0739916 7.526482
## prop 0.3258481 0.6741519 1.000000

plot(obj)

plot of chunk unnamed-chunk-4

The matrices of posterior probabilities at nodes and tips are stored in the two matrices obj$ace and obj$tips, respectively. Check it out:

obj$ace

##       a    b
## 27 0.35 0.65
## 28 0.07 0.93
## 29 0.05 0.95
## 30 0.05 0.95
## 31 0.13 0.87
## 32 0.11 0.89
## 33 0.00 1.00
## 34 0.00 1.00
## 35 0.00 1.00
## 36 0.00 1.00
## 37 0.00 1.00
## 38 0.00 1.00
## 39 0.00 1.00
## 40 0.00 1.00
## 41 0.59 0.41
## 42 1.00 0.00
## 43 0.00 1.00
## 44 0.07 0.93
## 45 0.49 0.51
## 46 0.27 0.73
## 47 0.88 0.12
## 48 0.89 0.11
## 49 0.83 0.17
## 50 0.95 0.05
## 51 1.00 0.00

obj$tips

##      a    b
## A 0.00 1.00
## B 0.00 1.00
## C 0.00 1.00
## D 0.00 1.00
## E 0.00 1.00
## F 0.09 0.91
## G 0.00 1.00
## H 0.01 0.99
## I 0.00 1.00
## J 0.00 1.00
## K 0.00 1.00
## L 1.00 0.00
## M 1.00 0.00
## N 1.00 0.00
## O 0.00 1.00
## P 0.00 1.00
## Q 0.00 1.00
## R 0.20 0.80
## S 0.00 1.00
## T 0.42 0.58
## U 0.00 1.00
## V 1.00 0.00
## W 1.00 0.00
## X 0.86 0.14
## Y 1.00 0.00
## Z 1.00 0.00

Of course, the posterior probabilities at the tips will normally be different than the prior probabilities, as we would expect.

Another possibility, of course, for visualizing the variability across maps at nodes & tips is the phytools function densityMap.

densityMap(trees,outline=TRUE)

## sorry - this might take a while; please be patient

plot of chunk unnamed-chunk-6

Finally, if we didn't know that the function describe.simmap existed, it would not be too difficult to get these same values using the function getStates from phytools, as follows:

## first nodes
X<-getStates(trees)
levs<-sort(unique(as.vector(X)))
ace<-t(apply(X,1,function(x,l) summary(factor(x,levels=l)),l=levs))/ncol(X)
ace

##       a    b
## 27 0.35 0.65
## 28 0.07 0.93
## 29 0.05 0.95
## 30 0.05 0.95
## 31 0.13 0.87
## 32 0.11 0.89
## 33 0.00 1.00
## 34 0.00 1.00
## 35 0.00 1.00
## 36 0.00 1.00
## 37 0.00 1.00
## 38 0.00 1.00
## 39 0.00 1.00
## 40 0.00 1.00
## 41 0.59 0.41
## 42 1.00 0.00
## 43 0.00 1.00
## 44 0.07 0.93
## 45 0.49 0.51
## 46 0.27 0.73
## 47 0.88 0.12
## 48 0.89 0.11
## 49 0.83 0.17
## 50 0.95 0.05
## 51 1.00 0.00

## now tips
X<-getStates(trees,"tips")
levs<-sort(unique(as.vector(X)))
tips<-t(apply(X,1,function(x,l) summary(factor(x,levels=l)),l=levs))/ncol(X)
tips

##      a    b
## A 0.00 1.00
## B 0.00 1.00
## C 0.00 1.00
## D 0.00 1.00
## E 0.00 1.00
## F 0.09 0.91
## G 0.00 1.00
## H 0.01 0.99
## I 0.00 1.00
## J 0.00 1.00
## K 0.00 1.00
## L 1.00 0.00
## M 1.00 0.00
## N 1.00 0.00
## O 0.00 1.00
## P 0.00 1.00
## Q 0.00 1.00
## R 0.20 0.80
## S 0.00 1.00
## T 0.42 0.58
## U 0.00 1.00
## V 1.00 0.00
## W 1.00 0.00
## X 0.86 0.14
## Y 1.00 0.00
## Z 1.00 0.00

That's it.

3 comments:

dwbapstFebruary 27, 2015 at 1:36 PM
Hey Liam,

Completely off-topic!

Maybe you've posted about this, but how you make such nice posts with R code blocks in Blogger? I've been trying to do this lately with rmarkdown in Rstudio but I can't quite figure it out...

-Dave

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.

Phylogenetic Tools for Comparative Biology

Pages

Friday, February 27, 2015

Stochastic mapping when tip states are uncertain - revisited

3 comments: