Thursday, April 20, 2017

Extracting the positions of character state changes from stochastic map trees in phytools

Yesterday, a friend & colleague emailed me the following question:

“Me gustaría preguntarte si se pueden extraer las fechas estimadas de las transiciones de estado en un análisis SIMMAP en phytools?”

that is:

“I'd like to ask you if one can extract the estimated dates of state transition in a stochastic mapping (SIMMAP) analysis in phytools?”

The answer is “yes” - using the function markChanges. For instance:

library(phytools)
tree
## 
## Phylogenetic tree with 26 tips and 25 internal nodes.
## 
## Tip labels:
##  A, B, C, D, E, F, ...
## 
## Rooted; includes branch lengths.
x
## A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 
## b b a a b a b b a a a a a a a a b b b b b b b a a b 
## Levels: a b
mtree<-make.simmap(tree,x)
## make.simmap is sampling character histories conditioned on the transition matrix
## 
## Q =
##            a          b
## a -0.9111912  0.9111912
## b  0.9111912 -0.9111912
## (estimated using likelihood);
## and (mean) root node prior probabilities
## pi =
##   a   b 
## 0.5 0.5
## Done.
cols<-setNames(c("blue","red"),c("a","b"))
plot(mtree,cols,ftype="i",lwd=3,mar=c(4.1,1.1,1.1,1.1))
axis(1)
obj<-markChanges(mtree,colors=cols,lwd=3)

plot of chunk unnamed-chunk-1

Here, I've marked the sampled changes on the tree - but I've also extracted their positions above the root in the matrix obj:

obj
##              x       y
## a->b 0.1955114  1.5000
## a->b 0.6950576  5.7500
## b->a 0.9342177  6.0000
## a->b 0.8472460  8.0000
## a->b 0.8019033 17.0000
## a->b 0.4896849 19.4375
## a->b 0.8475341 26.0000

We can see this if we drop our marked changes to the x-axis:

plot(mtree,cols,ftype="i",lwd=3,mar=c(4.1,1.1,1.1,1.1))
axis(1)
obj<-markChanges(mtree,plot=FALSE)
for(i in 1:nrow(obj)) lines(rep(obj[i,1],2),c(0,obj[i,2]),lty="dotted",
    col=make.transparent("grey",0.8))
points(obj[,1],rep(0.2,nrow(obj)),pch=21,bg="grey")
markChanges(mtree,cols,lwd=3)

plot of chunk unnamed-chunk-3

Normally, however, we should probably obtain a set of trees from the posterior distribution - rather than a single tree. For instance:

mtrees<-make.simmap(tree,x,nsim=100)
## make.simmap is sampling character histories conditioned on the transition matrix
## 
## Q =
##            a          b
## a -0.9111912  0.9111912
## b  0.9111912 -0.9111912
## (estimated using likelihood);
## and (mean) root node prior probabilities
## pi =
##   a   b 
## 0.5 0.5
## Done.

Just a reminder that phytools already includes a cool S3 density function for the number of changes of different types in an object of class "multiSimmap" consisting of a posterior sample of trees:

obj<-density(mtrees)
obj
## 
## Distribution of changes from stochastic mapping:
##  a->b        b->a
##  Min.   :1   Min.   :1
##  Median :5   Median :3
##  Mean   :4.81    Mean   :3.46
##  Max.   :9   Max.   :7
## 
## 95% HPD interval(a->b): [1, 6]
## 95% HPD interval(b->a): [1, 6]
plot(obj)

plot of chunk unnamed-chunk-5

We can also try to get a posterior density on the positions of these various changes between states:

dotTree(tree,x,colors=cols,ftype="i",lwd=3,mar=c(4.1,1.1,1.1,1.1))
axis(1)
obj<-sapply(mtrees,markChanges,colors=sapply(cols,make.transparent,alpha=0.1))
add.simmap.legend(colors=sapply(setNames(cols[2:1],c("a->b","b->a")),
    make.transparent,0.4),prompt=FALSE,x=0,y=26,vertical=TRUE)

plot of chunk unnamed-chunk-6

This is just a list of matrices. For instance, here are the first 10:

obj[1:10]
## [[1]]
##              x        y
## b->a 0.2636368  8.34375
## a->b 0.7965588  5.00000
## a->b 0.9748351  7.00000
## a->b 0.9750953  8.00000
## b->a 0.3070973 19.67188
## a->b 0.4710849 19.67188
## b->a 0.9217101 24.50000
## 
## [[2]]
##               x        y
## b->a 0.08406267  8.34375
## a->b 0.70315021  5.00000
## a->b 0.89604958  7.00000
## a->b 0.76897714  8.00000
## b->a 0.67039182 25.25000
## a->b 0.68162989 25.25000
## b->a 0.96638624 24.50000
## 
## [[3]]
##               x        y
## a->b 0.37282213  1.50000
## b->a 0.61444432  1.50000
## a->b 0.88630420  1.50000
## a->b 0.87318915  3.00000
## b->a 0.94439736  3.00000
## a->b 0.89377892  5.00000
## a->b 0.81464365  7.00000
## a->b 0.91710938  8.00000
## a->b 0.07535913 19.67188
## b->a 0.89555680 24.50000
## 
## [[4]]
##              x        y
## b->a 0.2229673  8.34375
## a->b 0.7156098  5.00000
## a->b 0.9524018  7.00000
## a->b 0.8485678  8.00000
## b->a 0.8271941 24.50000
## 
## [[5]]
##              x        y
## a->b 0.7918600  1.50000
## a->b 0.8387755  5.00000
## a->b 0.9637919  7.00000
## a->b 0.8172940  8.00000
## a->b 0.6557096 15.25000
## b->a 0.6655496 15.25000
## a->b 0.2856692 19.67188
## b->a 0.8764252 24.50000
## 
## [[6]]
##              x        y
## a->b 0.5567777  1.50000
## a->b 0.9484923  5.00000
## a->b 0.9339515  7.00000
## a->b 0.8220921  8.00000
## a->b 0.2525068 19.67188
## b->a 0.7149046 25.25000
## a->b 0.9215291 26.00000
## 
## [[7]]
##              x        y
## a->b 0.3328872  1.50000
## a->b 0.6776737  5.75000
## b->a 0.7405165  6.50000
## a->b 0.7941714  6.50000
## b->a 0.8458889  6.00000
## a->b 0.9726573  8.00000
## a->b 0.2351046 19.67188
## b->a 0.9340201 24.50000
## 
## [[8]]
##               x        y
## a->b 0.45241952  1.50000
## a->b 0.04991146  8.34375
## b->a 0.15720353  8.34375
## a->b 0.67075616  5.75000
## b->a 0.86765865  6.00000
## a->b 0.99840460  8.00000
## a->b 0.29504460 19.67188
## b->a 0.71119438 25.25000
## a->b 0.92642095 26.00000
## 
## [[9]]
##              x        y
## b->a 0.3217994  8.34375
## a->b 0.8831721  5.00000
## a->b 0.8332154  7.00000
## a->b 0.9316393  8.00000
## b->a 0.8609643 24.50000
## 
## [[10]]
##              x        y
## b->a 0.3951734  1.50000
## a->b 0.8865010  1.50000
## b->a 0.4090050  8.34375
## a->b 0.9239062  5.00000
## a->b 0.9447590  7.00000
## a->b 0.9709664  8.00000
## b->a 0.6098279 25.25000
## a->b 0.6222670 25.25000
## b->a 0.9539116 24.50000

It seems like we should first accumulate the x column of all these matrices into a single vector:

X<-unlist(sapply(obj,function(x) x[,1]))

Now the changes of each type:

transitions<-sort(unique(names(X)))
heights<-lapply(transitions,function(n,x) x[which(names(x)==n)],x=X)
obj<-lapply(heights,hist,plot=FALSE)
par(mfrow=c(2,1))
barplot(obj[[1]]$counts/length(mtrees),space=0,cex.names=0.75,
    names.arg=paste(obj[[1]]$breaks[1:10],obj[[1]]$breaks[2:11],sep="-"),
    main=paste("transitions",names(heights[[1]])[1]),
    ylab=paste("average number of changes",names(heights[[1]])[1]),
    xlab="height above the root")
axis(1,at=1:10-0.5,labels=rep("",10))
barplot(obj[[2]]$counts/length(mtrees),space=0,cex.names=0.75,
    names.arg=paste(obj[[2]]$breaks[1:10],obj[[2]]$breaks[2:11],sep="-"),
    main=paste("transitions",names(heights[[2]])[1]),
    ylab=paste("average number of changes",names(heights[[2]])[1]),
    xlab="height above the root")
axis(1,at=1:10-0.5,labels=rep("",10))

plot of chunk unnamed-chunk-9

Note that there is a lot more edge lengths towards the tips of the tree - which explains why we tend to see a lot more reconstructed changes in that region.

This example uses a binary character, but it could easily be extended (with some modifications, obviously) to a multistate character.

The data & tree were simulated as follows:

Q<-matrix(c(-1,1,1,-1),2,2)
rownames(Q)<-colnames(Q)<-letters[1:2]
x<-as.factor(sim.history(tree<-pbtree(n=26,tip.label=LETTERS,scale=1),Q)$states)

No comments:

Post a Comment