Saturday, August 8, 2015

Getting the timing of trait changes from a stochastically mapped discrete character history on the tree

Today I received the following question about stochastic character mapped trees in phytools:

I'm working on a data set where I would like to infer the absolute date associated with a transition from one character state to another on a phylogeny (in my case its a habitat transition in a group of fishes).
I am using SIMMAP to map the binary traits onto a subset of trees from a BEAST analysis…. Basically, I would be happy with simply extracting absolute time of each character transition from each SIMMAP tree. For example, on each tree, I just want to be able to extract that transitions happened at 10.1 mya, 12.5 mya, 18.8 mya, etc…. I just cannot figure out a straight-forward way to do this for 100-1000 trees and I was wondering if there was a function in phytools that was designed for just this sort of thing.

Well, there are several functions in phytools, such as `contSimmap` and `describe.simmap` that, in addition to `densityMap`, can be used to summarize the results of stochastic mapping in phytools; however none contain the precise function that is being sought after here. That said, it should not be too hard.

Let's start by seeing how it can be done on a single history. I'll keep it simple, and simulate a tree with three states: `"a"`, `"b"`, and `"c"`:

``````library(phytools)
Q<-matrix(c(-1,1,0,1,-2,1,0,1,-1),3,3)
colnames(Q)<-rownames(Q)<-letters[1:3]
Q
``````
``````##    a  b  c
## a -1  1  0
## b  1 -2  1
## c  0  1 -1
``````
``````tree<-sim.history(pbtree(n=26,tip.label=LETTERS,
scale=1),anc="a",Q)
``````
``````## Done simulation(s).
``````

This is a simulated history - but this would obviously normally be used with a stochastically mapped discrete character history generated, for example, with the phytools function `make.simmap`.

I'm going to plot the tree with it's mapped discrete character history - but I will also use the function `markChanges` that I just posted on my blog here so that we can see the temporal position of the changes a little bit more easily:

``````plotSimmap(tree,mar=c(4.1,0.1,0.1,0.1))
``````
``````## no colors provided. using the following legend:
##        a        b        c
##  "black"    "red" "green3"
``````
``````axis(1)
## so we can see the changes more clearly:
markChanges<-function(tree,colors=NULL,cex=1,lwd=2){
states<-sort(unique(getStates(tree)))
if(is.null(colors)) colors<-setNames(palette()[1:length(states)],
states)
obj<-get("last_plot.phylo",envir=.PlotPhyloEnv)
nc<-sapply(tree\$maps,length)-1
ii<-which(nc>0)
nc<-nc[ii]
h<-vector()
for(i in 1:length(ii)){
for(j in 1:nc[i]){
ss<-names(tree\$maps[[ii[i]]])[j+1]
mm<-tree\$edge[ii[i],1]
dd<-tree\$edge[ii[i],2]
x<-rep(obj\$xx[mm]+cumsum(tree\$maps[[ii[i]]])[j],2)
y<-c(obj\$yy[dd]-0.5*mean(strheight(LETTERS)*cex),
obj\$yy[dd]+0.5*mean(strheight(LETTERS)*cex))
lines(x,y,lwd=lwd,col=colors[ss],lend=2)
h<-c(h,x[1])
}
}
invisible(h)
}
markChanges(tree,lwd=3)
``````

Now we can use some of the same tricks in the function above to get the heights of all the changes in the tree, along with their types:

``````x<-getStates(tree,"tips")
states<-sort(unique(x))
m<-length(states)
ct<-sapply(states,function(x,y) sapply(y,function(y,x)
paste(x,"->",y,sep=""),x=x),y=states)
ct
``````
``````##   a      b      c
## a "a->a" "b->a" "c->a"
## b "a->b" "b->b" "c->b"
## c "a->c" "b->c" "c->c"
``````
``````ii<-matrix(TRUE,m,m)
diag(ii)<-rep(FALSE,m)
changes<-vector(mode="list",length=m*(m-1))
names(changes)<-as.vector(ct[ii])
nc<-sapply(tree\$maps,length)-1
ind<-which(nc>0)
nc<-nc[ind]
H<-nodeHeights(tree)
maps<-tree\$maps[ind]
for(i in 1:length(maps)){
for(j in 1:nc[i]){
sc<-paste(names(maps[[i]])[j:(j+1)],collapse="->")
h<-H[ind[i],1]+cumsum(maps[[i]])[j]
changes[[sc]]<-c(changes[[sc]],as.numeric(h))
}
}
changes<-changes[!sapply(changes,is.null)]
changes<-lapply(changes,sort)
changes
``````
``````## \$`a->b`
## [1] 0.3263951 0.5449173 0.5896102 0.6901126
##
## \$`b->a`
## [1] 0.4872248
##
## \$`b->c`
## [1] 0.6666082 0.8735417
``````

We should be able to see by plotting the changes on the tree that our computed heights correspond with the mapped changes on our plotted tree:

``````plotSimmap(tree,ylim=c(0,Ntip(tree)),mar=c(4.1,0.1,0.1,0.1))
``````
``````## no colors provided. using the following legend:
##        a        b        c
##  "black"    "red" "green3"
``````
``````xx<-markChanges(tree,lwd=3)
axis(1)
obj<-sapply(xx,function(x) lines(rep(x,2),par()\$usr[3:4],
lty="dashed",col="grey"))
obj<-sapply(changes,function(x) points(x,rep(par()\$usr[3]+0.2,
length(x))))
y=0,vertical=FALSE,prompt=FALSE)
markChanges(tree,lwd=3)
``````

Of course, we can duplicate this across a sample of trees from a posterior sample; however we need to be cognizant of the fact that we are sure to end up with different numbers of changes of each type in each different tree.

That's it.