Saturday, August 8, 2015

Getting the timing of trait changes from a stochastically mapped discrete character history on the tree

Today I received the following question about stochastic character mapped trees in phytools:

I'm working on a data set where I would like to infer the absolute date associated with a transition from one character state to another on a phylogeny (in my case its a habitat transition in a group of fishes).
I am using SIMMAP to map the binary traits onto a subset of trees from a BEAST analysis…. Basically, I would be happy with simply extracting absolute time of each character transition from each SIMMAP tree. For example, on each tree, I just want to be able to extract that transitions happened at 10.1 mya, 12.5 mya, 18.8 mya, etc…. I just cannot figure out a straight-forward way to do this for 100-1000 trees and I was wondering if there was a function in phytools that was designed for just this sort of thing.

Well, there are several functions in phytools, such as contSimmap and describe.simmap that, in addition to densityMap, can be used to summarize the results of stochastic mapping in phytools; however none contain the precise function that is being sought after here. That said, it should not be too hard.

Let's start by seeing how it can be done on a single history. I'll keep it simple, and simulate a tree with three states: "a", "b", and "c":

library(phytools)
Q<-matrix(c(-1,1,0,1,-2,1,0,1,-1),3,3)
colnames(Q)<-rownames(Q)<-letters[1:3]
Q
##    a  b  c
## a -1  1  0
## b  1 -2  1
## c  0  1 -1
tree<-sim.history(pbtree(n=26,tip.label=LETTERS,
    scale=1),anc="a",Q)
## Done simulation(s).

This is a simulated history - but this would obviously normally be used with a stochastically mapped discrete character history generated, for example, with the phytools function make.simmap.

I'm going to plot the tree with it's mapped discrete character history - but I will also use the function markChanges that I just posted on my blog here so that we can see the temporal position of the changes a little bit more easily:

plotSimmap(tree,mar=c(4.1,0.1,0.1,0.1))
## no colors provided. using the following legend:
##        a        b        c 
##  "black"    "red" "green3"
axis(1)
## so we can see the changes more clearly:
markChanges<-function(tree,colors=NULL,cex=1,lwd=2){
    states<-sort(unique(getStates(tree)))
    if(is.null(colors)) colors<-setNames(palette()[1:length(states)],
        states)
    obj<-get("last_plot.phylo",envir=.PlotPhyloEnv)
    nc<-sapply(tree$maps,length)-1
    ii<-which(nc>0)
    nc<-nc[ii]
    h<-vector()
    for(i in 1:length(ii)){
        for(j in 1:nc[i]){
            ss<-names(tree$maps[[ii[i]]])[j+1]
            mm<-tree$edge[ii[i],1]
            dd<-tree$edge[ii[i],2]
            x<-rep(obj$xx[mm]+cumsum(tree$maps[[ii[i]]])[j],2)
            y<-c(obj$yy[dd]-0.5*mean(strheight(LETTERS)*cex),
                obj$yy[dd]+0.5*mean(strheight(LETTERS)*cex))
            lines(x,y,lwd=lwd,col=colors[ss],lend=2)
            h<-c(h,x[1])
        }
    }
    invisible(h)
}
markChanges(tree,lwd=3)

plot of chunk unnamed-chunk-2

Now we can use some of the same tricks in the function above to get the heights of all the changes in the tree, along with their types:

x<-getStates(tree,"tips")
states<-sort(unique(x))
m<-length(states)
ct<-sapply(states,function(x,y) sapply(y,function(y,x) 
    paste(x,"->",y,sep=""),x=x),y=states)
ct
##   a      b      c     
## a "a->a" "b->a" "c->a"
## b "a->b" "b->b" "c->b"
## c "a->c" "b->c" "c->c"
ii<-matrix(TRUE,m,m)
diag(ii)<-rep(FALSE,m)
changes<-vector(mode="list",length=m*(m-1))
names(changes)<-as.vector(ct[ii])
nc<-sapply(tree$maps,length)-1
ind<-which(nc>0)
nc<-nc[ind]
H<-nodeHeights(tree)
maps<-tree$maps[ind]
for(i in 1:length(maps)){
    for(j in 1:nc[i]){
        sc<-paste(names(maps[[i]])[j:(j+1)],collapse="->")
        h<-H[ind[i],1]+cumsum(maps[[i]])[j]
        changes[[sc]]<-c(changes[[sc]],as.numeric(h))
    }
}
changes<-changes[!sapply(changes,is.null)]
changes<-lapply(changes,sort)
changes
## $`a->b`
## [1] 0.3263951 0.5449173 0.5896102 0.6901126
## 
## $`b->a`
## [1] 0.4872248
## 
## $`b->c`
## [1] 0.6666082 0.8735417

We should be able to see by plotting the changes on the tree that our computed heights correspond with the mapped changes on our plotted tree:

plotSimmap(tree,ylim=c(0,Ntip(tree)),mar=c(4.1,0.1,0.1,0.1))
## no colors provided. using the following legend:
##        a        b        c 
##  "black"    "red" "green3"
xx<-markChanges(tree,lwd=3)
axis(1)
obj<-sapply(xx,function(x) lines(rep(x,2),par()$usr[3:4],
    lty="dashed",col="grey"))
obj<-sapply(changes,function(x) points(x,rep(par()$usr[3]+0.2,
    length(x))))
add.simmap.legend(colors=setNames(palette()[1:m],states),x=0.05,
    y=0,vertical=FALSE,prompt=FALSE)
markChanges(tree,lwd=3)

plot of chunk unnamed-chunk-4

Of course, we can duplicate this across a sample of trees from a posterior sample; however we need to be cognizant of the fact that we are sure to end up with different numbers of changes of each type in each different tree.

That's it.

2 comments:

  1. Any thoughts on how to do this to a multiSimmap? I'm kinda stumped.

    ReplyDelete
    Replies
    1. I just realized i might have been unclear in my question. The goal would be a single matrix like `changes` above, but of the timing of all the transitions in the multiSimmap object. Thanks!

      Delete

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.