## Wednesday, November 27, 2013

### Plotting the number of steps under parsimony to attach a new tip to all possible places in a tree

A few days ago, I received an email with the following text:

"What I have been trying to do is to color the branches of a tree according to the number of steps under parsimony that are required if certain species is attached to the corresponding branch of a backbone tree. As I am paleontologist, I usually need to place certain morphological-defined species onto a backbone constraint tree. But I could not find yet the function to do that. Any help is appreciated!"

Like many such emails (unfortunately) I flagged it to (hopefully) return to later, and then I rediscovered it today. Upon closer look, my first reaction was "why do people think I know how to do these things?" As I started to respond as such, it struck me that in fact this shouldn't be that hard (thanks, in large part, to Klaus Schliep's great package phangorn).

First, let's simulate the empirical case in which the 'backbone' tree for our N taxa is 'known', but we have data for N+1 taxa and we want to know where that extra tip belongs:

require(phytools)
require(phangorn)

## simulate tree
tt<-pbtree(n=26,tip.label=LETTERS[26:1])
## this is our true, full, unrooted phylogeny
plot(unroot(tt),use.edge.length=FALSE,type="unrooted", edge.width=2)
## simulate data on the true tree
X<-genSeq(tt,l=2000,format="phyDat",rate=0.01)
## now let's drop one of our tips, tip "Z" in this case
tree<-drop.tip(tt,LETTERS[26])
tree<-unroot(tree) # unroot

OK, now we have a data set (X) containing 26 taxa; but an unrooted tree (tree) containing only 25 tips, and we want to know (& plot) the change in parsimony score that results from attaching the one remaining tip to all of the 48 places in the tree to which it could be attached!

## first set all branch lengths to 1.0
## this is just for bind.tip
tree\$edge.length<-rep(1,nrow(tree\$edge))
## vector for parsimony scores
ps<-vector(length=nrow(tree\$edge))
names(ps)<-apply(tree\$edge,1,paste,collapse=",")
## tip to attach
tip.label<-LETTERS[26]
## loop over all edges in the tree
for(i in 1:nrow(tree\$edge)){
## attach tip
x<-bind.tip(tree,tip.label,1,where=tree\$edge[i,2],
position=0.5)
x\$edge.length<-NULL
## compute parsimony score
ps[i]<-parsimony(x,X)
}
## subtract parsimony score on 25-taxon tree
ps<-ps-parsimony(tree,X)

Let's see what ps (the parsimony cost of attaching the tip to all edges of the tree) looks like in this one case:

> ps
26,1 26,27  27,2  27,3 26,28  28,4 28,29 29,30  30,5
121   122   172   172    97   124   121   146   178
30,31 31,32  32,6  32,7  31,8 29,33 33,34 34,35 35,36
178   211   211   212   211   147   149   155   168
36,9 36,10 35,37 37,38 38,11 38,12 37,13 34,39 39,40
216   216   169   212   224   222   210   153   183
40,41 41,14 41,15 40,16 39,42 42,43 43,44 44,17 44,18
202   216   216   203   183   210   210   217   217
43,19 42,20 33,45 45,46 46,47 47,21 47,22 46,23 45,48
209   209   148   200   219   235   235   219   200
48,24 48,25
216   218

names(ps) here gives the starting & ending node numbers for each edge in the tree & ps its corresponding parsimony cost.

The neatest thing is that we can easily plot these on the tree using the phytools function plotBranchbyTrait (which, unlike other plotting functions in phytools, calls the ape function plot.phylo internally):

plotBranchbyTrait(tree,ps,type="unrooted",prompt=TRUE, title="parsimony cost")

It's pretty clear that the cost of attaching tip "Z" is indeed lowest along the edge (the internode connecting the terminal edges leading to tips "V" and "Y") where tip "Z" actually belongs. Cool.

Note that if we had a model for the evolution of our characters (in this case, our data are DNA sequences - but the query pertained to morphological data from fossils so this is not guaranteed) we could easily construct a similar plot showing differences in the likelihood across the tree.

That's it.