Friday, August 23, 2013

Rooted MRP supertree from rooted triplets

A couple of weeks ago, a phytools user contact me with the following problem:
"My [three taxon] input trees are all rooted. However applying on them the two MRP [Matrix Representation Parsimony supertree] methods available in your package (pratchet and optim.parsimony), the output supertree is always unrooted. Is there anyway to get a rooted tree as output tree? Do you know an MRP method that will allow that (even if it's not implemented in your package)?

Actually, the input trees in mrp.supertree, if rooted, are treated as rooted trees during matrix representation. If this wasn't the case, and input trees were instead automatically unrooted before matrix representation, we would not be able to use triplets as input - because an unrooted triplet contains no information about relationships. However, it is true that the estimated topology is the MP unrooted topology based on the matrix representations of our input trees. This leaves us with the problem of trying to root the final MRP tree, when we don't know the overall outgroup - just the "outgroup" of each triplet.

My solution is that we just attach an artificial outgroup to each of our triplets - making them quartets. Then we perform MRP inference. Then we root the MRP supertree with the artificial outgroup. Then we strip it from our supertree (or set of MP MRP supertrees).

Here's some code to do this:

## first simulate tree for demo
tree<-pbtree(n=26)
tree$edge.length<-NULL
tree$tip.label<-LETTERS[26:1]
## done

## now simulate a set of N triplets
N<-1000 ## big number for the best chance of success
trees<-replicate(N,drop.tip(tree,sample(LETTERS,23)), simplify=FALSE)
class(trees)<-"multiPhylo"
## done

## now, let's attach an outgroup to each triplet
og<-pbtree(n=2)
og$tip.label<-c("NA","outgroup")
og$edge.length<-NULL
trees<-lapply(trees,paste.tree,tr1=og)
class(trees)<-"multiPhylo"
## done

## now our analysis
supertree<-mrp.supertree(trees,method="optim.parsimony", rearrangements="SPR")
if(class(supertree)=="multiPhylo"){
  ## if there is more than one MP tree
  supertree<-lapply(supertree,root,outgroup="outgroup", resolve.root=TRUE)
  supertree<-lapply(supertree,drop.tip,"outgroup")
  supertree<-lapply(supertree,untangle,method="read.tree")
  # this is just for plotting, not the empirical case
  supertree<-lapply(supertree,minRotate,x=setNames(26:1, LETTERS))
  class(supertree)<-"multiPhylo"
  RF<-sapply(supertree,RF.dist,tree2=tree)
} else {
  ## if there is only one MP tree
  supertree<-root(supertree,outgroup="outgroup", resolve.root=TRUE)
  supertree<-drop.tip(supertree,"outgroup")
  supertree<-untangle(supertree,method="read.tree")
  # this is just for plotting, not the empirical case
  supertree<-minRotate(supertree,setNames(26:1,LETTERS))
  RF<-RF.dist(supertree,tree) ## RF distance
}
RF ## RF distances to the true tree
## compare to the true tree
par(mfrow=c(1,2))
plot(tree,no.margin=T)
if(class(supertree)=="multiPhylo"){ plot(supertree[[1]],no.margin=T)
} else plot(supertree,no.margin=T)

How does it do? Well, in this case - not too bad:

> RF ## RF distances to the true tree
[1] 18
> ## compare to the true tree
> par(mfrow=c(1,2))
> plot(tree,no.margin=T)
> plot(supertree,no.margin=T)
(These are almost the same.)

That's it.

1 comment:

  1. I should have said this requires phytools & phangorn (and dependencies).

    ReplyDelete