Wednesday, May 10, 2023

Preserving node labels in a plotted cophylo tanglegram

A phytools user has recently observed that plots generated via phytools::cophylo do not necessarily preserve the node indexing of the input tree.

I have posted about this earlier, although I have to apologize that the solution is a tiny bit messy.

Here’s a more complete worked example.

To start let’s load phytools and phangorn, then use the phangorn dataset called "Laurasiatherian", and two different inference methods, to generate two different, potentially topologically incongruent trees.

## load packages
library(phytools)
## Loading required package: ape
## Loading required package: maps
library(phangorn)
## load dataset
data(Laurasiatherian)

To start, we can obtain an MP tree using phangorn::pratchet.

## estimate tree using Maximum Parsimony
mp_tree<-pratchet(Laurasiatherian,trace=0)
mp_tree
## 
## Phylogenetic tree with 47 tips and 45 internal nodes.
## 
## Tip labels:
##   Platypus, Wallaroo, Possum, Bandicoot, Opposum, Rabbit, ...
## Node labels:
##   1, 1, 0.89, 0.38, 0.39, 0.61, ...
## 
## Unrooted; no branch lengths.

Let’s go ahead & root our inferred tree using monotremes ("Platypus" in our phylogeny) as the outgroup.

rooted.mp_tree<-root(mp_tree,"Platypus",resolve.root=TRUE)
rooted.mp_tree
## 
## Phylogenetic tree with 47 tips and 46 internal nodes.
## 
## Tip labels:
##   Platypus, Wallaroo, Possum, Bandicoot, Opposum, Rabbit, ...
## Node labels:
##   Root, 1, 1, 0.89, 0.38, 0.39, ...
## 
## Rooted; no branch lengths.

The pratchet method uses bootstrap sampling implicitly, and the bootstrap proportions are stored as node labels in our "phylo" object, so let’s plot our tree with node labels giving the bootstrap proportions of each clade as follows.

plotTree(rooted.mp_tree,nodes="centered",lwd=1,type="cladogram",
  fsize=0.7)
nodelabels(rooted.mp_tree$node.label[2:rooted.mp_tree$Nnode],
  2:rooted.mp_tree$Nnode+Ntip(rooted.mp_tree),frame="none",
  cex=0.5,adj=c(1,-0.4),srt=-30)

plot of chunk unnamed-chunk-5

OK, it’s not the prettiest graph I’ve ever seen, but you get the idea.

Now let’s do the same using ML.

ml_tree<-pml_bb(Laurasiatherian,model=" GTR+G(4)",
  control=pml.control(trace=0))

We’ll root our tree as before.

rooted.ml_tree<-root(ml_tree$tree,"Platypus",resolve.root=TRUE)

Now graph our bootstrap node labels on the tree.

plotTree(rooted.ml_tree,nodes="centered",lwd=1,
  fsize=0.7)
nodelabels(rooted.ml_tree$node.label[2:rooted.ml_tree$Nnode],
  2:rooted.ml_tree$Nnode+Ntip(rooted.ml_tree),frame="none",
  cex=0.6,adj=c(1,1.2))

plot of chunk unnamed-chunk-8

These values, along with their clade associations, will be our point of reference in the tanglegram plot.

OK. We’re ready to compute our "cophylo" object!

obj<-cophylo(rooted.mp_tree,rooted.ml_tree)
## Rotating nodes to optimize matching...
## Done.

First, let’s graph it without node or edge labels.

(The option that I’m showing here, type=c("cladogram","phylogram"), will only work if you update phytools from GitHub.)

plot(obj,type=c("cladogram","phylogram"),fsize=0.6,part=0.48)

plot of chunk unnamed-chunk-10

Now let’s add in the node labels of our original "phylo" objects as in my previously given solution.

plot(obj,type="phylogram",fsize=0.6,part=0.48,
  link.type="curved")
nodelabels.cophylo(obj$trees[[1]]$node.label[2:Nnode(obj$trees[[1]])],
  2:Nnode(obj$trees[[1]])+Ntip(obj$trees[[1]]),frame="none",
  cex=0.5,adj=c(1,-0.4),which="left")
nodelabels.cophylo(obj$trees[[2]]$node.label[2:Nnode(obj$trees[[2]])],
  2:Nnode(obj$trees[[2]])+Ntip(obj$trees[[2]]),frame="none",
  cex=0.5,adj=c(0,1.4),which="right")

plot of chunk unnamed-chunk-11

Look closely at the values shown here compared to our earlier plots. Hopefully they match! (They’re meant to.)

Lastly, bootstrap proportions are usually represented as % values (i.e., between 0 and 100 instead of 0 and 1), and along edges, rather than at nodes.

Fortunately, we can do this too. Here’s one solution.

plot(obj,type="phylogram",fsize=0.6,part=0.48,
  link.type="curved",link.lty="solid",link.lwd=3,
  link.col=make.transparent("blue",0.25),pts=FALSE)
bs<-sapply(obj$trees[[1]]$edge[,2]-Ntip(obj$trees[[1]]),
  function(x,y) if(x>=2) as.numeric(y[x])*100 else "",
  y=obj$trees[[1]]$node.label)
edgelabels.cophylo(bs,frame="none",cex=0.5,adj=c(0.4,-0.2),
  which="left")
bs<-sapply(obj$trees[[2]]$edge[,2]-Ntip(obj$trees[[2]]),
  function(x,y) if(x>=2) as.numeric(y[x])*100 else "",
  y=obj$trees[[2]]$node.label)
edgelabels.cophylo(bs,frame="none",cex=0.5,adj=c(0.4,-0.2),
  which="right")

plot of chunk unnamed-chunk-12

Right?

OK. I’ll think about how to make this possible to do with a bit less scripting!

That’s about all I can handle for now, though!

No comments:

Post a Comment

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.