Thursday, September 26, 2013

Slightly faster multiple pairwise RF distance

I just posted a slightly faster version of multiRF that takes advantage of the fact that the first element in the list of tree bipartitions from prop.part of the ape package is the "partition" with all the taxa in the tree - which can be ignored if our two input trees have the same taxa (they should). The result is about a 20% speed-up in the 100 × 99 ÷ 2 comparisons for 100, 100-taxon trees.

Here's a demo using my i3 Asus Vivobook (obviously - it would be much faster on a speedy desktop):

> library(phytools)
> library(phangorn)
> packageVersion("phytools")
[1] ‘0.3.53’
> ## circumvents rNNI from phangorn which is buggy
> ## in the current version
> randomNNI<-function(tree,moves=1){
  for(i in 1:moves) tree<-sample(nni(tree),1)[[1]]
  tree
 }
> ## random tree
> tree<-rtree(n=100,rooted=FALSE,br=NULL)
> ## 100 trees random NNIs
> trees<-lapply(round(runif(n=100,min=1,max=100)), function(x,y) randomNNI(y,x),y=tree)
> class(trees)<-"multiPhylo"
> ## this is a hack
> trees<-read.tree(text=write.tree(trees))
> ## compute all pairwise RF
> system.time(D1<-multiRF(trees))
   user  system elapsed
  16.41    0.02   16.58
> ## now compare to new version
> detach("package:phytools",unload=TRUE)
> install.packages("phytools_0.3-60.zip",repos=NULL)
package ‘phytools’ successfully unpacked and MD5 sums checked
> library(phytools)
> packageVersion("phytools")
[1] ‘0.3.60’
> system.time(D2<-multiRF(trees))
   user  system elapsed
  13.28    0.01   13.29
> plot(D1,D2)
(Just to verify that it still works!)

BTW, I just submitted phytools 0.3-60 to CRAN - so hopeful it is accepted & percolates through the mirror repositories this week.

Monday, September 23, 2013

Getting Robinson-Foulds distances for a set of trees

The phangorn function RF.dist - for computing the Robinson-Foulds distance between two trees - is very fast. However if we want to compare a set of trees stored in an object of class "multiPhylo" there is some opportunity to recycle internal calculations & thus improve on just calling RF.dist n × (n - 1) / 2 times for n trees. Specifically, here is some code that scales to multiple trees a little bit better than n × (n - 1) / 2 RF.dist by calling the ape function prop.part (which finds all bipartitions in the tree) only once for each tree:

multiRF<-function(trees){
 if(class(trees)!="multiPhylo")
  stop("trees should be an object of class \"multiPhylo\"")
 N<-length(trees)
 RF<-matrix(0,N,N)
 if(any(sapply(unclass(trees),is.rooted))){
  cat("Some trees are rooted. Unrooting all trees.\n")
  trees<-lapply(unclass(trees),unroot)
 }
 foo<-function(pp) lapply(pp,function(x,pp)
  sort(attr(pp,"labels")[x]),pp=pp)
 xx<-lapply(unclass(trees),function(x) foo(prop.part(x)))
 for(i in 1:(N-1)) for(j in (i+1):N)
  RF[i,j]<-RF[j,i]<-2*sum(!xx[[i]]%in%xx[[j]])
 RF
}

It works OK compared to calling RF.dist a whole bunch of times independently. First, here is an alternate version of the function which uses RF.dist:

multiRF2<-function(trees){
 N<-length(trees)
 RF<-matrix(0,N,N)
 for(i in 1:(N-1)) for(j in (i+1):N)
  RF[i,j]<-RF[j,i]<-RF.dist(trees[[i]],trees[[j]])
 RF
}

Now, a very simple comparison with random trees:

> N<-100
> n<-100
> trees<-rmtree(N,n,rooted=FALSE)
> system.time(X1<-multiRF(trees))
   user  system elapsed
   9.16    0.03    9.19
> system.time(X2<-multiRF2(trees))
   user  system elapsed
  19.98    0.00   19.99
> all(X1==X2)
[1] TRUE

Friday, September 20, 2013

Even more on plotting subtrees as triangles on a phylogenetic backbone tree

This is pretty much guaranteed to be the last post on this topic today, but I made some additional refinements of plot.backbonePhylo, my new plotting method to show subtrees as triangles.

The three main changes are as follows: (1) following Luke's suggestion, subtrees containing only 1 taxon are now plotted as a line (actually - a triangle with zero width, but I guarantee you won't be able to tell the difference), instead of as a triangle with unit width; (2) following Graham Slater's suggestion, I made a small change so that there is a space between adjacent triangles - the consequence is that clades with 2 taxa are of unit length, clades with 10 taxa are 9 units in width, etc., which actually lines up with tips containing 1 taxon, which now have unit width; and (3) I added the optional scaling factor vscale, which scales the vertical dimension so that large subtrees can be portrayed while still allowing singletons and spacing between triangles.

All of these updates are available here, but are also part of a new phytools build (phytools 0.3-51).

Here's a demo:

> library(phytools)
> packageVersion("phytools")
[1] ‘0.3.51’
> library(geiger) # to help us build our tree
> ## now let's create our backbone tree with
> ## random subtree diversities
> tree<-transform(pbtree(n=10),model="lambda",lambda=0.5)
> ## for old versions of geiger, use lambdaTree
>
> ## create a translation table
> ## leaving a couple of single-taxon clades for fun
> tip.label<-sample(tree$tip.label,8)
> clade.label<-LETTERS[1:8]
> N<-ceiling(runif(n=8,min=1,max=20))
> ## set crown node depth to 1/2 the maximum depth
> depth<-sapply(tip.label,function(x,y) 0.5*y$edge.length[which(tree$edge[,2]==which(y$tip.label== x))],y=tree)
> trans<-data.frame(tip.label,clade.label,N,depth)
> rownames(trans)<-NULL
> rm(tip.label,clade.label,N,depth)
>
> ## here's what trans looks like
> trans
  tip.label clade.label  N     depth
1        t4           A  3 1.0079214
2        t8           B 13 0.8367549
3        t1           C 15 1.1433276
4        t3           D  8 1.0129504
5        t6           E 12 0.9687320
6        t5           F  2 0.9687320
7        t9           G 14 0.7253729
8       t10           H 17 0.7253729

> ## convert
> tt<-phylo.toBackbone(tree,trans)
>
> ## plot
> plot(tt)
> ## now let's make our clades much bigger
> trans$N<-trans$N*10
> trans
  tip.label clade.label   N     depth
1        t4           A  30 1.0079214
2        t8           B 130 0.8367549
3        t1           C 150 1.1433276
4        t3           D  80 1.0129504
5        t6           E 120 0.9687320
6        t5           F  20 0.9687320
7        t9           G 140 0.7253729
8       t10           H 170 0.7253729
> tt<-phylo.toBackbone(tree,trans)
> plot(tt)
> ## now let's use the new parameter vscale
> plot(tt,vscale=0.1)

Pretty cool.

We have to be careful with this, though, because if we adjust our clade widths such that any are less than 1 unit then we are going to get some really weird plots.

More on plotting backbone trees with triangles as subtrees

Yesterday (more accurately, early this morning), I posted about a new plotting method for plotting a backbone phylogeny with diversities for each subtree represented as a triangle. This type of visualization is quite common in the literature; however, I'm told (by Luke Mahler, so blame him if this is incorrect) that there is no such plotting method so far in R.

The way I did this was by first creating a new type of phylogeny object, of class "backbonePhylo". This object is stored in a very similar way to the typical phylogeny object in R (class "phylo"), which the exception that we have replaced the vector tip.label with the list tip.clade. tip.clade is a list in which each element contains a label, a diversity (i.e., the number of tips hidden in the subtree), and a crown depth.

I have written some of the methods around this object as S3 methods, which means they can be dispatched using (for instance) calls to the generic plot, reorder, print, etc.

This morning, I did a little more work on this suite of functions. Most importantly, I created a new function (phylo.toBackbone) that converts an object of class "phylo" to an object of class "backbonePhylo" with the help of a translation table. The translation table is a data frame with four variables: tip.label (containing the names of the tips in the input tree - these might be exemplar taxa); label (containing the names of the subtrees - these might be the names of higher taxa); N (obviously, the diversities of each subtree); and depth (the age of each crown group, in terms of the branch lengths of the input tree - note that this cannot be longer then the corresponding tip edge). Note that not all tips in the original tree need to be in the translation table. Those that are not will be treated as clades containing one taxon in the backbone tree.

Here's a demo using random subtree sizes on a stochastic backbone tree. New code for this is here.

> library(phytools)
> library(geiger) # to help us build our tree
> source("backbonePhylo.R")
> ## now let's create our backbone tree
> tree<-transform(pbtree(n=10),model="lambda",lambda=0.5)
> ## for old versions of geiger, use lambdaTree
>
> ## create a translation table
> ## leaving a couple of single-taxon clades for fun
> tip.label<-sample(tree$tip.label,8)
> clade.label<-LETTERS[1:8]
> N<-ceiling(runif(n=8,min=1,max=20))
> ## set crown node depth to 1/2 the maximum depth
> depth<-sapply(tip.label,function(x,y) 0.5*y$edge.length[which(tree$edge[,2]==which(y$tip.label== x))],y=tree)
> trans<-data.frame(tip.label,clade.label,N,depth)
> rownames(trans)<-NULL
> rm(tip.label,clade.label,N,depth)
>
> ## here's what trans looks like
> trans
  tip.label clade.label  N    depth
1        t2           A  6 0.8992932
2        t4           B  5 0.7617329
3        t3           C 16 0.8674696
4        t5           D  9 0.5984337
5        t1           E  5 0.8992932
6       t10           F  7 0.5554955
7        t7           G  8 0.5664347
8        t9           H  7 0.5554955
>
> ## convert
> tt<-phylo.toBackbone(tree,trans)
>
> ## plot
> plot(tt)

That's pretty good.

Plotting phylogenies with triangles for subtrees

Here's something new. Luke Mahler asked me if I could create a plotting method that could take a backbone phylogeny and plot subtrees using triangles that are scaled proportionally to the number of species in the subtree. I'm still working on this, but I posted some preliminary code here.

The function takes a modified tree as an object of class "backbonePhylo". I'm going to make some functions to convert from "phylo" objects, but for now this has to be built manually. The "backbonePhylo" object is like a "phylo" object, but with tip.clade substituted for tip.label. tip.clade is a list in which each subtree is represented by a label, a number of descendant taxa, and crown group depth.

Here's a quick demo of what can be done so far:

> library(phytools)
> library(geiger) # to help us build our tree
> source("backbonePhylo.R")
> ## now let's create our backbone tree with
> ## random subtree diversities
> tree<-transform(pbtree(n=10),model="lambda",lambda=0.5)
> tree$tip.clade<-list()
> for(i in 1:10){
  tree$tip.clade[[i]]<-
    list(label=paste("Group",i),
    N=ceiling(runif(n=1,min=0,max=20)),
    depth=tree$edge.length[which(tree$edge[,2]==i)])
  }
> class(tree)<-"backbonePhylo"
> tree

Backbone phylogenetic tree with 10 subtrees and 9 resolved internal nodes.

Labels: Group 1, Group 2, Group 3, Group 4, Group 5, ...
Diversities: 2, 18, 13, 7, 9, ...

> ## ok, let's plot it
> plot(tree)

Cool. That's more or less what we were going for.

Wednesday, September 18, 2013

Coloring tip labels by character state

Yesterday I received the query:

"I want to color my tree tip labels according to a particular character and provide a legend for the same."

Here's a demo of one way to do this, using the phytools data object anoletree:

require(phytools)
data(anoletree)
tree<-anoletree
x<-getStates(tree,"tips")
par(fg="transparent")
plotTree(tree,fsize=0.6,ylim=c(-1,length(tree$tip.label)))
lastPP<-get("last_plot.phylo",env=.PlotPhyloEnv)
ss<-sort(unique(x))
colors<-setNames(palette()[1:length(ss)],ss)
par(fg="black")
add.simmap.legend(colors=colors,vertical=FALSE,x=0.25,
  y=-1,prompt=FALSE)
colors<-sapply(x,function(x,y) y[which(names(y)==x)],
  y=colors)
tt<-gsub("_"," ",tree$tip.label)
text(lastPP$xx[1:length(tt)],lastPP$yy[1:length(tt)],
  tt,cex=0.6,col=colors,pos=4,offset=0.1,font=3)

Here's the result:

Obviously, some of this is idiosyncratic to this specific dataset. Hopefully, users can figure out how to modify this to their own data and problem.

Friday, September 13, 2013

Simulating DNA sequences with Γ rate heterogeneity among sites

I just added this to the (fairly maligned) function genSeq, for simulating DNA sequences by (inefficiently) wrapping the phytools function sim.history. Here's what the function looks like now:

genSeq<-function(tree,l=1000,Q=NULL,rate=1, format="DNAbin",...){
 if(is.null(Q)){
  Q<-matrix(1,4,4)
  rownames(Q)<-colnames(Q)<-c("a","c","g","t")
  diag(Q)<-0
  diag(Q)<--colSums(Q)
 }
 if(length(rate)!=l){
  if(length(rate)==1) rate<-rep(rate,l)
  else {
   cat("warning: length(rate) & l should match for      length(rate)>1\n")
   cat(" rate will be recycled.\n")
   rate<-rep(rate,ceiling(l/length(rate)))[1:l]
  }
 }
 cat("simulating sequences....\n")
 X<-sapply(rate,function(a,b,c)   sim.history(b,a*c)$states,b=tree,c=Q)
 if(format=="DNAbin") return(as.DNAbin(X))
 else if(format=="phyDat") return(as.phyDat(X))
 else if(format=="matrix") return(X)
}

One can easily see that the way rate heterogeneity is handled is by allowing the user to supply a site-specific vector of rates. Here's what this would look like if we wanted to simulate sequences under a continuous Γ rate heterogeneity model with a shape parameter, α = 0.25 and a mean rate of 1.0 substitution per site per unit of branch length:

> library(phytools)
> tree<-pbtree(n=26,tip.label=LETTERS,scale=0.1)
> gg<-rgamma(n=1000,shape=0.25,rate=0.25)
> X<-genSeq(tree,rate=gg)
simulating sequences....
> X
26 DNA sequences in binary format stored in a matrix.

All sequences of same length: 1000

Labels: A B C D E F ...

Base composition:
    a    c    g    t
0.264 0.245 0.242 0.249

We can do invariant sites the same way - although in this case it will break the function if we set any of the rates to zero, so we need to set them very slightly > 0:

> ii<-rep(1,1000)
> tol<-1e-12
> ii[sample(1:1000,100)]<-tol
> X<-genSeq(tree,rate=ii)
simulating sequences....
> X
26 DNA sequences in binary format stored in a matrix.

All sequences of same length: 1000

Labels: A B C D E F ...

Base composition:
    a    c    g    t
0.244 0.261 0.238 0.257

As Klaus pointed out, this method is very slow - so you probably have better options. But it's still a fun exercise to see how it works - and offers the interesting possibility that we could store and return all character histories along with the states for tips.

New version of pbtree that allows user defined tip labels

The post title pretty much says it all.

The only challenging thing here was dealing with time-stops (we don't know how many tips we'll end up with, so we can't specify tip labels a priori) and taxa-stop with extinction (we define this based on the number of extant lineages - if we choose to also return extinct tips, what do we call them?).

Code for the new version of the function is here, and it is in a new phytools build (phytools 0.3-49) that can be installed from source.

Here's a demo:

> library(phytools)
> packageVersion("phytools")
[1] ‘0.3.49’
> tree<-pbtree(n=26,tip.label=LETTERS[26:1])
> plotTree(tree)
> tree<-pbtree(b=1,d=0.3,n=20,tip.label=paste("species_", 20:1,sep=""))
Warning: only using labels in tip.label for extant tips.
extinct tips will be labeled X1, X2, etc.
> plotTree(tree)
> ## this will probably go completely extinct
> tree<-pbtree(b=1,d=0.9,n=10,tip.label=letters[1:10])
Warning: only using labels in tip.label for extant tips.
extinct tips will be labeled X1, X2, etc.
> plotTree(tree)
> ## n & t
> n<-26
> b<-1
> d<-0.4
> t<-log(26/2)/(b-d)
> tree<-pbtree(n=n,t=t,b=b,d=d,tip.label=LETTERS[26:1], method="direct")
Warning: only using labels in tip.label for extant tips.
extinct tips will be labeled X1, X2, etc.
simulating with both taxa-stop (n) & time-stop (t) using
'direct' sampling. this is experimental
> plotTree(tree)

Thursday, September 12, 2013

Simple DNA sequence simulator using sim.history internally

I was recently playing with simSeq in the phangorn package - but I couldn't get it to do exactly what I wanted (probably for lack of sufficient patience). Then I realized that I could (nearly) just as easily simulate DNA sequences using phytools with the function sim.history. Here's a quick & incredibly simple function that I wrote to do this that wraps around sim.history:

genSeq<-function(tree,l=1000,Q=NULL,rate=1, format="DNAbin",...){
  if(is.null(Q)){
    Q<-matrix(1,4,4)
    rownames(Q)<-colnames(Q)<-c("a","c","g","t")
    diag(Q)<-0
    diag(Q)<--colSums(Q)
  }
  X<-replicate(l,sim.history(tree,rate*Q)$states)
  if(format=="DNAbin") return(as.DNAbin(X))
  else if(format=="phyDat") return(as.phyDat(X))
  else if(format=="matrix") return(X)
}

Yes, it's really that simple. Now, admittedly this function cannot simulate rate heterogeneity among sites, unequal base frequencies, or invariant sites, but these would be relatively straightforward to add.

Here's, let's try it out:

> ## first let's generate a random tree
> ## with a basal root taxon "Z"
> tree<-pbtree(n=25,scale=0.9)
> tree$tip.label<-LETTERS[25:1]
> tree$root.edge<-0
> root<-list(edge=matrix(c(3,1,3,2),2,2,byrow=TRUE),
edge.length=c(1,0.1),
tip.label=c("Z","NA"),
Nnode=1)
> class(root)<-"phylo"
> tree<-paste.tree(root,tree)
>
> ## now let's simulate under Juke-Cantor
> ## (the default)
> X<-genSeq(tree,l=2000,rate=0.1,format="phyDat")
> X
26 sequences with 2000 character and 1711 different site patterns.
The states are a c g t
>
> ## now let's use our data for inference
> library(phangorn)
> obj<-pml(rtree(n=26,tip.label=LETTERS),X)
> fit<-optim.pml(obj,optNni=TRUE)
optimize edge weights: -56156.91 --> -40367.43
optimize edge weights: -40367.43 --> -40367.43
optimize topology: -40367.43 --> -38967.95
...
optimize topology: -25858.87 --> -25858.87
0
Warning message:
I unrooted the tree (rooted trees are not yet supported)
>
> ## measure RF distance to original
> RF.dist(unroot(tree),unroot(fit$tree))
[1] 0
> ## plot original and estimated tree
> par(mfcol=c(2,1))
> plotTree(tree,mar=c(0.1,1.1,3.1,0.1))
> title("a) Generating tree",adj=0,cex.main=1.2)
> plotTree(midpoint(fit$tree),mar=c(0.1,1.1,3.1,0.1))
> title("b) Estimated tree using ML",adj=0,cex.main=1.2)

That works OK. Adding other attributes typical of molecular sequence models is really easy, so I'll probably do that. Pretty cool.

Collapse n random edges in a tree

An R-sig-phylo subscriber asks:

"Does anyone know of an existing function that will take a resolved tree and a number (n) as input, and randomly choose n branches to collapse to zero length?"

Here is my response, for the record:

If you don't care about the keeping the total height of the tips above the root constant, you could just do:

ff<-function(tree,n){
  xx<-sample(2:tree$Nnode,n)+length(tree$tip.label)
  yy<-sapply(xx,function(x,y) which(x==y),y=tree$edge[,2])
  tree$edge.length[yy]<-0
  tree<-di2multi(tree)
  tree
}
You can take out the di2multi if you want your tree to be bifurcating with internal edges of zero length.

Let's try it out:

> tree<-rtree(n=40)
> tt<-ff(tree,10)
> tt
Phylogenetic tree with 40 tips and 29 internal nodes.
Tip labels:
    t12, t38, t18, t9, t7, t35, ...
Unrooted; includes branch lengths.
> par(mfcol=c(2,1))
> plotTree(tree,fsize=0.8)
> plotTree(tt,fsize=0.8)

If you want to keep the total height of the tips constant, that's going to be a little more complicated. Maybe do this, which gives all the lost branch lengths to the daughter edges (i.e., collapses down):

gg<-function(tree,n){
  for(i in 1:n){
    ii<-sample(2:tree$Nnode,1)+length(tree$tip.label)
    ll<-tree$edge.length[which(tree$edge[,2]==ii)]
    tree$edge.length[which(tree$edge[,1]==ii)]<-
      tree$edge.length[which(tree$edge[,1]==ii)]+ll
    tree$edge.length[which(tree$edge[,2]==ii)]<-0
    tree<-di2multi(tree)
  }
  tree
}

Let's try this one:

> tree<-pbtree(n=40)
> tt<-gg(tree,10)
> tt
Phylogenetic tree with 40 tips and 29 internal nodes.
Tip labels:
    t22, t25, t35, t36, t5, t4, ...
Unrooted; includes branch lengths.
> par(mfcol=c(2,1))
> plotTree(tree,fsize=0.7)
> plotTree(tt,fsize=0.7)

That's it.

Monday, September 9, 2013

How to simulate using pbtree & retain only those trees that don't go extinct....

A phytools user just contacted me with the following comment about the function for birth-death tree simulation, pbtree:

"Would it be helpful to add an option such that pbtree keeps trying until a tree survives, rather than returning NULL? (I'm trying to simulate a set number of trees of a set number of taxa, a proportion of which go extinct before the end of the simulation)."

Although it would be straightforward to pop this into pbtree, it is also pretty easy to write a simple wrapper script around pbtree to the same effect. For example:

foo<-function(b,d,n,t){
  while(is.null(x<-pbtree(b,d,n,t,extant.only=TRUE,
    quiet=TRUE))) NULL
  x
}

Let's try it out:

> trees<-replicate(100,foo(b=1,d=0.5,n=50,t=NULL), simplify=FALSE)
> class(trees)<-"multiPhylo"
> trees
100 phylogenetic trees
> sapply(trees,function(x) length(x$tip.label))
  [1] 50 50 50 ...

In my opinion, an important caveat should be attached to this endeavor - and that is that by sampling birth-death histories conditioned on a particular outcome (survival to the present) we are only sampling from a portion of the distribution under our simulation conditions.

The effect of this is easiest to show using time-stop (rather than taxon-stop) simulations. Let's simulate with pbtree and then with our wrapper script. We'll use b, d, and t to yield an expected number of extant taxa at time t of 50.

> b<-1
> d<-0.5
> t<-log(25)/(b-d)
> t1<-pbtree(b,d,t=t,nsim=1000,extant.only=TRUE,quiet=TRUE)
> l1<-sapply(unclass(t1),function(x) length(x$tip.label))
> t2<-replicate(1000,foo(b,d,n=NULL,t),simplify=FALSE)
> class(t2)<-"multiPhylo"
> l2<-sapply(unclass(t2),function(x) length(x$tip.label))
> mean(l1)
[1] 50.661
> mean(l2)
[1] 67.524
> par(mfcol=c(2,1))
> par(mar=c(5.1,4.1,2.1,1.1))
> bb<-seq(0,max(c(l1,l2))+25,25)
> hist(l1,xlab="number of extant taxa",main=NULL,breaks=bb)
> title(main="a) no rejection",adj=0,cex.main=1)
> hist(l2,xlab="number of extant taxa",main=NULL,breaks=bb)
> title(main="b) reject extinct trees",adj=0,cex.main=1)

So, you can see that when we reject all the simulations that went extinct before the present, our average number of extant taxa is much larger than the expected value of 50. This may be irrelevant, depending on the purpose of our simulation - but it is something that can be kept in mind.

Plotting facing trees using phytools

A recent R-SIG-phylo user asked how to plot "mirror" or facing trees. This is pretty straightforward, in general. For instance, we can use cophylo.plot in the ape package to draw two facing trees in the same plot. However, there are other options as well - especially using par(mfrow) and layout. Here's a quick demo of how we can use these functions to plot facing phylogenies in their simplest forms:

> ## load phytools
> require(phytools)
Loading required package: phytools
> packageVersion("phytools")
[1] ‘0.3.47’
> ## simulate a tree & make realistic looking tip labels
> tree<-pbtree(n=26)
> tree$tip.label<-sapply(LETTERS[26:1],function(x) paste(x,".",paste(sample(letters,round(runif(n=1,min=4,v max=10))),collapse=""),sep=""))
> ## here is that tree on its own
> plotTree(tree)
> ## now let's try facing trees in which we repeat the
> ## tip labels
> par(mfrow=c(1,2))
> plotTree(tree)
> plotTree(tree,direction="leftwards")

If our tip labels are the same & in the same order in the two trees, we can plot the labels only once using the function layout to split our plotting area into three parts: two for the trees with no labels, and a third (the middle segment) for our centered tip labels.

> ## now let's do it writing the tip labels only once
> ## we'll have to adjust widths based on the size of our > ## labels
> layout(matrix(1:3,1,3),widths=c(0.42,0.16,0.42))
> plotTree(tree,ftype="off")
> plot.new()
> plot.window(xlim=c(-0.1,0.1),ylim=c(1, length(tree$tip.label)))
> par(cex=1)
> text(rep(0,length(tree$tip.label)), 1:length(tree$tip.label),tree$tip.label)
> plotTree(tree,ftype="off",direction="leftwards")

The neatest thing, of course, is if we use the same approach to show the result of some of phytools custom visualizations. For instance, let's show a "contMap" plot of two different correlated characters in each of the subplots, as follows. (Note that this uses a new version of contMap from the latest non-CRAN phytools release, phytools 0.3-47.)

> ## simulate correlated trait data
> V<-matrix(c(1,0.8,0.8,1),2,2)
> X<-sim.corrs(tree,V)
> layout(matrix(1:3,1,3),widths=c(0.44,0.12,0.44))
> par(cex=1)
> contMap(tree,X[,1],ftype="off",sig=1,legend=1)
> ylim<-c(1-0.12*(length(tree$tip.label)-1),length(tree$tip.label))
> plot.new(); plot.window(xlim=c(-0.1,0.1),ylim=ylim)
> text(rep(0,length(tree$tip.label)), 1:length(tree$tip.label),tree$tip.label)
> contMap(tree,X[,2],ftype="off",direction="leftwards", sig=1,legend=1)
(Click for larger version.)

Cool.

Convert a tree with a mapped character to a tree with singleton nodes

The way that phytools stores a tree containing a mapped discrete character (for instance, created using the functions make.simmap or sim.history) is as a standard object of class "phylo", with an additional element ($maps) which is a list of the times spent in each state (in temporal sequence from the start to the end of the edge) along each edge.

Another way to store the same information would be as a modified "phylo" object in which each edge has one state, but so-called "singleton nodes" (nodes with only one descendant edge) are permitted. Converting from phytools standard format for mapped trees to this type of object was requested of me recently by Luke Mahler. The main reason for this was to be able to convert to an object of class "ouchtree". OUCH is a package that can be used to fit the multi-optimum Ornstein-Uhlenbeck model of Butler & King (2004). OUCH does not permit multiple regimes along a single branch in the tree (although this should be allowed theoretically by the model); however the OUCH function ape2ouch (which converts between an object of class "phylo" and an "ouchtree" object) accepts singleton nodes.

I just posted this function (map.to.singleton). Just so we can see that it's doing exactly what we thing it's doing, I have also added to phytools a new function plotTree.singletons which can do basic tree plotting of trees containing singleton nodes. Here's a quick demo of how the whole thing works:

> ## load phytools
> require(phytools)
> packageVersion("phytools")
[1] ‘0.3.46’
> ## first, let's get our tree with a mapped character
> tree<-pbtree(n=40,scale=1)
> Q<-matrix(c(-2,1,1,1,-2,1,1,1,-2),3,3)
> colnames(Q)<-rownames(Q)<-letters[1:3]
> tree<-sim.history(tree,Q,anc="a")
> tree

Phylogenetic tree with 40 tips and 39 internal nodes.

Tip labels:
        t22, t35, t36, t27, t28, t12, ...

Rooted; includes branch lengths.
> ## plot it
> plotSimmap(tree,lwd=2,pts=TRUE)
no colors provided. using the following legend:
      a        b        c
"black"    "red" "green3"
> ## ok, now let's convert to a tree with singletons & plot
> singles<-map.to.singleton(tree)
> singles

Phylogenetic tree with 40 tips and 72 internal nodes.

Tip labels:
        t22, t35, t36, t27, t28, t12, ...

Rooted; includes branch lengths.
> plotTree.singletons(singles)

That's pretty much it.

Wednesday, September 4, 2013

Threshold character evolution visualization along the branches of a tree

I'll be presenting on ancestral character estimation under the threshold model at the upcoming symposium entitled "Mathematics for an evolving biodiversity" in Montreal, Quebec in a couple of weeks. Last night, I thought it might be neat to take my existing function, bmPlot (which does discrete-time simulation & visualization under the threshold model), and modify it so that we could also visualize evolution of a threshold character up the branches of the tree.

The way I did this was by tracking the evolution of liability, translating liability into the threshold trait, and then aggregating adjacent generations with the same state into a stochastic-map style tree. Code for the new version of bmPlot is here.

One of the predictions of the threshold model is that the character should change back & forth across the threshold many times very rapidly when near the threshold. (Actually, infinitely many times in theory, under continuous time; remember, ours will be a discrete-time simulation.) That means that (compared to a 'nucleotide-model' type simulation) there some periods when the character changes very rapidly many times; and other long periods when the character doesn't change at all. Let's see how that bears out in simulation.

> tree<-pbtree(b=b,n=50,t=1000,type="discrete")
simulating with both taxa-stop (n) and time-stop (t) is
performed via rejection sampling & may be slow

51 trees rejected before finding a tree

> par(mfrow=c(2,1))
> par(mar=c(4.1,4.1,1.1,1.1))
> X<-bmPlot(tree,type="threshold",thresholds=c(0,0.5,2), anc=1,sig2=1/1000,ngen=max(nodeHeights(tree)), return.tree=TRUE)
> plotSimmap(X$tree,lwd=3,ftype="off",colors=X$colors)

So our prediction appears to be exactly borne out - large parts of the tree, including very long branches, have few to no changes; however in areas when a lineage is close to one of the thresholds, it tends to change back & forth rapidly. Cool.

Tuesday, September 3, 2013

New version of plotTree that doesn't need branch lengths

I just posted a new version of phytools (phytools 0.3-43) that can plot a phylogenetic tree even if branch lengths are not provided by the user. This was really trivial & I meant to do it ages ago. Basically, it just calls ape's compute.brlen internally, and then plots the tree as normal. E.g.,

> require(phytools)
Loading required package: phytools
> packageVersion("phytools")
[1] ‘0.3.43’
> tree<-pbtree(n=26)
> tree$tip.label<-LETTERS[1:26]
> tree$edge.length<-NULL # strip branch lengths
> plotTree(tree)

It also now (like plotSimmap) sets the environmental variable "last_plot.phylo" in the environment .PlotEnvPhylo by default - so we can do this:

> nodelabels()
which can be handy.