Wednesday, February 27, 2013

New function to write trees with ancestral states and CIs

A phytools user today requested the ability to output ancestral state estimates (and confidence intervals) to file within a Newick string. Specifically, he had the following suggestion:

It would be great if we could get the ancestral state values as well as the 95CI written in the same tree. So perhaps it is possible to append multiple values to the node like beast does with the 95HPD for divergence time estimates. I looked at the format and it puts multiple values after a node like this [&CI={lower 95%, upper 95%}, ancstate={value}].

Theoretically, this should be straightforward using the optional "phylo" attribute node.label. We could just do, for example:
> tree<-pbtree(n=10)
> x<-fastBM(tree)
> XX<-fastAnc(tree,x,CI=T)
> XX<-lapply(XX,round,2)
> tree$node.label<-paste("[&CI={",XX$CI95[,1],",", XX$CI95[,2],"},ancstate={",XX$ace,"}]",sep="")
> write.tree(tree,digits=2)
[1] "(t1:1.5,(((((t9:0.15,t10:0.15)[&CI={1.061.75}ancstate={1.4}]:0.15,t6:0.3)...

Oops. Unfortunately, this didn't work because write.tree has dropped all of our commas from node.label, even when they're within [...] square parentheses! This is presumably by design to prevent us from using Newick only characters in node labels, although they will be ignored by convention by most applications if given within square brackets.

Luckily, I already have code for tree writing in the form of the phytools function write.simmap, and I thought (rightfully) that it wouldn't be too difficult to modify the code of this function to be able to include ancestral state estimates and CIs in the requested manner.

Well, I ended up building a much more versatile function than I originally intended. writeAncestors (code here) can write trees with ancestors and CIs in "phylip" (i.e., simple Newick) or "nexus" format; it can accept as input the results from ace or fastAnc, and it detects automatically whether or not CIs should be included. In the event that a data vector is provided (that is, tip data instead of the inferred states at nodes) writeAncestors will estimate ancestral states and (optionally) CIs assuming Brownian evolution using fastAnc. In the event that multiple trees, sets of ancestral states, or data vectors are provided, writeAncestors will try to act appropriately.

Here's a quick demo of its simplest usage (some output omitted):
> ls()
[1] "tree" "x"    "XX"  
> source("writeAncestors.R")
> args(writeAncestors)
function (tree, Anc = NULL, file = "", digits = 6,
   format = c("phylip", "nexus"), ...)
NULL
> writeAncestors(tree,XX,digits=2)
(t1:1.47,(((((t9:0.15,t10:0.15)[&CI={1.06,1.75},ancstate={1.4}]:0.15,t6:0.3)...
> # now with a data vector as input
> writeAncestors(tree,x=x,digits=3)
(t1:1.468,(((((t9:0.153,t10:0.153)[&CI={1.057,1.752},ancstate={1.405}]:0.148,t6:0.301)...
> # now to Nexus file (we'll output to screen)
> writeAncestors(tree,x=x,format="nexus")
#NEXUS
[R-package PHYTOOLS, Wed Feb 27 15:47:46 2013]

BEGIN TAXA;
       DIMENSIONS NTAX = 10;
       TAXLABELS
               t1
               t10
               ...
       ;
END;
BEGIN TREES;
       TRANSLATE
               1       t1,
               2       t10,
   ...
       ;
       TREE * UNTITLED = [&R] (1:1.467541,(((((10:0.153423,2:0.153423)[&CI={1.05681,1.75229},ancstate={1.40455}]...
END;

Well, that's it.

No comments:

Post a Comment

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.