Friday, May 13, 2011

New version of optim.phylo.ls()

I just posted a new version of optim.phylo.ls(). This function is for phylogeny inference using the least-squares method. According to this method, we find the tree and branch lengths for which the sum of squared differences between the observed distances (e.g., measured from DNA sequences) and the implied distances (i.e., the patristic distances on the tree) is minimized. Direct link to code for the new version (v0.4) is here.
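For concreteness, the criterion described above can be written out as follows. This is the unweighted form implied by "sum of squared differences"; the symbols D (observed distance), d (patristic distance implied by the tree) and n (number of taxa) are my own notation rather than anything taken from the function's code, and Q is the name used for the score in the comments below.

```latex
Q = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \left( D_{ij} - d_{ij} \right)^{2}
```

Here each d_ij is the sum of the branch lengths along the path connecting taxa i and j on the tree, so Q depends on both the topology and the branch lengths.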

My updates arose out of an R-sig-phylo thread on recovering lost branch lengths from a tree when all one has is the topology and a patristic distance matrix. This can be done using the least-squares method (and, in fact, here we are guaranteed to recover the true branch lengths, as noted in Felsenstein 2004). I now make it easy to do so by introducing the option fixed=TRUE, which just uses the input tree and stops the inference machinery (which relies on nni() from the {phangorn} package) from running. More on all of this later if I have time.
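To see why the branch lengths come back exactly when the topology is fixed and the distances are true patristic distances, here is a minimal base-R sketch. It is not the code inside optim.phylo.ls(); the four-taxon tree ((A,B),(C,D)), the edge names and the branch lengths are all made up for illustration. Each patristic distance is a sum of edge lengths, so the problem is an ordinary linear least-squares fit.

```r
# Hypothetical unrooted tree ((A,B),(C,D)): four terminal edges a, b, c, d
# and one internal edge i. Row = taxon pair, column = edge; an entry is 1
# if that edge lies on the path between the two taxa.
X <- rbind(
  AB = c(1, 1, 0, 0, 0),
  AC = c(1, 0, 1, 0, 1),
  AD = c(1, 0, 0, 1, 1),
  BC = c(0, 1, 1, 0, 1),
  BD = c(0, 1, 0, 1, 1),
  CD = c(0, 0, 1, 1, 0)
)
colnames(X) <- c("a", "b", "c", "d", "i")

# Made-up "true" branch lengths and the patristic distances they imply.
v_true <- c(a = 0.1, b = 0.2, c = 0.3, d = 0.4, i = 0.5)
d_obs <- drop(X %*% v_true)

# Least-squares branch lengths for the fixed topology.
v_hat <- qr.solve(X, d_obs)

# Sum of squared differences between observed and implied distances.
Q <- sum((d_obs - X %*% v_hat)^2)

print(round(v_hat, 6))
print(Q)
```

Because the distances here are exactly additive on the tree, the fitted lengths match the made-up true values and Q is zero up to rounding error. With noisy distances the same fit would give the least-squares estimates instead.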

As an aside - widespread problems with blogger.com mean that a post I created yesterday has disappeared! Luckily, I saved a copy (actually, I did this by tracking it down in Google's cache when I first noticed it vanish). I will repost this later if it doesn't reappear on its own.

4 comments:

  1. I should add that this was also motivated by an email from Sébastien Lavergne who asked "is there a way that you can optimize branch lengths while keeping the topology constant using optim.phylo.ls()?" To that I say "now there is!"

  2. My lost blog post (below) came back from the dead after all. Thank you blogger.com.

  3. Hi Liam. I'm using this function to infer a phylogenetic tree from a distance matrix. In this matrix I use values from 0 to 1.
    So 1) I was wondering if your function will work with this range of values in my distance matrix,
    and 2) I don't fully understand the tol argument in this function. Hope they're easy questions :). Cheers from UNAM

  4. Hi Willy.
    1) Yes, if your matrix is a distance matrix the function should work.
    2) The argument tol sets the stopping tolerance. That means the function will keep searching until the score Q can no longer be decreased by more than tol.
    I hope this helps. The function might work better if you give it a reasonable starting tree (say, from neighbor-joining, e.g. NJ() in the {phangorn} package).
    - Liam
