One difficulty that I encountered when trying to evaluate the performance of anc.ML() was the discovery that the likelihoods reported by both functions are different. In particular, I realized that the likelihoods reported by ace() exclude constant terms. For a Brownian model, the constant terms for a set of ancestral states and rate for the BM process are two: –(

*n*+

*m*-1)log(2π)/2 (for

*n*species and

*m*internal nodes), and log(|

**C**|)/2 (for cophenetic matrix

**C**). Sure enough, this is exactly the amount by which the log-likelihoods returned by ace(...,method="ML") and anc.ML() differ.

First, an example when ace() is working:

> require(phytools) # needs phytools 0.1-2

> source("anc.ML.R") # load the source code for anc.ML

> set.seed(1)

> tree<-rtree(100)

> x<-fastBM(tree)

> res1<-ace(x,tree,method="ML")

> res1$loglik

[1] -29.04148

> res2<-anc.ML(tree,x,maxit=4000)

> res2$logLik

[1] -116.1555

> plot(res1$ace,res2$ace)

> res1$loglik-(length(tree$tip)+tree$Nnode-1)*log(2*pi)/2-as.numeric(determinant(vcvPhylo(tree),logarithm=T)$modulus)/2

[1] -116.1555

This just shows that ace(...,method="ML") and anc.ML() produce the same estimates (to a fairly high degree of precision); and that the likelihoods differ by a constant for a given tree. That is, of course, when ace() behaves. Sometimes it doesn't. Try this example:

> set.seed(90)

> tree<-rtree(100)

> x<-fastBM(tree)

> res1<-ace(x,tree,method="ML")

> res1$loglik-(length(tree$tip)+tree$Nnode-1)*log(2*pi)/2-as.numeric(determinant(vcvPhylo(tree),logarith=T)$modulus)/2

[1] -92.44219

> res2<-anc.ML(tree,x,maxit=4000)

> res2$logLik

[1] -84.40321

> plot(res1$ace,res2$ace)

Obviously, something is amiss in this example. I was working on this to show how to do ancestral state estimation for a continuous trait when rates of evolution differ in different parts of the tree, and I will try to iron out any bugs in anc.ML as I work on this.

So, my friend Annat Haber and I were once having a discussion about how zero-length branches affect various analyses, as these are common byproducts of the various methods for timescaling paleontological trees. ace() complains that trees with zero-length branches have singular matrices, which really isn't true if you look at the eigenvalues of the vcv matrix (they are all positive).

ReplyDeleteFor example:

set.seed(10)

tree<-rtree(10)

tree1<-tree;tree2<-tree

tree1$edge.length[6]<-0

tree2$edge.length[11]<-0

layout(matrix(1:3,,3));plot(tree);plot(tree1);plot(tree2) #I've collapsed a terminal edge in 1, an internal edge in 2

trait<-rnorm(10);names(trait)<-tree$tip.label

x<-ace(trait,tree) #no error

x1<-ace(trait,tree1) #error!

x2<-ace(trait,tree2) #error!

I never got around to dissecting the code to find out why, however. So I tried your code.

y<-anc.ML(tree,trait)

y1<-anc.ML(tree1,trait)

y2<-anc.ML(tree2,trait)

Still complains the vcv matrix is singular. Maybe this is a problem with solve()?

-Dave

Hi Dave.

ReplyDeleteSingular matrices have a determinant of 0 and are non-invertible. In your example, vcv.phylo(tree1) is nonsingular, as you point out. However, ancestral character estimation uses the VCV matrix for all tips & internal nodes. (The phytools function that computes this is vcvPhylo.) In this case, the matrix will be singular if any terminal or internal edge lengths are zero.

However, the matrix will not be singular in the latter case if we first collapse internal edges of zero length using multi2di(). (Unfortunately, anc.ML calls pic internally to get a starting estimate of σ so the present version will still fail, but for a different reason, if zero length internal edges are collapsed. I have fixed this and confirmed that anc.ML(multi2di(tree2),x) will work. I will post this updated version this afternoon.)Thanks for the comments. Liam