Tuesday, May 3, 2011

Update to phylosig() - important for large trees

Diego Bilski just emailed me with the following text regarding a problem with my phylosig() function for the estimation of phylogenetic signal:

"With more than about 150 taxa, the lambda calculations of lnlik and p values are not possible, but I wasn't able to identify what causes it. With fewer taxa there are no problems. Even with a simple simulation the same 23 warnings are reported:
> library(ape)
> tree<-rtree(200)
> x<-rTraitCont(tree)
> s.lambda<-phylosig(tree,x,method="lambda",test=TRUE)
There were 23 warnings (use warnings() to see them)
"


When I tried to reproduce Diego's error, I could do so easily:

> require(ape)
> source("phylosig.R")
> tree<-rtree(200)
> x<-rTraitCont(tree)
> test<-phylosig(tree,x,method="lambda")
There were 14 warnings (use warnings() to see them)
> test
$lambda
[1] 1.091754
$logL
[1] 87.0783
> warnings()
Warning messages:
1: In optimize(f = likelihood, interval = c(0, maxLambda), ... :
NA/Inf replaced by maximum positive value
2: In ...


I quickly suspected that the problem was with the following calculation in the expression for the log-likelihood:

log(det(sig2*Cl))

This is the logarithm of the determinant of the evolutionary rate × the lambda-transformed VCV matrix for the tree. The problem is that for moderately large trees det(sig2*vcv(tree)) can very easily escape the range of values that R can represent in double precision (i.e., it becomes vanishingly small or very large) and is thus set to 0 or Inf, respectively. This makes log(det(...)) equal to -Inf or Inf.
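
The underflow and overflow are easy to reproduce without a tree. Here is a minimal sketch in base R that uses a scaled 200 × 200 identity matrix as a stand-in for sig2 times the VCV matrix (the object names are illustrative and are not taken from phylosig()):

```r
n <- 200
small <- diag(0.01, n)   # true determinant is 0.01^200 = 1e-400
big   <- diag(100, n)    # true determinant is 100^200  = 1e+400

det_small <- det(small)  # underflows to 0
det_big   <- det(big)    # overflows to Inf

log(det_small)           # -Inf
log(det_big)             # Inf
```

Neither matrix is singular or otherwise ill-behaved; the determinant simply cannot be stored as a double, so the logarithm is lost before it is ever taken.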

Fortunately, there is a second function to compute the matrix determinant in R, determinant(), and the advantage of this function is that it has an option logarithm, which if set to TRUE returns the logarithm of the determinant. This is perfect for us as we are computing the log-likelihood anyway and this will allow us to compute the log-determinant for matrices in which log(det(...)) will not evaluate.
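
As a small illustration of how determinant() behaves, the sketch below applies it to a scaled identity matrix whose determinant underflows (the matrix is a stand-in chosen for this example, not something from phylosig()). Note that determinant() returns a list, so the log-determinant has to be pulled out of the modulus element:

```r
n <- 200
M <- diag(0.01, n)                      # det(M) would underflow to 0

ld <- determinant(M, logarithm = TRUE)  # list with $modulus and $sign
logdet <- as.numeric(ld$modulus)        # log of |det(M)|, without attributes
sgn <- ld$sign                          # sign of det(M); 1 for a valid covariance matrix

logdet                                  # finite, equal to n * log(0.01)
```

For a covariance matrix the sign should always be 1, so checking it is an inexpensive way to catch a matrix that is not positive definite.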

Indeed, substituting determinant(...,logarithm=TRUE) (and taking the $modulus element of the result) for log(det(...)) seems to solve our problem from before:

> source("phylosig.R") # updated version
> test<-phylosig(tree,x,method="lambda")
> test
$lambda
[1] 0.9766446
$logL
[1] 156.8490
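
To show where the log-determinant enters the calculation, here is a rough sketch of a lambda log-likelihood written in base R. It is not the code in phylosig.R: the function name lambda_loglik and its arguments are hypothetical, and it assumes that C is the among-species covariance matrix (for example, from vcv(tree) in ape) and that x is in the same order as the rows of C.

```r
# Hypothetical helper: log-likelihood of x for a given value of lambda.
lambda_loglik <- function(lambda, C, x) {
  n <- length(x)
  Cl <- C * lambda                      # scale every element by lambda...
  diag(Cl) <- diag(C)                   # ...then restore the original diagonal
  invCl <- solve(Cl)
  a <- sum(invCl %*% x) / sum(invCl)    # GLS estimate of the root state
  r <- x - a
  sig2 <- as.numeric(t(r) %*% invCl %*% r) / n   # ML estimate of the rate
  # log-determinant computed directly on the log scale
  logdet <- as.numeric(determinant(sig2 * Cl, logarithm = TRUE)$modulus)
  -0.5 * as.numeric(t(r) %*% invCl %*% r) / sig2 -
    n / 2 * log(2 * pi) - 0.5 * logdet
}

# Example call with a made-up 400-taxon covariance matrix
C <- matrix(0.001, 400, 400)
diag(C) <- 0.002
x <- sin(1:400)
lambda_loglik(0.5, C, x)
```

A function like this could be passed to optimize() over an interval of lambda values, which is where the warnings above were generated when the log-likelihood evaluated to NA or Inf.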


The new version of phylosig() is now available on my R-phylogenetics page (direct link here).
