Tuesday, May 3, 2011

Update to phylosig() - important for large trees

Diego Bilski just emailed me with the following text regarding a problem with my phylosig() function for the estimation of phylogenetic signal:

"With more than about 150 taxa, the lambda calculations of lnlik and p values are not possible, but I wasn't able to identify what causes it. With fewer taxa there are no problems. Even with a simple simulation the same 23 warnings are reported:
> library(ape)
> tree<-rtree(200)
> x<-rTraitCont(tree)
> s.lambda<-phylosig(tree,x,method="lambda",test=TRUE)
There were 23 warnings (use warnings() to see them)"

When I tried to reproduce Diego's error, I could do so easily:

> source("phylosig.R")
> tree<-rtree(200)
> x<-rTraitCont(tree)
> test<-phylosig(tree,x,method="lambda")
There were 14 warnings (use warnings() to see them)
> test
[1] 1.091754
[1] 87.0783
> warnings()
Warning messages:
1: In optimize(f = likelihood, interval = c(0, maxLambda), ... :
NA/Inf replaced by maximum positive value
2: In ...

I quickly suspected that the problem was with the following calculation in the expression for the log-likelihood:

log(det(...))

This is the logarithm of the determinant of the evolutionary rate × the lambda-transformed VCV matrix for the tree. The problem is that for moderately large trees det(sig2*vcv(tree)) can very easily fall outside the range of R's double-precision numbers: it underflows to 0 when it becomes vanishingly small, or overflows to Inf when it becomes very large. This makes log(det(...)) evaluate to -Inf or Inf, respectively.
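To see the failure in isolation, here is a minimal sketch that does not use a tree at all. It assumes simple diagonal matrices (my stand-ins, not real VCV matrices), for which the determinant is just the product of the diagonal elements:

```r
# Stand-in matrices (an assumption for illustration, not VCV matrices from a tree).
# For a diagonal matrix the determinant is the product of the diagonal.
n <- 400
small <- diag(0.1, n)  # true determinant is 1e-400, too small for a double
large <- diag(10, n)   # true determinant is 1e+400, too large for a double

det_small <- det(small)  # underflows to 0
det_large <- det(large)  # overflows to Inf

log(det_small)  # -Inf
log(det_large)  # Inf
```

Neither value is usable inside a likelihood function, which is presumably why optimize() complains about NA/Inf.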

Fortunately, there is a second function to compute the matrix determinant in R, determinant(). Its advantage is the argument logarithm, which if set to TRUE returns the logarithm of the modulus (absolute value) of the determinant, stored in the modulus component of the returned list. This is perfect for us, as we are computing the log-likelihood anyway, and it allows us to compute the log-determinant for matrices for which log(det(...)) will not evaluate to a finite value.
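As a quick illustration of what determinant() returns, here is a small sketch using the same kind of stand-in diagonal matrix (again an assumption for illustration, not a real VCV matrix):

```r
# Stand-in matrix whose determinant (1e-400) underflows in det().
n <- 400
V <- diag(0.1, n)

ld <- determinant(V, logarithm = TRUE)

# 'modulus' holds log(|det(V)|); as.numeric() drops its "logarithm" attribute.
log_det <- as.numeric(ld$modulus)
# 'sign' holds the sign of the determinant (1 for a positive determinant).
sign_det <- ld$sign

log_det   # finite, equal to n * log(0.1)
sign_det
```

Because a valid covariance matrix has a positive determinant, the modulus is the log-determinant we want.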

Indeed, substituting determinant(...,logarithm=TRUE) for log(det(...)) seems to solve our problem from before:

> source("phylosig.R") # updated version
> test<-phylosig(tree,x,method="lambda")
> test
[1] 0.9766446
[1] 156.8490
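For readers who want to see the substitution in context, here is a rough sketch of a multivariate normal log-likelihood that uses determinant() for the log-determinant term. The function mvn_loglik() and its arguments are hypothetical names for illustration; this is not the actual source of phylosig(), and the diagonal V is only a stand-in for sig2 times a lambda-transformed VCV matrix.

```r
# Hypothetical helper (not the phylosig() source): log-likelihood of y under a
# multivariate normal distribution with mean a and covariance matrix V.
mvn_loglik <- function(y, a, V) {
  n <- length(y)
  r <- y - a
  # Log-determinant computed directly, avoiding log(det(V)).
  log_det <- as.numeric(determinant(V, logarithm = TRUE)$modulus)
  # Quadratic form r' V^-1 r, solved without forming the inverse explicitly.
  quad <- sum(r * solve(V, r))
  -0.5 * quad - 0.5 * n * log(2 * pi) - 0.5 * log_det
}

set.seed(1)
n <- 400
V <- diag(0.1, n)  # stand-in covariance matrix; det(V) would underflow to 0
y <- rnorm(n, mean = 0, sd = sqrt(0.1))

ll <- mvn_loglik(y, a = 0, V = V)
is.finite(ll)  # the log-likelihood is finite even though det(V) is 0
```

With a diagonal V the result should agree with summing independent normal log-densities, which makes for an easy sanity check.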

The new version of phylosig() is now available on my R-phylogenetics page (direct link here).

1 comment:

  1. Hi Liam, thanks for phytools.

    Right now I'm trying to calculate both Pagel's lambda and Blomberg's K with the phylosig function. I have the phytools version updated for R 4.0.5, and I still get the same error you describe here.

    I have a tree with 81 tips for the study group and 7 discrete morphological traits in a table. Part of the data is processed, but the other part gives the same warnings as at the beginning of the post, with the same values (Inf and -Inf) in the log-likelihoods.

    If possible, could you please take a look at the data and give me some indication of what I could do to solve this problem?

    Thanks a lot,

    Tibério Burlamaqui

