Tuesday, July 28, 2015

Important bug fix in evol.vcv

A phytools user by the name of Jurriaan de Vos recently identified an important bug in the phytools function evol.vcv. evol.vcv implements the method of Revell & Collar (2009) for fitting two or more among-trait evolutionary covariance matrices for the Brownian process to different parts of the tree. The problem stems from a bug introduced in which I inadvertently assumed that the order of the input data matrix matched the order of the tip labels in tree.

Here is an example of the problem.

First, I'll simulate data with different correlation structure in different parts of the tree:

library(phytools)
tree<-pbtree(n=26,tip.label=LETTERS,scale=2)
Q<-matrix(c(-1,1,1,-1),2,2)
rownames(Q)<-colnames(Q)<-letters[1:2]
tree<-sim.history(tree,Q,anc="a")
## Done simulation(s).
plotSimmap(tree,colors=setNames(c("blue","red"),letters[1:2]),ylim=c(-1,27))
add.simmap.legend(colors=setNames(c("blue","red"),letters[1:2]),prompt=FALSE,
    x=0.1*max(nodeHeights(tree)),y=0,vertical=FALSE)

plot of chunk unnamed-chunk-1

vcv<-list(matrix(c(1,0,0,1),2,2),
    matrix(c(2,1.8,1.8,2),2,2))
names(vcv)<-letters[1:2]
vcv
## $a
##      [,1] [,2]
## [1,]    1    0
## [2,]    0    1
## 
## $b
##      [,1] [,2]
## [1,]  2.0  1.8
## [2,]  1.8  2.0
X<-sim.corrs(tree,vcv=vcv)
phylomorphospace(tree,X,colors=setNames(c("blue","red"),letters[1:2]),
    xlab="trait 1",ylab="trait 2")
add.simmap.legend(colors=setNames(c("blue","red"),letters[1:2]),prompt=FALSE,
    x=par()$usr[1]+0.02*(par()$usr[2]-par()$usr[1]),
    y=par()$usr[4]-0.04*(par()$usr[4]-par()$usr[3]))

plot of chunk unnamed-chunk-2

OK, now let's fit the models:

fit1<-evol.vcv(tree,X)
fit1
## ML single-matrix model:
##  R[1,1]  R[1,2]  R[2,2]  k   log(L)
## fitted   1.1928  0.7186  0.9892  5   -57.9027    
## 
## ML multi-matrix model:
##  R[1,1]  R[1,2]  R[2,2]  k   log(L)
## a    0.3866  -0.1434 0.4604  8   -53.3203    
## b    1.9253  1.467   1.4903  
## 
## P-value (based on X^2): 0.0272 
## 
## R thinks it has found the ML solution.
## rows scrambled:
fit2<-evol.vcv(tree,X[sample(rownames(X)),])
fit2
## ML single-matrix model:
##  R[1,1]  R[1,2]  R[2,2]  k   log(L)
## fitted   1.1928  0.7186  0.9892  5   -131.4336   
## 
## ML multi-matrix model:
##  R[1,1]  R[1,2]  R[2,2]  k   log(L)
## a    0.9387  0.6829  1.4885  8   -86.0687    
## b    5.4343  2.7888  4.7515  
## 
## P-value (based on X^2): 0 
## 
## R thinks it has found the ML solution.

There are obviously multiple weird things going on here. Firstly, the single-matrix model is fit correctly, though the log-likelihood is obviously totally wrong. Secondly, the two-matrix models are completely messed up….

Interestingly, the bug does not affect evolvcv.lite which fits the same model - as well as other ones - but only for datasets with just two continuous traits. (Unfortunately, there is no fancy print method for evolvcv.lite).

fit3<-evolvcv.lite(tree,X[sample(rownames(X)),])
fit3$model1
## $description
## [1] "common rates, common correlation"
## 
## $R
##           [,1]      [,2]
## [1,] 1.1927796 0.7185548
## [2,] 0.7185548 0.9891478
## 
## $logLik
## [1] -57.90274
## 
## $convergence
## [1] 0
## 
## $k
## [1] 5
## 
## $AIC
## [1] 125.8055
fit3$model4
## $description
## [1] "no common structure"
## 
## $R
## $R$a
##            [,1]       [,2]
## [1,]  0.3862680 -0.1435202
## [2,] -0.1435202  0.4596485
## 
## $R$b
##          [,1]     [,2]
## [1,] 1.924823 1.466919
## [2,] 1.466919 1.490986
## 
## 
## $logLik
## [1] -53.32028
## 
## $convergence
## [1] 0
## 
## $k
## [1] 8
## 
## $AIC
## [1] 122.6406

The fix - also pointed out by Jurriaan is extremely simple & merely involved moving one line of code (in which X) is vectorized two lines below (to be below where the rows of X were already sorted)! I have posted a new version of evol.vcv with this fix implemented here and will obviously be in the next update to phytools.

Here's a demo showing that it is fixed:

source("http://www.phytools.org/evol.vcv/v0.7/evol.vcv.R")
library(numDeriv)
fit4<-evol.vcv(tree,X[sample(rownames(X)),])
fit4
## ML single-matrix model:
##  R[1,1]  R[1,2]  R[2,2]  k   log(L)
## fitted   1.1928  0.7186  0.9892  5   -57.9027    
## 
## ML multi-matrix model:
##  R[1,1]  R[1,2]  R[2,2]  k   log(L)
## a    0.3866  -0.1434 0.4604  8   -53.3203    
## b    1.9253  1.467   1.4903  
## 
## P-value (based on X^2): 0.0272 
## 
## R thinks it has found the ML solution.

That's it.

No comments:

Post a Comment