A phytools user by the name of
Jurriaan de Vos
recently identified an important bug in the phytools function `evol.vcv`

.
`evol.vcv`

implements the method of Revell & Collar
(2009) for fitting
two or more among-trait evolutionary covariance matrices for the Brownian process
to different parts of the tree. The problem stems from a bug introduced in which I
inadvertently assumed that the order of the input data matrix matched the order of
the tip labels in `tree`

.

Here is an example of the problem.

First, I'll simulate data with different correlation structure in different parts of the tree:

```
library(phytools)
tree<-pbtree(n=26,tip.label=LETTERS,scale=2)
Q<-matrix(c(-1,1,1,-1),2,2)
rownames(Q)<-colnames(Q)<-letters[1:2]
tree<-sim.history(tree,Q,anc="a")
```

```
## Done simulation(s).
```

```
plotSimmap(tree,colors=setNames(c("blue","red"),letters[1:2]),ylim=c(-1,27))
add.simmap.legend(colors=setNames(c("blue","red"),letters[1:2]),prompt=FALSE,
x=0.1*max(nodeHeights(tree)),y=0,vertical=FALSE)
```

```
vcv<-list(matrix(c(1,0,0,1),2,2),
matrix(c(2,1.8,1.8,2),2,2))
names(vcv)<-letters[1:2]
vcv
```

```
## $a
## [,1] [,2]
## [1,] 1 0
## [2,] 0 1
##
## $b
## [,1] [,2]
## [1,] 2.0 1.8
## [2,] 1.8 2.0
```

```
X<-sim.corrs(tree,vcv=vcv)
```

```
phylomorphospace(tree,X,colors=setNames(c("blue","red"),letters[1:2]),
xlab="trait 1",ylab="trait 2")
add.simmap.legend(colors=setNames(c("blue","red"),letters[1:2]),prompt=FALSE,
x=par()$usr[1]+0.02*(par()$usr[2]-par()$usr[1]),
y=par()$usr[4]-0.04*(par()$usr[4]-par()$usr[3]))
```

OK, now let's fit the models:

```
fit1<-evol.vcv(tree,X)
fit1
```

```
## ML single-matrix model:
## R[1,1] R[1,2] R[2,2] k log(L)
## fitted 1.1928 0.7186 0.9892 5 -57.9027
##
## ML multi-matrix model:
## R[1,1] R[1,2] R[2,2] k log(L)
## a 0.3866 -0.1434 0.4604 8 -53.3203
## b 1.9253 1.467 1.4903
##
## P-value (based on X^2): 0.0272
##
## R thinks it has found the ML solution.
```

```
## rows scrambled:
fit2<-evol.vcv(tree,X[sample(rownames(X)),])
fit2
```

```
## ML single-matrix model:
## R[1,1] R[1,2] R[2,2] k log(L)
## fitted 1.1928 0.7186 0.9892 5 -131.4336
##
## ML multi-matrix model:
## R[1,1] R[1,2] R[2,2] k log(L)
## a 0.9387 0.6829 1.4885 8 -86.0687
## b 5.4343 2.7888 4.7515
##
## P-value (based on X^2): 0
##
## R thinks it has found the ML solution.
```

There are obviously multiple weird things going on here. Firstly, the single-matrix model is fit correctly, though the log-likelihood is obviously totally wrong. Secondly, the two-matrix models are completely messed up….

Interestingly, the bug does not affect `evolvcv.lite`

which fits the same
model - as well as other ones - but only for datasets with just two continuous traits.
(Unfortunately, there is no fancy `print`

method for `evolvcv.lite`

).

```
fit3<-evolvcv.lite(tree,X[sample(rownames(X)),])
fit3$model1
```

```
## $description
## [1] "common rates, common correlation"
##
## $R
## [,1] [,2]
## [1,] 1.1927796 0.7185548
## [2,] 0.7185548 0.9891478
##
## $logLik
## [1] -57.90274
##
## $convergence
## [1] 0
##
## $k
## [1] 5
##
## $AIC
## [1] 125.8055
```

```
fit3$model4
```

```
## $description
## [1] "no common structure"
##
## $R
## $R$a
## [,1] [,2]
## [1,] 0.3862680 -0.1435202
## [2,] -0.1435202 0.4596485
##
## $R$b
## [,1] [,2]
## [1,] 1.924823 1.466919
## [2,] 1.466919 1.490986
##
##
## $logLik
## [1] -53.32028
##
## $convergence
## [1] 0
##
## $k
## [1] 8
##
## $AIC
## [1] 122.6406
```

The fix - also pointed out by
Jurriaan is
extremely simple & merely involved moving one line of code (in which `X`

)
is vectorized two lines below (to be below where the rows of `X`

were
already sorted)! I have posted a new version of `evol.vcv`

with this
fix implemented here
and will obviously be in the next update to phytools.

Here's a demo showing that it is fixed:

```
source("http://www.phytools.org/evol.vcv/v0.7/evol.vcv.R")
library(numDeriv)
fit4<-evol.vcv(tree,X[sample(rownames(X)),])
fit4
```

```
## ML single-matrix model:
## R[1,1] R[1,2] R[2,2] k log(L)
## fitted 1.1928 0.7186 0.9892 5 -57.9027
##
## ML multi-matrix model:
## R[1,1] R[1,2] R[2,2] k log(L)
## a 0.3866 -0.1434 0.4604 8 -53.3203
## b 1.9253 1.467 1.4903
##
## P-value (based on X^2): 0.0272
##
## R thinks it has found the ML solution.
```

That's it.

## No comments:

## Post a Comment