A phytools user asked today:

*“A reviewer asked us to test if phylogenetic signal is equal to 1, but
I have not found any package that does that. Do you have any suggestion about
how to do this?*

Since phylogenetic signal (measured using either Pagel's λ or
Blomberg's *K*) has an expected value of 1.0 under Brownian motion;
testing whether signal is different from 1.0 is essentially testing whether
phylogenetic signal is smaller or greater than if the data arose by a Brownian
process.

We can do this with a likelihood-ratio test (with some caveats) if we are
using Pagel's λ to measure phylogenetic signal. If we are using
Blomberg's *K* then we can use simulation (with fewer caveats).

Here's a quick demo.

First, let's simulate some data with & without signal. For the no-signal data I will just randomly permute the Brownian data:

```
library(phytools)
## tree
tree<-pbtree(n=26,tip.label=LETTERS)
## data with signal
x1<-fastBM(tree)
## data without signal
x2<-setNames(sample(x1),names(x1))
```

Now let's estimate Pagel's lambda for both datasets. In each case, we will compare the likelihood of the fitted model to the likelihood of λ = 1.

```
## Pagel's lambda
## first the data with signal
lam1<-phylosig(tree,x1,method="lambda")
lam1
```

```
## $lambda
## [1] 0.9413
##
## $logL
## [1] -38.39
```

```
fitBrownian<-brownie.lite(paintSubTree(tree,tree$edge[1,1],state="1"),x1)
fitBrownian
```

```
## ML single-rate model:
## s^2 se a k logL
## value 1.1035 0.3061 -0.2291 2 -38.6817
##
## ML multi-rate model:
## s^2(1) se(1) a k logL
## value 1.1035 -0.2291 2 -38.6817
##
## P-value (based on X^2): 1
##
## R thinks it has found the ML solution.
```

```
## likelihood ratio test
LR<-2*(lam1$logL-fitBrownian$logL1)
LR
```

```
## [1] 0.5774
```

```
P.lr<-pchisq(LR,df=1,lower.tail=FALSE)
P.lr
```

```
## [1] 0.4473
```

```
## now the data without signal
lam2<-phylosig(tree,x2,method="lambda")
lam2
```

```
## $lambda
## [1] 6.917e-05
##
## $logL
## [1] -46.33
```

```
fitBrownian<-brownie.lite(paintSubTree(tree,tree$edge[1,1],state="1"),x2)
fitBrownian
```

```
## ML single-rate model:
## s^2 se a k logL
## value 4.5548 1.2633 0.0452 2 -57.112
##
## ML multi-rate model:
## s^2(1) se(1) a k logL
## value 4.5548 1 0.0452 2 -57.112
##
## P-value (based on X^2): 1
##
## Optimization may not have converged.
```

```
## likelihood ratio test
LR<-2*(lam2$logL-fitBrownian$logL1)
LR
```

```
## [1] 21.56
```

```
P.lr<-pchisq(LR,df=1,lower.tail=FALSE)
P.lr
```

```
## [1] 3.428e-06
```

Now let's do roughly the same thing for Blomberg's *K*. In this case,
however, we will use simulation by Brownian motion to get a null distribution
for *K*.

```
## Blomberg's K
## first the data with signal
K1<-phylosig(tree,x1)
K1
```

```
## [1] 0.9221
```

```
## fit a Brownian model to the data to get parameters
## for simulation
fitBrownian<-brownie.lite(paintSubTree(tree,tree$edge[1,1],state="1"),x1)
X<-fastBM(tree,sig2=fitBrownian$sig2.single,a=fitBrownian$a.single,nsim=999)
nullK<-apply(X,2,phylosig,tree=tree)
mean(nullK)
```

```
## [1] 0.9857
```

```
hist(nullK,breaks=20,xlab="null distribution for K",
main="Null distribution of K")
lines(c(K1,K1),c(0,par()$usr[4]),lty="dashed",col="red")
text(x=K1,y=0.985*par()$usr[4],"observed value of K",pos=4,offset=0.2)
```

The distribution is of *K* under the null hypothesis is highly
assymetric, so it might make sense to conduct a test of *K* using
`abs(log(K))`

. (Just an idea.)

```
P.k<-mean(abs(log(c(K1,nullK)))>=abs(log(K1)))
P.k
```

```
## [1] 0.849
```

Now let's do the same with the no-signal data:

```
K2<-phylosig(tree,x2)
K2
```

```
## [1] 0.2261
```

```
## fit a Brownian model to the data to get parameters
## for simulation
fitBrownian<-brownie.lite(paintSubTree(tree,tree$edge[1,1],state="1"),x2)
X<-fastBM(tree,sig2=fitBrownian$sig2.single,a=fitBrownian$a.single,nsim=999)
nullK<-apply(X,2,phylosig,tree=tree)
mean(nullK)
```

```
## [1] 0.9844
```

```
hist(nullK,breaks=20,xlab="null distribution for K",
main="Null distribution of K")
lines(c(K2,K2),c(0,par()$usr[4]),lty="dashed",col="red")
text(x=K2,y=0.985*par()$usr[4],"observed value of K",pos=4,offset=0.2)
```

```
P.k<-mean(abs(log(c(K2,nullK)))>=abs(log(K2)))
P.k
```

```
## [1] 0.001
```

That's it.

Note that a couple of caveats apply - particularly to testing hypothesis about λ using the likelihood ratio:

ReplyDelete(1) The likelihood-ratio test, in which we use a Χ^2distribution for hypothesis testing, depends on asymptotic properties of likelihoods which do not hold for small sample sizes (and small trees).

(2) There is also a problem when we are at the bounds of a particular parameter - and this will be true near 0.0 & 1.0 for λ.

For more details, see Carl Boettiger's paper which I believe can be found here.

This comment has been removed by the author.

Deletethanks Liam. I was able to run this for most variables but in some cases I got this error when applying the brownie.lite function:

ReplyDeleteError in solve.default(model1$hessian) :

Lapack routine dgesv: system is exactly singular: U[2,2] = 0

Do you know what could be the problem?

Marcelo

Hola Marcelo.

DeleteSince this looks like it is trying to invert the Hessian matrix, it may just be involved in computing standard errors for the parameter estimates (and not the optimization). Did the function still return an optimized λ in this case.

Saludos, Liam

Nope, it did not. Thanks Liam!

ReplyDelete