Sunday, June 5, 2022

Follow-up on the sensitivity of Blomberg's K to terminal edge lengths of the tree

Yesterday I documented in a post on this blog how a seemingly very small change to the edge lengths of our tree could have a very large effect on our measurement of phylogenetic signal using Blomberg et al.'s K statistic.

Because I used a very small tree (ten tips) in the example, a Twitter follower, Gustavo Paterno, asked the following, very natural question: [Does] the issue become less sensitive with increasing sample size?

Unfortunately, the answer is an emphatic NO. Let's see.

Here I'll simulate a tree with 100 taxa, I'll simulate a trait on that tree under Brownian motion, and then, using code that's similar to what I posted yesterday, I'll shorten all the tips of the tree by slightly less than half the distance between the two most closely related sister taxa.

First, here are my trees:

library(phytools)
t1<-pbtree(n=100)
## compute patristic distance matrix
D<-as.dist(cophenetic(t1))
## find distance to cut
slice<-min(D)/2-1e-12
t2<-treeSlice(t1,max(nodeHeights(t1))-slice,
    orientation="rootwards")
par(mfrow=c(1,2))
plotTree(t1,lwd=1,fsize=0.5)
plotTree(t2,lwd=1,fsize=0.5)

plot of chunk unnamed-chunk-2

Simulate a single character on tree1.

x<-fastBM(t1)
head(x)
##                t18                t19                t75                t76 
##  1.705892864344912  1.866598869750947 -0.197770768667297 -0.692227700756325 
##                t38                 t1 
## -0.198408749443290 -0.108875374497454

Now compute phylogenetic signal using Blomberg et al.'s K on each of the two trees.

phylosig(t1,x,test=TRUE)
## 
## Phylogenetic signal K : 1.08337 
## P-value (based on 1000 randomizations) : 0.001
phylosig(t2,x,test=TRUE)
## 
## Phylogenetic signal K : 0.0000537938 
## P-value (based on 1000 randomizations) : 0.014

As before, we see that λ is pretty much unaffected.

phylosig(t1,x,method="lambda",test=TRUE)
## 
## Phylogenetic signal lambda : 1.0005 
## logL(lambda) : -123.238 
## LR(lambda=0) : 161.014 
## P-value (based on LR test) : 6.79424e-37
phylosig(t2,x,method="lambda",test=TRUE)
## 
## Phylogenetic signal lambda : 0.999827 
## logL(lambda) : -123.238 
## LR(lambda=0) : 161.014 
## P-value (based on LR test) : 6.79424e-37

One trick that could help us understand this apparent contradiction a bit better is to see what happens when we transform our second tree by the MLE of λ for that tree, and then compute K on the transformed tree.

Let's check.

lambda2<-phylosig(t2,x,method="lambda")$lambda
phylosig(phytools:::lambdaTree(t2,lambda2),x)
## 
## Phylogenetic signal K : 1.06701

What? Ha.

2 comments:

  1. Hello,
    I am finding the same issue in my analyses (1414 species, tree made with V.PhyloMaker2 i.e. pruned from megatree built based on molecular data), ex. lambda = 0.819191 and K = 0.0520079 for the same trait and tree, with both having p <<< 0.05. However, once I transform the tree by its MLE of lambda, K = 0.699543, similar to the result at the end.

    I presume K from the adjusted tree be a more valid representation of the true value, but I have trouble understanding what running a phylogenetic signal analysis (K) on a phylogenetic signal - adjusted tree (MLE of lambda) really means.

    ReplyDelete
  2. Hi,
    I am finding the same thing in my data (1414 species, phylogeny from V.PhyloMaker2 - pruned from megatree built based on molecular data).
    on the untransformed yet ultrametric tree, lambda = 0.82 and K = 0.05 for the same trait, both with p <<0.05.
    When I adjust the tree by the MLE of lambda as shown above, K is now 0.70.
    I presume the adjusted K to be a more accurate representation of the true K value, however I am having trouble understanding what this represents. What does running a phylogenetic signal analysis (K) on a phylogenetic signal - adjusted (lambda) tree mean?

    ReplyDelete

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.