Tuesday, December 13, 2011

Computing phylogenetic signal across multiple trees

A user just inquired as to whether there was an easy way to compute phylogenetic signal across multiple trees (say, a set of trees sampled from the posterior distribution or a set of equally parsimonious trees). In fact, this is quite easy.

To illustrate this, I will first simulate a set of 10 trees, and then I will generate data on one of these. This is intended to replicate our typical data analysis pipeline in which we have multiple trees but one vector of phenotypic trait values for species.

> require(phytools) # load phytools & ape
> trees<-rmtree(10,50) # 10 random trees
> x<-fastBM(trees[[1]])

If we are reading data from file, x should be a vector with the phenotypic trait values for species in which names(x) contains the tip names in the tree.

Now, we can use the base function sapply() to get a vector of K values for each tree in our set (of course, in this example, all the trees except the 1st are random with respect to the phenotypic data in x).

> K<-sapply(trees,phylosig,x=x)
> K
[1] 0.8565636 0.2462468 0.1695000 0.1727985 0.2410287 0.2135242
[7] 0.2088892 0.1913755 0.2381923 0.1880249

We can also compute P-values if we want, using the same general approach:

> K<-sapply(trees,phylosig,x=x,test=T)
> K
  [,1]      [,2]      [,3]   [,4]      [,5]      [,6]     
K 0.8565636 0.2462468 0.1695 0.1727985 0.2410287 0.2135242
P 0.001     0.154     0.637  0.692     0.423     0.303    
  [,7]      [,8]      [,9]      [,10]    
K 0.2088892 0.1913755 0.2381923 0.1880249
P 0.856     0.983     0.301     0.852

Here, sapply() just wraps our results into two rows. Of course, the same logic also applies to method="lambda", for instance:

> Lambda<-sapply(trees,phylosig,x=x,test=T,method="lambda")
> Lambda
       [,1]         [,2]         [,3]         [,4]        
lambda 0.9811238    5.470023e-05 6.856524e-05 6.894409e-05
logL   -78.65561    -105.1774    -100.7317    -100.9566   
P      6.506385e-10 1            1            1           
       [,5]         [,6]       [,7]         [,8]        
lambda 7.387218e-05 0.3517811  7.669448e-05 4.969137e-05
logL   -99.49018    -99.61256  -100.3459    -103.9276   
P      1            0.01121933 1            1           
       [,9]         [,10]       
lambda 7.314114e-05 7.700847e-05
logL   -101.0983    -103.7509   
P      1            1 

Good question. Thanks!


  1. BTW, I meant to say that we would normally read in our set of trees using read.tree or read.nexus; or we would estimate them using the phangorn library. The random trees in this analysis are included merely for illustrative purposes.

  2. Thanks Liam for the details. I realized that my problem was that I had some terminal branches with zero lengths. So I was able to obtain the K values after transforming those zero lenghts to 0.1
    I hope this is a valid trick

  3. Hey Liam,

    Thanks for offering all the great R code to do different analyses. One question I had was how to calculate lambda on multiple traits? I have code that gets me close in that it outputs the lambda values but not the log values or p-value. I am not the most savvy R user so any help would be great.

    Here is what I have so far:

    for (i in 1:6) Bark_lambda[i] = phylosig(poptree, Bark_for_model[[i]], method = "lambda", test = TRUE)


    1. Hi James - I just posted about this, here. Thanks for the question. Liam