Thursday, September 18, 2014

Update to sim.history including sensible error handling

For a long time, the phytools function sim.history, which simulates a stochastic character history under a model, has (backwardly, I'm afraid to say) taken as input a transition matrix Q which is the transpose of the Q matrix returned by make.simmap in phytools or, for that matter, fitDiscrete in geiger or ace in the ape package. That is to say, the transition rate fro i to j is given by Qj,i rather than Qi,j. The reason for this is not sensible - I just couldn't find documentation of any consistent standard when I programmed the method & thus did it one way, and not the other. This is of no consequence in the functioning of the method; however (if unanticipated) it could lead to a surprising result (that is, if we thought we were modeling Q and were in fact modeling QT).

To address this, I could simply switch convention - but then I risk affecting any functions or scripts dependent on earlier versions of sim.history. Instead, I have updated sim.history so that it now reports this “feature” whenever it is given an assymetric Q and, furthermore, if the rows instead of the columns of Q sum to zero - implying the user has attempted to supply Q in row rather than column order - sim.history internally transposes Q and warns the user.

So, for instance:

library(phytools)
## Loading required package: ape
## Loading required package: maps
packageVersion("phytools")
## [1] '0.4.34'
tree<-pbtree(n=100,scale=1)
Q<-matrix(c(-1,1,1e-12,-1e-12),2,2)
rownames(Q)<-colnames(Q)<-letters[1:2]
Q
##    a      b
## a -1  1e-12
## b  1 -1e-12
## returns a note but works fine
mtree<-sim.history(tree,Q,anc="a")
## Note - the rate of substitution from i->j should be given by Q[j,i].
## Done simulation(s).
colors<-setNames(c("blue","red"),letters[1:2])
plotSimmap(ladderize.simmap(mtree),colors,ftype="off")
add.simmap.legend(colors=colors,prompt=FALSE,x=0.1,y=10)

plot of chunk unnamed-chunk-1

## transpose Q
Q<-t(Q)
Q
##        a      b
## a -1e+00  1e+00
## b  1e-12 -1e-12
## returns a note & transposes Q
mtree<-sim.history(tree,Q,anc="a")
## Note - the rate of substitution from i->j should be given by Q[j,i].
## Detecting that rows, not columns, of Q sum to zero :
##   Transposing Q for internal calculations.
## Done simulation(s).
plotSimmap(ladderize.simmap(mtree),colors,ftype="off")
add.simmap.legend(colors=colors,prompt=FALSE,x=0.1,y=10)

plot of chunk unnamed-chunk-1

## finally, make Q "badly conformed"
Q<-t(Q)
Q[2,1]<-0.8
Q
##      a      b
## a -1.0  1e-12
## b  0.8 -1e-12
## returns a warning & adjusts diagonal of Q to conform
mtree<-sim.history(tree,Q,anc="a")
## Note - the rate of substitution from i->j should be given by Q[j,i].
## Some columns (or rows) of Q don't sum to 0.0. Fixing.
## Done simulation(s).
plotSimmap(ladderize.simmap(mtree),colors,ftype="off")
add.simmap.legend(colors=colors,prompt=FALSE,x=0.1,y=10)

plot of chunk unnamed-chunk-1

## finally, if we have supplied a symmetric matrix then
## there is no messaging
Q<-matrix(c(-1,1,1,-1),2,2)
rownames(Q)<-colnames(Q)<-letters[1:2]
mtree<-sim.history(tree,Q,anc="a")
## Done simulation(s).
plotSimmap(ladderize.simmap(mtree),colors,ftype="off")
add.simmap.legend(colors=colors,prompt=FALSE,x=0.1,y=10)

plot of chunk unnamed-chunk-1

That's it. The updated version of sim.history is here but you can also download a new phytools build with this update (and updated, clearer, documentation for sim.history) here.

On the loss of phylogenetic diversity when extinction risks evolves under the threshold model

So, for a little while (though highly intermittently) I've been exploring the possibility of modeling contemporary extinction risk, specifically IUCN threat status, using the threshold model from evolutionary quantitative genetics.

In so doing, I started considering the possibility that phylogenetic signal in extinction risk could be used to make some kind of general prediction about the loss of phylogenetic diversity (PD) under conditions in which species highly vulnerable to extinction are in fact lost. Specifically in the context of the threshold model, if our process for evolution of the liability in extinction risk results in high phylogenetic signal (for instance, Brownian motion) then we would expect that the loss of PD will be high for a given extinction fraction (because whole clades will be lost). Conversely, if the process of evolution of the liability in extinction risk results in low phylogenetic signal, then we would expect the loss of PD for a given extinction fraction to be relatively low.

To illustrate this, I wrote a little bit of code that simulates extinction risk under contemporary anthropogenic conditions using the threshold model and birth-death phylogenetic simulations. We supply a phylogeny, an extinction fraction, & a particular value for λ, and then the function returns the relative PD of the tree in which vulnerable species have gone extinct compared to PD in the original tree.

library(phytools)
## phytools internal lambdaTree function
lambdaTree<-phytools:::lambdaTree

## custom function to simulate the loss of PD
pd.lambda<-function(tree=NULL,lambda=1,fraction=0.2){
    if(is.null(tree)) tree<-pbtree(n=100,scale=1)
    x<-fastBM(lambdaTree(tree,lambda))
    th<-sort(x)[round(fraction*length(x))]
    x<-sapply(x,threshState,thresholds=
        setNames(c(th,Inf),c("extinct","extant")))
    tips<-names(which(x=="extinct"))
    sum(drop.tip(tree,tips)$edge.length)/sum(tree$edge.length)
}

## simulations for various Pagel's lambda
lambda<-0:10/10
trees<-pbtree(n=100,scale=1,nsim=100)
PD<-sapply(lambda,function(l,tt) sapply(tt,
    pd.lambda,lambda=l,fraction=0.5),tt=trees)
colnames(PD)<-lambda
boxplot(PD,ylab="relative PD",xlab="lambda")

plot of chunk unnamed-chunk-1

That's with an extinct fraction of 0.5. Let's try some other extinction fractions:

## extinction fraction of 0.2
PD<-sapply(lambda,function(l,tt) sapply(tt,
    pd.lambda,lambda=l,fraction=0.2),tt=trees)
colnames(PD)<-lambda
boxplot(PD,ylab="relative PD",xlab="lambda")

plot of chunk unnamed-chunk-2

## extinction fraction of 0.8
PD<-sapply(lambda,function(l,tt) sapply(tt,
    pd.lambda,lambda=l,fraction=0.8),tt=trees)
colnames(PD)<-lambda
boxplot(PD,ylab="relative PD",xlab="lambda")

plot of chunk unnamed-chunk-2

Basically, as we predicted, when the evolutionary process for the evolution of extinction liability under contemporary circumstances results in high phylogenetic signal then loss of PD for a given extinction fraction is high. Conversely, for low phylogenetic signal loss of PD is also relatively low.

By implication, if IUCN threat status modeled on a phylogeny has high phylogenetic signal for a given clade, then the potential loss of PD given an extinction scenario in which the species identified as most vulnerable are lost will be high relative to the expected loss of PD under the same conditions if phylogenetic signal in IUCN threat status was relatively small.

That's pretty much it.

Friday, September 5, 2014

New function to ladderize tree with mapped discrete character

I just wrote a new utility function, ladderize.simmap, that can be used to “ladderize” a phylogenetic tree with a mapped discrete character. This function is analogous to the function ladderize in the ape package - in fact, it uses ladderize internally.

All ladderization does is rotate internal nodes to achieve an effect whereby right (or left) daughter clades tend to be more species rich than left daughter clades. Since node rotation is arbitrary, this can be done without any change in the phylogenetic information contained by the tree. The visual effect of ladderization will be strongest when the tree is highly imbalanced.

So, for instance, here is a ladderization of a simple pure-birth tree:

library(phytools)
packageVersion("phytools")
## [1] '0.4.33'
set.seed(890)
tree<-pbtree(n=200,scale=1)
plotTree(tree,ftype="off")

plot of chunk unnamed-chunk-1

plotTree(ladderize(tree),ftype="off")

plot of chunk unnamed-chunk-1

Now, using a new version of phytools (phytools_0.4-33) let's try ladderizing an object with a mapped discrete character. In this case, we'll just create it using sim.history to simulate a discrete character history on the tree:

Q<-matrix(c(-2,1,1,1,-2,1,1,1,-2),3,3)
rownames(Q)<-colnames(Q)<-letters[1:3]
mtree<-sim.history(tree,Q)
plotSimmap(mtree,ftype="off")
## no colors provided. using the following legend:
##        a        b        c 
##  "black"    "red" "green3"

plot of chunk unnamed-chunk-2

plotSimmap(ladderize.simmap(mtree),ftype="off")
## no colors provided. using the following legend:
##        a        b        c 
##  "black"    "red" "green3"

plot of chunk unnamed-chunk-2

Something else that's kind of cool is that we can also use the same method to ladderize objects of class "densityMap" and "contMap" since the tree is stored in the same way internally.

So, for instance:

set.seed(1)
x<-fastBM(tree)
obj<-contMap(tree,x,plot=FALSE)
plot(obj,ftype="off",lwd=3,outline=FALSE)

plot of chunk unnamed-chunk-3

obj$tree<-ladderize.simmap(obj$tree)
plot(obj,ftype="off",lwd=3,outline=FALSE)

plot of chunk unnamed-chunk-3

This could be especially handy if our tree was crowding the color legend in the bottom left corner. (Of course, we can also move that, but this is easier).

That's it!