A colleague recently inquired as to how one might go about interpreting
the pattern shown by a so-called *disparity-through-time* (DTT) plot.

A DTT plot shows the mean disparity (effectively, variance) of each subtree compared to the total disparity, and is calculated by taking a time slice at each 'event' (speciation event recorded in the reconstructed phylogeny) of the tree.

DTT plots are interpreted in a variety of ways in the literature - for instance, as evidence that variability accumulates within, rather than between, clades; or that phenotypic diversity has tended to accrue in the early rather than the later stages of a diversification.

I thought it might be nonetheless useful to conduct a 'mini' simulation study in which I examined how, in general, the accumulation of disparity among clades through time may be related to biological process.

To start with, here are a couple of small custom functions that I wrote to facilitate the analyses:

```
meanDtt<-function(D){
max.t<-max(sapply(D,function(x) max(x$times)))
t<-seq(0,1,by=0.01)*max.t
d<-vector()
for(i in 1:length(t)){
d[i]<-0
for(j in 1:length(D)){
ii<-max(which(D[[j]]$times<=t[i]))
d[i]<-d[i]+D[[j]]$dtt[ii]/length(D)
}
}
obj<-list(times=t,dtt=d)
obj
}
plotDtt<-function(D,main=""){
max.d<-max(sapply(D,function(x) max(x$dtt)))
max.t<-max(sapply(D,function(x) max(x$times)))
plot.new()
plot.window(xlim=c(0,max.t),ylim=c(0,max.d))
for(i in 1:length(D)) lines(D[[i]]$times,D[[i]]$dtt,
col=rgb(0,0,1,0.1),lwd=2,
type="s")
axis(1)
axis(2)
title(xlab="time",ylab="disparity",main=main)
obj<-meanDtt(D)
lines(c(0,obj$times),c(1,obj$dtt),lwd=2)
}
```

Next, we can load our packages. DTT calculation is part of the geiger package.

```
library(phytools)
```

```
## Loading required package: ape
## Loading required package: maps
##
## # ATTENTION: maps v3.0 has an updated 'world' map. #
## # Many country borders and names have changed since 1990. #
## # Type '?world' or 'news(package="maps")'. See README_v3. #
```

```
library(geiger)
```

OK, now we can simulate some trees:

```
trees<-pbtree(n=100,scale=1,nsim=200)
trees
```

```
## 200 phylogenetic trees
```

OK, now let's explore the relationship between process & DTT. Our custom
function, `plotDtt`

, plots all the DTTs but also overlays the
average DTT across simulations.

First, Brownian motion:

```
## Brownian motion
X<-lapply(trees,fastBM)
D<-mapply(dtt,trees,X,MoreArgs=list(plot=FALSE),SIMPLIFY=FALSE)
plotDtt(D,main="Brownian motion")
```

Now, early-burst evolution:

```
## early burst
X<-lapply(trees,function(x) fastBM(rescale(x,"EB",a=-4)))
D<-mapply(dtt,trees,X,MoreArgs=list(plot=FALSE),SIMPLIFY=FALSE)
plotDtt(D,main="Early-Burst")
```

Late-burst:

```
## late burst
X<-lapply(trees,function(x) fastBM(rescale(x,"EB",a=4)))
D<-mapply(dtt,trees,X,MoreArgs=list(plot=FALSE),SIMPLIFY=FALSE)
plotDtt(D,main="Late-Burst")
```

Now, how about the Ornstein-Uhlenbeck model with a single optimum:

```
## OU
X<-lapply(trees,function(x) fastBM(rescale(x,"OU",alpha=4)))
D<-mapply(dtt,trees,X,MoreArgs=list(plot=FALSE),SIMPLIFY=FALSE)
plotDtt(D,main="Ornstein-Uhlenbeck")
```

In addition, here are a couple of models that we can simulate very easily, but not fit:

```
## Brownian motion with bounds
X<-lapply(trees,fastBM,bounds=c(-1,1))
D<-mapply(dtt,trees,X,MoreArgs=list(plot=FALSE),SIMPLIFY=FALSE)
plotDtt(D,main="Bounded BM [-1,1]")
```

```
X<-lapply(trees,fastBM,bounds=c(-0.4,0.4))
D<-mapply(dtt,trees,X,MoreArgs=list(plot=FALSE),SIMPLIFY=FALSE)
plotDtt(D,main="Bounded BM [-0.4,0.4]")
```

```
X<-lapply(trees,fastBM,bounds=c(-0.2,0.2))
D<-mapply(dtt,trees,X,MoreArgs=list(plot=FALSE),SIMPLIFY=FALSE)
plotDtt(D,main="Bounded BM [-0.2,0.2]")
```

```
## Brownian motion with a trend
X<-lapply(trees,fastBM,mu=4)
D<-mapply(dtt,trees,X,MoreArgs=list(plot=FALSE),SIMPLIFY=FALSE)
plotDtt(D,main="BM with a trend")
```

Finally, it is important to keep in mine that DTT plots can be pretty
substantially affected by sampling error in the estimation of species
means. Here is a demo of the effect of two different levels of sampling
error. I have simulated sampling variances from a exponential distribution
with rate 1.0, but then multiplied by different values of a constant
`k`

, in which larger values imply more sampling error:

```
## measurement error
sim.error<-function(tree,k=0.1){
x<-fastBM(tree)
v<-k*setNames(rexp(n=Ntip(tree)),names(x))
sampleFrom(x,v,n=rep(1,Ntip(tree)))
}
X<-lapply(trees,sim.error,k=0.01)
D<-mapply(dtt,trees,X,MoreArgs=list(plot=FALSE),SIMPLIFY=FALSE)
plotDtt(D,"BM with sampling error (k=0.01)")
```

```
X<-lapply(trees,sim.error,k=0.05)
D<-mapply(dtt,trees,X,MoreArgs=list(plot=FALSE),SIMPLIFY=FALSE)
plotDtt(D,"BM with sampling error (k=0.05)")
```

```
X<-lapply(trees,sim.error,k=0.2)
D<-mapply(dtt,trees,X,MoreArgs=list(plot=FALSE),SIMPLIFY=FALSE)
plotDtt(D,"BM with sampling error (k=0.2)")
```

That's really all I have.

Dear Liam, thanks for these simulations. I wonder whether "dtt" performs well also when one has multiple samples per species or whether this method is not appropriate for such datasets.

ReplyDeleteBest,

Martin

I made my first few DTT plots and I am getting values above 1 as time approaches the present. I am having trouble finding any literature explaining this / I'm having trouble wrapping my head around the actual calculations. Do you have any input related to DTT values above 1?

ReplyDelete