In this short tutorial I’ll demonstrate how to do penalized-likelihood rate-smoothing to obtain a time-calibrated ultrametric tree in R using the *ape* function `chronos`

.

This is only one way to obtain an ultrametric tree – we could also estimate an ultrametric, time-calibrated tree under a *strict molecular clock* or using a *relaxed clock* inference method, for instance in software like the program BEAST. The method of this tutorial, however, is very useful when we have an estimated tree but no nucleotide sequence alignment.

First load packages.

```
library(phytools)
library(phangorn)
```

Next, let’s use the data object packaged with *phangorn* containing a nucleotide sequence alignment for mammals, `Laurasiatherian`

.

```
data("Laurasiatherian")
Laurasiatherian
```

```
## 47 sequences with 3179 character and 1605 different site patterns.
## The states are a c g t
```

Let’s estimate a tree from these data using likelihood.

To do that I’ll get a starting tree with MP, assign random starting edge lengths, and then optimize my ML tree with `optim.pml`

.

```
mp.tree<-pratchet(Laurasiatherian)
mp.tree$edge.length<-runif(n=nrow(mp.tree$edge))
```

Next, we can create our `"pml"`

object and optimize it.

```
lik.model<-pml(mp.tree,Laurasiatherian,k=4)
ml.fit<-optim.pml(lik.model,optGamma=TRUE,optBf=TRUE,
optQ=TRUE,rearrangement="ratchet")
ml.fit
```

```
## model: F81+G(4)
## loglikelihood: -44650.31
## unconstrained loglikelihood: -17300.92
## Discrete gamma model
## Number of rate categories: 4
## Shape parameter: 0.3424146
##
## Rate matrix:
## a c g t
## a 0.000000 3.2308279 11.7912017 2.435826
## c 3.230828 0.0000000 0.6031533 23.128196
## g 11.791202 0.6031533 0.0000000 1.000000
## t 2.435826 23.1281956 1.0000000 0.000000
##
## Base frequencies:
## a c g t
## 0.3818781 0.1693982 0.1709454 0.2777784
```

I’ll now *root* the tree, using *Platypus* as my outgroup, and then prune the outgroup.

```
ml.tree<-root(ml.fit$tree,outgroup="Platypus")
ml.tree<-drop.tip(ml.tree,"Platypus")
plotTree(ml.tree,fsize=0.8)
```

We can see that this is a ML tree, but it is not time-calibrated and it isn’t ultrametric.

To obtain a time-calibrated ultrametric tree we’ll use the function `chronos`

in *ape*. `chronos`

implements the penalized likelihood method of Sanderson (2002).

To use it, we’ll first create a *calibration* data frame containing the nodes and their upper and lower temporal bounds using `makeChronosCalib`

and then we’ll pass the tree and this matrix to `chronos`

to do penalized likelihood estimation. The default model of this method is a correlated-rates model, but it also has an uncorrelated model in which we need to specify the number of rate categories.

I will use four different calibration nodes as follows:

```
nodes<-c(findMRCA(ml.tree,c("Possum","Cat")),
findMRCA(ml.tree,c("Squirrel","Mouse")),
findMRCA(ml.tree,c("Pig","BlueWhale")),
findMRCA(ml.tree,c("Human","Baboon")),
findMRCA(ml.tree,c("Horse","Donkey")))
age.min=c(159,66,59,27.95,6.2)
age.max=c(166,75,66,31.35,10)
plotTree(ladderize(ml.tree),fsize=0.8)
obj<-get("last_plot.phylo",envir=.PlotPhyloEnv)
points(obj$xx[nodes],obj$yy[nodes],pch=21,
bg=palette()[1:5],cex=2)
legend("bottomleft",paste("(",age.max,", ",
age.min,") mya",sep=""),pch=21,
pt.bg=palette()[1:5],pt.cex=2,bty="n")
```

The specific time intervals for each of these calibrations comes from http://www.timetree.org/.

Now, we use `makeChronosCalib`

to create a data frame containing our calibration points. If we want to use *fixed* calibrations, rather than intervals, we just need to set `age.min`

and `age.max`

to be equal!

```
calibration<-makeChronosCalib(ml.tree,node=nodes,
age.min=age.min,age.max=age.max)
calibration
```

```
## node age.min age.max soft.bounds
## 1 47 159.00 166.00 FALSE
## 2 49 66.00 75.00 FALSE
## 3 75 59.00 66.00 FALSE
## 4 53 27.95 31.35 FALSE
## 5 83 6.20 10.00 FALSE
```

With our calibration table in hand, we can go ahead and fit our model using penalized-likelihood as follows.

```
pl.tree<-chronos(ml.tree,calibration=calibration)
```

```
##
## Setting initial dates...
## Fitting in progress... get a first set of estimates
## (Penalised) log-lik = -17.07964
## Optimising rates... dates... -17.07964
```

```
## Warning: false convergence (8)
```

```
##
## log-Lik = -17.07905
## PHIIC = 304.16
```

```
pl.tree
```

```
##
## Chronogram
##
## Call: chronos(phy = ml.tree, calibration = calibration)
##
##
## Phylogenetic tree with 46 tips and 45 internal nodes.
##
## Tip labels:
## Wallaroo, Possum, Bandicoot, Opposum, Rabbit, Pika, ...
## Node labels:
## 1, 1, 1, 0.415, 0.792, 1, ...
##
## Rooted; includes branch lengths.
```

Let’s graph our tree.

```
plotTree(pl.tree,direction="leftwards",
xlim=c(174,-40),ftype="i",mar=c(4.1,1.1,0.1,1.1),
fsize=0.8)
axis(1)
title(xlab="millions of years before present")
abline(v=seq(0,150,by=50),lty="dotted",col="grey")
```

One nuance that I’ve ignored here is the effect of the smoothing parameter, \(\lambda\). We left it at it’s default value of `lambda=1`

.

Let’s try a different value and see the effect.

```
lambda.tree<-chronos(ml.tree,calibration=calibration,
lambda=0.1)
```

```
##
## Setting initial dates...
## Fitting in progress... get a first set of estimates
## (Penalised) log-lik = -17.07911
## Optimising rates... dates... -17.07911
##
## log-Lik = -17.07905
## PHIIC = 304.16
```

```
lambda.tree
```

```
##
## Chronogram
##
## Call: chronos(phy = ml.tree, lambda = 0.1, calibration = calibration)
##
##
## Phylogenetic tree with 46 tips and 45 internal nodes.
##
## Tip labels:
## Wallaroo, Possum, Bandicoot, Opposum, Rabbit, Pika, ...
## Node labels:
## 1, 1, 1, 0.415, 0.792, 1, ...
##
## Rooted; includes branch lengths.
```

Rather than graph our tree again, let’s just plot the *edge lengths* of our two trees, one against the other.

```
plot(pl.tree$edge.length,lambda.tree$edge.length,
pch=21,bg="grey",cex=1.2,bty="n",
xlab=expression(paste(lambda,"= 1")),
ylab=expression(paste(lambda,"= 0.1")))
lines(c(0,80),c(0,80))
legend("topleft","1:1 line",lty="solid",bty="n")
grid()
title(main=expression(paste(
"Comparison of edge lengths with two different ",
lambda," values")))
```

When we do that, we can see that (at least in this case) the effect of \(\lambda\) is not large. This is probably because we had a variety of calibration points distributed across the nodes of our phylogeny.

How do the divergence times for nodes that we did not use as calibration points compare to estimated divergence times in http://www.timetree.org/?

To check, let’s compute a matrix that gives the divergence *depth* of the MRCA of each pair of tips on the phylogeny using the *ape* function `cophenetic`

.

```
d<-cophenetic(pl.tree)/2
```

(We divide by two because `cophenetic`

gives us the *total* distance between tips.)

Let’s look at just part of this matrix:

```
tips<-c("Elephant","LongTBat","Sheep","Cow",
"Squirrel","Dormouse")
d[tips,tips]
```

```
## Elephant LongTBat Sheep Cow Squirrel Dormouse
## Elephant 0.00 107.02 107.02 107.02 116.49 116.49
## LongTBat 107.02 0.00 88.07 88.07 116.49 116.49
## Sheep 107.02 88.07 0.00 19.89 116.49 116.49
## Cow 107.02 88.07 19.89 0.00 116.49 116.49
## Squirrel 116.49 116.49 116.49 116.49 0.00 23.75
## Dormouse 116.49 116.49 116.49 116.49 23.75 0.00
```

You can do this for any pair of tips and see how it compares to the ages in http://www.timetree.org/.

Lastly, `chronos`

can also do penalized likelihood rate-smoothing without any calibration points at all. Let’s try that for comparison. (Here, I’ll use `model="relaxed"`

instead of `model="correlated"`

, the default. Full disclosure: this is because *in this instance* so doing yielded a higher correlation of the divergence times with those obtained using our calibration points!)

```
uncalib_tree<-chronos(ml.tree,lambda=1,model="relaxed")
```

```
##
## Setting initial dates...
## Fitting in progress... get a first set of estimates
## (Penalised) log-lik = -17.304
## Optimising rates... dates... -17.304
## Optimising rates... dates... -17.30103
##
## log-Lik = -17.22596
## PHIIC = 302.47
```

```
uncalib_tree
```

```
##
## Chronogram
##
## Call: chronos(phy = ml.tree, lambda = 1, model = "relaxed")
##
##
## Phylogenetic tree with 46 tips and 45 internal nodes.
##
## Tip labels:
## Wallaroo, Possum, Bandicoot, Opposum, Rabbit, Pika, ...
## Node labels:
## 1, 1, 1, 0.415, 0.792, 1, ...
##
## Rooted; includes branch lengths.
```

```
plotTree(uncalib_tree,direction="leftwards",
xlim=c(1,-0.2),ftype="i",mar=c(4.1,1.1,0.1,1.1),
fsize=0.8)
axis(1,at=seq(0,1,by=0.25))
title(xlab="relative time before present")
abline(v=seq(0,1,by=0.25),lty="dotted",col="grey")
```

Finally, let’s graph a comparison of the edge lengths with those obtained in our calibrated analysis.

```
plot(pl.tree$edge.length,uncalib_tree$edge.length,
pch=21,bg="grey",cex=1.2,bty="n",
xlab="fossil calibrated",
ylab="rate-smoothed without calibration points")
lines(c(0,max(nodeHeights(pl.tree))),
c(0,max(nodeHeights(uncalib_tree))))
legend("topleft","1:1 line",lty="solid",bty="n")
grid()
title(main=paste("Comparison of edge lengths from calibrated\n",
"and uncalibrated analyses"))
```

## No comments:

## Post a Comment

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.