In this short tutorial I’ll demonstrate how to do penalized-likelihood rate-smoothing to obtain a time-calibrated ultrametric tree in R using the ape function chronos
.
This is only one way to obtain an ultrametric tree – we could also estimate an ultrametric, time-calibrated tree under a strict molecular clock or using a relaxed clock inference method, for instance in software like the program BEAST. The method of this tutorial, however, is very useful when we have an estimated tree but no nucleotide sequence alignment.
First load packages.
library(phytools)
library(phangorn)
Next, let’s use the data object packaged with phangorn containing a nucleotide sequence alignment for mammals, Laurasiatherian
.
data("Laurasiatherian")
Laurasiatherian
## 47 sequences with 3179 character and 1605 different site patterns.
## The states are a c g t
Let’s estimate a tree from these data using likelihood.
To do that I’ll get a starting tree with MP, assign random starting edge lengths, and then optimize my ML tree with optim.pml
.
mp.tree<-pratchet(Laurasiatherian)
mp.tree$edge.length<-runif(n=nrow(mp.tree$edge))
Next, we can create our "pml"
object and optimize it.
lik.model<-pml(mp.tree,Laurasiatherian,k=4)
ml.fit<-optim.pml(lik.model,optGamma=TRUE,optBf=TRUE,
optQ=TRUE,rearrangement="ratchet")
ml.fit
## model: F81+G(4)
## loglikelihood: -44650.31
## unconstrained loglikelihood: -17300.92
## Discrete gamma model
## Number of rate categories: 4
## Shape parameter: 0.3424146
##
## Rate matrix:
## a c g t
## a 0.000000 3.2308279 11.7912017 2.435826
## c 3.230828 0.0000000 0.6031533 23.128196
## g 11.791202 0.6031533 0.0000000 1.000000
## t 2.435826 23.1281956 1.0000000 0.000000
##
## Base frequencies:
## a c g t
## 0.3818781 0.1693982 0.1709454 0.2777784
I’ll now root the tree, using Platypus as my outgroup, and then prune the outgroup.
ml.tree<-root(ml.fit$tree,outgroup="Platypus")
ml.tree<-drop.tip(ml.tree,"Platypus")
plotTree(ml.tree,fsize=0.8)
We can see that this is a ML tree, but it is not time-calibrated and it isn’t ultrametric.
To obtain a time-calibrated ultrametric tree we’ll use the function chronos
in ape. chronos
implements the penalized likelihood method of Sanderson (2002).
To use it, we’ll first create a calibration data frame containing the nodes and their upper and lower temporal bounds using makeChronosCalib
and then we’ll pass the tree and this matrix to chronos
to do penalized likelihood estimation. The default model of this method is a correlated-rates model, but it also has an uncorrelated model in which we need to specify the number of rate categories.
I will use four different calibration nodes as follows:
nodes<-c(findMRCA(ml.tree,c("Possum","Cat")),
findMRCA(ml.tree,c("Squirrel","Mouse")),
findMRCA(ml.tree,c("Pig","BlueWhale")),
findMRCA(ml.tree,c("Human","Baboon")),
findMRCA(ml.tree,c("Horse","Donkey")))
age.min=c(159,66,59,27.95,6.2)
age.max=c(166,75,66,31.35,10)
plotTree(ladderize(ml.tree),fsize=0.8)
obj<-get("last_plot.phylo",envir=.PlotPhyloEnv)
points(obj$xx[nodes],obj$yy[nodes],pch=21,
bg=palette()[1:5],cex=2)
legend("bottomleft",paste("(",age.max,", ",
age.min,") mya",sep=""),pch=21,
pt.bg=palette()[1:5],pt.cex=2,bty="n")
The specific time intervals for each of these calibrations comes from http://www.timetree.org/.
Now, we use makeChronosCalib
to create a data frame containing our calibration points. If we want to use fixed calibrations, rather than intervals, we just need to set age.min
and age.max
to be equal!
calibration<-makeChronosCalib(ml.tree,node=nodes,
age.min=age.min,age.max=age.max)
calibration
## node age.min age.max soft.bounds
## 1 47 159.00 166.00 FALSE
## 2 49 66.00 75.00 FALSE
## 3 75 59.00 66.00 FALSE
## 4 53 27.95 31.35 FALSE
## 5 83 6.20 10.00 FALSE
With our calibration table in hand, we can go ahead and fit our model using penalized-likelihood as follows.
pl.tree<-chronos(ml.tree,calibration=calibration)
##
## Setting initial dates...
## Fitting in progress... get a first set of estimates
## (Penalised) log-lik = -17.07964
## Optimising rates... dates... -17.07964
## Warning: false convergence (8)
##
## log-Lik = -17.07905
## PHIIC = 304.16
pl.tree
##
## Chronogram
##
## Call: chronos(phy = ml.tree, calibration = calibration)
##
##
## Phylogenetic tree with 46 tips and 45 internal nodes.
##
## Tip labels:
## Wallaroo, Possum, Bandicoot, Opposum, Rabbit, Pika, ...
## Node labels:
## 1, 1, 1, 0.415, 0.792, 1, ...
##
## Rooted; includes branch lengths.
Let’s graph our tree.
plotTree(pl.tree,direction="leftwards",
xlim=c(174,-40),ftype="i",mar=c(4.1,1.1,0.1,1.1),
fsize=0.8)
axis(1)
title(xlab="millions of years before present")
abline(v=seq(0,150,by=50),lty="dotted",col="grey")
One nuance that I’ve ignored here is the effect of the smoothing parameter, \(\lambda\). We left it at it’s default value of lambda=1
.
Let’s try a different value and see the effect.
lambda.tree<-chronos(ml.tree,calibration=calibration,
lambda=0.1)
##
## Setting initial dates...
## Fitting in progress... get a first set of estimates
## (Penalised) log-lik = -17.07911
## Optimising rates... dates... -17.07911
##
## log-Lik = -17.07905
## PHIIC = 304.16
lambda.tree
##
## Chronogram
##
## Call: chronos(phy = ml.tree, lambda = 0.1, calibration = calibration)
##
##
## Phylogenetic tree with 46 tips and 45 internal nodes.
##
## Tip labels:
## Wallaroo, Possum, Bandicoot, Opposum, Rabbit, Pika, ...
## Node labels:
## 1, 1, 1, 0.415, 0.792, 1, ...
##
## Rooted; includes branch lengths.
Rather than graph our tree again, let’s just plot the edge lengths of our two trees, one against the other.
plot(pl.tree$edge.length,lambda.tree$edge.length,
pch=21,bg="grey",cex=1.2,bty="n",
xlab=expression(paste(lambda,"= 1")),
ylab=expression(paste(lambda,"= 0.1")))
lines(c(0,80),c(0,80))
legend("topleft","1:1 line",lty="solid",bty="n")
grid()
title(main=expression(paste(
"Comparison of edge lengths with two different ",
lambda," values")))
When we do that, we can see that (at least in this case) the effect of \(\lambda\) is not large. This is probably because we had a variety of calibration points distributed across the nodes of our phylogeny.
How do the divergence times for nodes that we did not use as calibration points compare to estimated divergence times in http://www.timetree.org/?
To check, let’s compute a matrix that gives the divergence depth of the MRCA of each pair of tips on the phylogeny using the ape function cophenetic
.
d<-cophenetic(pl.tree)/2
(We divide by two because cophenetic
gives us the total distance between tips.)
Let’s look at just part of this matrix:
tips<-c("Elephant","LongTBat","Sheep","Cow",
"Squirrel","Dormouse")
d[tips,tips]
## Elephant LongTBat Sheep Cow Squirrel Dormouse
## Elephant 0.00 107.02 107.02 107.02 116.49 116.49
## LongTBat 107.02 0.00 88.07 88.07 116.49 116.49
## Sheep 107.02 88.07 0.00 19.89 116.49 116.49
## Cow 107.02 88.07 19.89 0.00 116.49 116.49
## Squirrel 116.49 116.49 116.49 116.49 0.00 23.75
## Dormouse 116.49 116.49 116.49 116.49 23.75 0.00
You can do this for any pair of tips and see how it compares to the ages in http://www.timetree.org/.
Lastly, chronos
can also do penalized likelihood rate-smoothing without any calibration points at all. Let’s try that for comparison. (Here, I’ll use model="relaxed"
instead of model="correlated"
, the default. Full disclosure: this is because in this instance so doing yielded a higher correlation of the divergence times with those obtained using our calibration points!)
uncalib_tree<-chronos(ml.tree,lambda=1,model="relaxed")
##
## Setting initial dates...
## Fitting in progress... get a first set of estimates
## (Penalised) log-lik = -17.304
## Optimising rates... dates... -17.304
## Optimising rates... dates... -17.30103
##
## log-Lik = -17.22596
## PHIIC = 302.47
uncalib_tree
##
## Chronogram
##
## Call: chronos(phy = ml.tree, lambda = 1, model = "relaxed")
##
##
## Phylogenetic tree with 46 tips and 45 internal nodes.
##
## Tip labels:
## Wallaroo, Possum, Bandicoot, Opposum, Rabbit, Pika, ...
## Node labels:
## 1, 1, 1, 0.415, 0.792, 1, ...
##
## Rooted; includes branch lengths.
plotTree(uncalib_tree,direction="leftwards",
xlim=c(1,-0.2),ftype="i",mar=c(4.1,1.1,0.1,1.1),
fsize=0.8)
axis(1,at=seq(0,1,by=0.25))
title(xlab="relative time before present")
abline(v=seq(0,1,by=0.25),lty="dotted",col="grey")
Finally, let’s graph a comparison of the edge lengths with those obtained in our calibrated analysis.
plot(pl.tree$edge.length,uncalib_tree$edge.length,
pch=21,bg="grey",cex=1.2,bty="n",
xlab="fossil calibrated",
ylab="rate-smoothed without calibration points")
lines(c(0,max(nodeHeights(pl.tree))),
c(0,max(nodeHeights(uncalib_tree))))
legend("topleft","1:1 line",lty="solid",bty="n")
grid()
title(main=paste("Comparison of edge lengths from calibrated\n",
"and uncalibrated analyses"))
No comments:
Post a Comment
Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.