Along with several Universidad de los Andes colleagues from my sabbatical there (such as the famous
Andrew
Crawford and his postdoc Carlos Guarnizo), I
recently submitted for publication a description of the phytools function ratebytree
for
comparing the evolutionary rate between trees - along with some of its statistical properties.
This method basically implements the censored test of O'Meara et al. (2006), but applied to the problem of comparing the rate of evolution between trees - rather than among clades or splits in a given tree.
One of the editor comments was that it would be straightforward to allow the comparison of different models or processes of evolution on the different trees. Although I prefer the simplicity of interpretation of a comparison of rates (a difference in which may be due to many different processes), the editor is right in that it is also possible to fit different models of trait evolution to the different trees & then ask if permitting the parameters of these models to take different values significantly better explains our data (or not).
So far, I have
implemented
the models "OU"
(Ornstein-Uhlenbeck) and "EB"
(early-burst). The OU model in comparative analyses is most often associated with Hansen
(1997), although I believe Felsenstein suggested it
earlier; and the EB model is linked to Blomberg et al.
(2003).
Here's how it works, first with the "EB"
model:
library(phytools)
packageVersion("phytools")
## [1] '0.6.24'
t1
##
## Phylogenetic tree with 26 tips and 25 internal nodes.
##
## Tip labels:
## A, B, C, D, E, F, ...
##
## Rooted; includes branch lengths.
t2
##
## Phylogenetic tree with 40 tips and 39 internal nodes.
##
## Tip labels:
## t1, t4, t23, t24, t18, t16, ...
##
## Rooted; includes branch lengths.
x1
## A B C D E F
## -0.3441420 -0.3533002 -0.3115814 -0.3160447 -0.4460135 -0.4550335
## G H I J K L
## -0.4139192 -0.3953837 -0.4060349 -0.4077641 -0.8347388 -0.7452221
## M N O P Q R
## -0.7185576 -0.7635412 -0.7700484 -0.7695936 -0.7634808 -0.6948223
## S T U V W X
## -0.7412721 -0.8551189 -0.7677177 -0.7679566 -0.8786428 -0.9301932
## Y Z
## -0.9146077 -0.8141308
x2
## t1 t4 t23 t24 t18 t16
## 0.8566059 3.7251483 2.6191894 1.8225781 1.8057385 3.4079308
## t29 t30 t11 t12 t5 t10
## 4.5958938 4.1774617 4.2133663 3.7181830 3.6334797 4.7123706
## t25 t26 t7 t17 t31 t32
## 5.2011408 5.3905739 4.1595605 2.0951849 2.9624926 3.4058012
## t35 t36 t19 t15 t6 t27
## 4.2541885 4.1980012 3.8449273 4.3596184 4.4675526 3.1422765
## t28 t20 t39 t40 t8 t13
## 3.8031220 2.5404983 4.9569045 4.8507862 4.3520308 4.5914069
## t14 t2 t33 t34 t37 t38
## 5.2167261 1.8276020 1.0305605 0.2938116 0.7720743 0.9339237
## t3 t21 t22 t9
## -0.5178370 1.1703200 0.9172244 0.6591119
fitEB<-ratebytree(c(t1,t2),list(x1,x2),model="EB")
fitEB
## ML common-regime EB model:
## s^2 a[1] a[2] r k logL
## value 12.0073 -0.7814 1.5506 -1.2953 4 -57.2681
##
## ML multi-regime EB model:
## s^2[1] s^2[2] a[1] a[2] r[1] r[2] k logL
## value 0.3448 3.4066 -0.7853 1.6983 -1.5952 -0.6047 6 -8.2072
##
## Likelihood ratio: 98.1218
## P-value (based on X^2): 0
##
## R thinks it has found the ML solution.
This tells us that under the "EB"
model if we allow each tree to have different parameter
values the fit is significantly better than if they are both constrained to have the same parameter
values.
Next, we can try the "OU"
model:
x3
## A B C D E F
## 2.0994227 2.0572285 3.0075786 2.8504237 3.6951966 3.6323817
## G H I J K L
## 2.4724997 1.6972401 1.4710967 1.3452084 1.0863732 2.3565924
## M N O P Q R
## 3.4456653 -0.4734070 0.8027163 0.5108661 1.4534683 0.1115656
## S T U V W X
## -1.4205819 2.1651582 2.8256807 2.9683889 2.8473970 -1.1233798
## Y Z
## -1.5167766 -3.1592647
x4
## t1 t4 t23 t24 t18
## -0.4781630036 -0.4130760100 -0.7107502478 0.4403283556 -0.9813284805
## t16 t29 t30 t11 t12
## 0.1100536930 0.1748046581 -0.0005799426 -0.4965174644 0.6461984642
## t5 t10 t25 t26 t7
## -0.2498849416 -0.3849844114 0.1414963280 -0.2731847500 -0.3726606412
## t17 t31 t32 t35 t36
## 0.1467272989 -0.4809644865 -0.0103007857 0.6234118866 -0.4649965648
## t19 t15 t6 t27 t28
## -0.5483876423 -0.5286145444 0.6421465052 1.1954369835 0.4204277069
## t20 t39 t40 t8 t13
## 1.2252202112 -0.2934648298 -0.4060123937 0.1507545508 0.5722774758
## t14 t2 t33 t34 t37
## 0.3287662081 0.1404059566 -0.6361731193 -0.8295810175 0.6294745098
## t38 t3 t21 t22 t9
## 0.7020644857 -0.6360564202 1.2895073650 -0.0730771239 -0.1221469493
fitOU<-ratebytree(c(t1,t2),list(x1,x2),model="OU")
fitOU
## ML common-regime OU model:
## s^2 a[1] a[2] alpha k logL
## value 0.4868 -0.7499 1.9136 0 4 -65.6359
##
## ML multi-regime OU model:
## s^2[1] s^2[2] a[1] a[2] alp[1] alp[2] k logL
## value 0.0052 0.796 -0.7499 1.9136 -0.306 -0.1318 6 -16.5559
##
## Likelihood ratio: 98.1601
## P-value (based on X^2): 0
##
## R thinks it has found the ML solution.
Once again, our fit allowing different regime parameterization in the two trees is significantly better than when they are forced to be the same.
(Note that here t1
has a negative value of α. Normally we wouldn't allow that, so I
will have to fix that later.)
Finally, another example with "OU"
:
fitOU2<-ratebytree(c(t1,t2),list(x5,x6),model="OU")
fitOU2
## ML common-regime OU model:
## s^2 a[1] a[2] alpha k logL
## value 1.0712 0.1545 0.092 1.4324 4 -54.3758
##
## ML multi-regime OU model:
## s^2[1] s^2[2] a[1] a[2] alp[1] alp[2] k logL
## value 1.658 0.7792 0.1563 0.0949 1.8076 1.2412 6 -53.373
##
## Likelihood ratio: 2.0056
## P-value (based on X^2): 0.3669
##
## R thinks it has found the ML solution.
In this case, we cannot reject the possibility that both traits are evolving under the same OU process.
FYI, the trees & data for this example were simulated as follows:
t1<-pbtree(n=26,tip.label=LETTERS)
t2<-pbtree(n=40)
x1<-fastBM(phytools:::ebTree(t1,-2))
x2<-fastBM(t2)
x3<-fastBM(t1)
x4<-fastBM(t2,model="OU",alpha=2)
x5<-fastBM(t1,model="OU",alpha=1)
x6<-fastBM(t2,model="OU",alpha=1)
As an addendum, I thought I'd note the following things.
Firstly, the function can also be used in basically the same way as fitContinuous
if we supply
just a single tree & trait vector instead of a list of each. For instance:
obj<-list(t1)
class(obj)<-"multiPhylo"
fitBM1<-ratebytree(obj,list(x3))
fitContinuous(t1,x3)
## GEIGER-fitted comparative model of continuous data
## fitted 'BM' model parameters:
## sigsq = 1.059273
## z0 = 0.540955
##
## model summary:
## log-likelihood = -37.111074
## AIC = 78.222148
## AICc = 78.743887
## free parameters = 2
##
## Convergence diagnostics:
## optimization iterations = 100
## failed iterations = 0
## frequency of best fit = 1.00
##
## object summary:
## 'lik' -- likelihood function
## 'bnd' -- bounds for likelihood search
## 'res' -- optimization iteration summary
## 'opt' -- maximum likelihood parameter estimates
fitEB1<-ratebytree(obj,list(x3),model="EB")
fitContinuous(t1,x3,model="EB")
## GEIGER-fitted comparative model of continuous data
## fitted 'EB' model parameters:
## a = -0.378183
## sigsq = 3.254252
## z0 = 0.403735
##
## model summary:
## log-likelihood = -36.750942
## AIC = 79.501884
## AICc = 80.592793
## free parameters = 3
##
## Convergence diagnostics:
## optimization iterations = 100
## failed iterations = 0
## frequency of best fit = 0.77
##
## object summary:
## 'lik' -- likelihood function
## 'bnd' -- bounds for likelihood search
## 'res' -- optimization iteration summary
## 'opt' -- maximum likelihood parameter estimates
and so on.
Secondly, the fitted multi-regime models should match the independent fits from fitContinuous
in the geiger package - and the likelihood should be sum of the log-likelihoods from fitContinuous
.
For instance:
fitBM2<-ratebytree(c(t2,t1),list(x2,x3))
fitC.bm1<-fitContinuous(t2,x2)
fitC.bm2<-fitContinuous(t1,x3)
fitC.bm1
## GEIGER-fitted comparative model of continuous data
## fitted 'BM' model parameters:
## sigsq = 0.799912
## z0 = 1.913625
##
## model summary:
## log-likelihood = -48.562175
## AIC = 101.124350
## AICc = 101.448675
## free parameters = 2
##
## Convergence diagnostics:
## optimization iterations = 100
## failed iterations = 0
## frequency of best fit = 1.00
##
## object summary:
## 'lik' -- likelihood function
## 'bnd' -- bounds for likelihood search
## 'res' -- optimization iteration summary
## 'opt' -- maximum likelihood parameter estimates
fitC.bm2
## GEIGER-fitted comparative model of continuous data
## fitted 'BM' model parameters:
## sigsq = 1.059273
## z0 = 0.540955
##
## model summary:
## log-likelihood = -37.111074
## AIC = 78.222148
## AICc = 78.743887
## free parameters = 2
##
## Convergence diagnostics:
## optimization iterations = 100
## failed iterations = 0
## frequency of best fit = 1.00
##
## object summary:
## 'lik' -- likelihood function
## 'bnd' -- bounds for likelihood search
## 'res' -- optimization iteration summary
## 'opt' -- maximum likelihood parameter estimates
logLik(fitC.bm1)+logLik(fitC.bm2)
## [1] -85.67325
fitBM2
## ML common-rate model:
## s^2 a[1] a[2] k logL
## value 0.9021 1.9136 0.541 3 -85.9892
##
## ML multi-rate model:
## s^2[1] s^2[2] a[1] a[2] k logL
## value 0.7999 1.0593 1.9136 0.541 4 -85.6732
##
## Likelihood ratio: 0.6319
## P-value (based on X^2): 0.4267
##
## R thinks it has found the ML solution.
ratebytree
and fitContinuous
are completely independent implementations of
continuous character model-fitting in R, so the latter could provide a handy way to check the former,
and vice versa.
That's all for now.
No comments:
Post a Comment
Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.