Monday, September 4, 2017

ratebytree for comparing rates & regimes among trees with different evolutionary models

Along with several Universidad de los Andes colleagues from my sabbatical there (such as the famous Andrew Crawford and his postdoc Carlos Guarnizo), I recently submitted for publication a description of the phytools function ratebytree for comparing the evolutionary rate between trees - along with some of its statistical properties.

This method basically implements the censored test of O'Meara et al. (2006), but applied to the problem of comparing the rate of evolution between trees - rather than among clades or splits in a given tree.

One of the editor comments was that it would be straightforward to allow the comparison of different models or processes of evolution on the different trees. Although I prefer the simplicity of interpretation of a comparison of rates (a difference in which may be due to many different processes), the editor is right in that it is also possible to fit different models of trait evolution to the different trees & then ask if permitting the parameters of these models to take different values significantly better explains our data (or not).

So far, I have implemented the models "OU" (Ornstein-Uhlenbeck) and "EB" (early-burst). The OU model in comparative analyses is most often associated with Hansen (1997), although I believe Felsenstein suggested it earlier; and the EB model is linked to Blomberg et al. (2003).

Here's how it works, first with the "EB" model:

library(phytools)
packageVersion("phytools")
## [1] '0.6.24'
t1
## 
## Phylogenetic tree with 26 tips and 25 internal nodes.
## 
## Tip labels:
##  A, B, C, D, E, F, ...
## 
## Rooted; includes branch lengths.
t2
## 
## Phylogenetic tree with 40 tips and 39 internal nodes.
## 
## Tip labels:
##  t1, t4, t23, t24, t18, t16, ...
## 
## Rooted; includes branch lengths.
x1
##          A          B          C          D          E          F 
## -0.3441420 -0.3533002 -0.3115814 -0.3160447 -0.4460135 -0.4550335 
##          G          H          I          J          K          L 
## -0.4139192 -0.3953837 -0.4060349 -0.4077641 -0.8347388 -0.7452221 
##          M          N          O          P          Q          R 
## -0.7185576 -0.7635412 -0.7700484 -0.7695936 -0.7634808 -0.6948223 
##          S          T          U          V          W          X 
## -0.7412721 -0.8551189 -0.7677177 -0.7679566 -0.8786428 -0.9301932 
##          Y          Z 
## -0.9146077 -0.8141308
x2
##         t1         t4        t23        t24        t18        t16 
##  0.8566059  3.7251483  2.6191894  1.8225781  1.8057385  3.4079308 
##        t29        t30        t11        t12         t5        t10 
##  4.5958938  4.1774617  4.2133663  3.7181830  3.6334797  4.7123706 
##        t25        t26         t7        t17        t31        t32 
##  5.2011408  5.3905739  4.1595605  2.0951849  2.9624926  3.4058012 
##        t35        t36        t19        t15         t6        t27 
##  4.2541885  4.1980012  3.8449273  4.3596184  4.4675526  3.1422765 
##        t28        t20        t39        t40         t8        t13 
##  3.8031220  2.5404983  4.9569045  4.8507862  4.3520308  4.5914069 
##        t14         t2        t33        t34        t37        t38 
##  5.2167261  1.8276020  1.0305605  0.2938116  0.7720743  0.9339237 
##         t3        t21        t22         t9 
## -0.5178370  1.1703200  0.9172244  0.6591119
fitEB<-ratebytree(c(t1,t2),list(x1,x2),model="EB")
fitEB
## ML common-regime EB model:
##  s^2  a[1]   a[2]    r   k   logL
## value    12.0073 -0.7814 1.5506  -1.2953 4   -57.2681    
## 
## ML multi-regime EB model:
##   s^2[1] s^2[2]   a[1]   a[2]     r[1]   r[2]    k   logL
## value    0.3448  3.4066  -0.7853 1.6983  -1.5952 -0.6047 6   -8.2072 
## 
## Likelihood ratio: 98.1218 
## P-value (based on X^2): 0 
## 
## R thinks it has found the ML solution.

This tells us that under the "EB" model if we allow each tree to have different parameter values the fit is significantly better than if they are both constrained to have the same parameter values.

Next, we can try the "OU" model:

x3
##          A          B          C          D          E          F 
##  2.0994227  2.0572285  3.0075786  2.8504237  3.6951966  3.6323817 
##          G          H          I          J          K          L 
##  2.4724997  1.6972401  1.4710967  1.3452084  1.0863732  2.3565924 
##          M          N          O          P          Q          R 
##  3.4456653 -0.4734070  0.8027163  0.5108661  1.4534683  0.1115656 
##          S          T          U          V          W          X 
## -1.4205819  2.1651582  2.8256807  2.9683889  2.8473970 -1.1233798 
##          Y          Z 
## -1.5167766 -3.1592647
x4
##            t1            t4           t23           t24           t18 
## -0.4781630036 -0.4130760100 -0.7107502478  0.4403283556 -0.9813284805 
##           t16           t29           t30           t11           t12 
##  0.1100536930  0.1748046581 -0.0005799426 -0.4965174644  0.6461984642 
##            t5           t10           t25           t26            t7 
## -0.2498849416 -0.3849844114  0.1414963280 -0.2731847500 -0.3726606412 
##           t17           t31           t32           t35           t36 
##  0.1467272989 -0.4809644865 -0.0103007857  0.6234118866 -0.4649965648 
##           t19           t15            t6           t27           t28 
## -0.5483876423 -0.5286145444  0.6421465052  1.1954369835  0.4204277069 
##           t20           t39           t40            t8           t13 
##  1.2252202112 -0.2934648298 -0.4060123937  0.1507545508  0.5722774758 
##           t14            t2           t33           t34           t37 
##  0.3287662081  0.1404059566 -0.6361731193 -0.8295810175  0.6294745098 
##           t38            t3           t21           t22            t9 
##  0.7020644857 -0.6360564202  1.2895073650 -0.0730771239 -0.1221469493
fitOU<-ratebytree(c(t1,t2),list(x1,x2),model="OU")
fitOU
## ML common-regime OU model:
##  s^2  a[1]   a[2]    alpha   k   logL
## value    0.4868  -0.7499 1.9136  0   4   -65.6359    
## 
## ML multi-regime OU model:
##   s^2[1] s^2[2]   a[1]   a[2]     alp[1] alp[2]  k   logL
## value    0.0052  0.796   -0.7499 1.9136  -0.306  -0.1318 6   -16.5559    
## 
## Likelihood ratio: 98.1601 
## P-value (based on X^2): 0 
## 
## R thinks it has found the ML solution.

Once again, our fit allowing different regime parameterization in the two trees is significantly better than when they are forced to be the same.

(Note that here t1 has a negative value of α. Normally we wouldn't allow that, so I will have to fix that later.)

Finally, another example with "OU":

fitOU2<-ratebytree(c(t1,t2),list(x5,x6),model="OU")
fitOU2
## ML common-regime OU model:
##  s^2  a[1]   a[2]    alpha   k   logL
## value    1.0712  0.1545  0.092   1.4324  4   -54.3758    
## 
## ML multi-regime OU model:
##   s^2[1] s^2[2]   a[1]   a[2]     alp[1] alp[2]  k   logL
## value    1.658   0.7792  0.1563  0.0949  1.8076  1.2412  6   -53.373 
## 
## Likelihood ratio: 2.0056 
## P-value (based on X^2): 0.3669 
## 
## R thinks it has found the ML solution.

In this case, we cannot reject the possibility that both traits are evolving under the same OU process.

FYI, the trees & data for this example were simulated as follows:

t1<-pbtree(n=26,tip.label=LETTERS)
t2<-pbtree(n=40)
x1<-fastBM(phytools:::ebTree(t1,-2))
x2<-fastBM(t2)
x3<-fastBM(t1)
x4<-fastBM(t2,model="OU",alpha=2)
x5<-fastBM(t1,model="OU",alpha=1)
x6<-fastBM(t2,model="OU",alpha=1)

As an addendum, I thought I'd note the following things.

Firstly, the function can also be used in basically the same way as fitContinuous if we supply just a single tree & trait vector instead of a list of each. For instance:

obj<-list(t1)
class(obj)<-"multiPhylo"
fitBM1<-ratebytree(obj,list(x3))
fitContinuous(t1,x3)
## GEIGER-fitted comparative model of continuous data
##  fitted 'BM' model parameters:
##  sigsq = 1.059273
##  z0 = 0.540955
## 
##  model summary:
##  log-likelihood = -37.111074
##  AIC = 78.222148
##  AICc = 78.743887
##  free parameters = 2
## 
## Convergence diagnostics:
##  optimization iterations = 100
##  failed iterations = 0
##  frequency of best fit = 1.00
## 
##  object summary:
##  'lik' -- likelihood function
##  'bnd' -- bounds for likelihood search
##  'res' -- optimization iteration summary
##  'opt' -- maximum likelihood parameter estimates
fitEB1<-ratebytree(obj,list(x3),model="EB")
fitContinuous(t1,x3,model="EB")
## GEIGER-fitted comparative model of continuous data
##  fitted 'EB' model parameters:
##  a = -0.378183
##  sigsq = 3.254252
##  z0 = 0.403735
## 
##  model summary:
##  log-likelihood = -36.750942
##  AIC = 79.501884
##  AICc = 80.592793
##  free parameters = 3
## 
## Convergence diagnostics:
##  optimization iterations = 100
##  failed iterations = 0
##  frequency of best fit = 0.77
## 
##  object summary:
##  'lik' -- likelihood function
##  'bnd' -- bounds for likelihood search
##  'res' -- optimization iteration summary
##  'opt' -- maximum likelihood parameter estimates

and so on.

Secondly, the fitted multi-regime models should match the independent fits from fitContinuous in the geiger package - and the likelihood should be sum of the log-likelihoods from fitContinuous.

For instance:

fitBM2<-ratebytree(c(t2,t1),list(x2,x3))
fitC.bm1<-fitContinuous(t2,x2)
fitC.bm2<-fitContinuous(t1,x3)
fitC.bm1
## GEIGER-fitted comparative model of continuous data
##  fitted 'BM' model parameters:
##  sigsq = 0.799912
##  z0 = 1.913625
## 
##  model summary:
##  log-likelihood = -48.562175
##  AIC = 101.124350
##  AICc = 101.448675
##  free parameters = 2
## 
## Convergence diagnostics:
##  optimization iterations = 100
##  failed iterations = 0
##  frequency of best fit = 1.00
## 
##  object summary:
##  'lik' -- likelihood function
##  'bnd' -- bounds for likelihood search
##  'res' -- optimization iteration summary
##  'opt' -- maximum likelihood parameter estimates
fitC.bm2
## GEIGER-fitted comparative model of continuous data
##  fitted 'BM' model parameters:
##  sigsq = 1.059273
##  z0 = 0.540955
## 
##  model summary:
##  log-likelihood = -37.111074
##  AIC = 78.222148
##  AICc = 78.743887
##  free parameters = 2
## 
## Convergence diagnostics:
##  optimization iterations = 100
##  failed iterations = 0
##  frequency of best fit = 1.00
## 
##  object summary:
##  'lik' -- likelihood function
##  'bnd' -- bounds for likelihood search
##  'res' -- optimization iteration summary
##  'opt' -- maximum likelihood parameter estimates
logLik(fitC.bm1)+logLik(fitC.bm2)
## [1] -85.67325
fitBM2
## ML common-rate model:
##  s^2  a[1]   a[2]    k   logL
## value    0.9021  1.9136  0.541   3   -85.9892    
## 
## ML multi-rate model:
##   s^2[1] s^2[2]   a[1]   a[2]    k   logL
## value    0.7999  1.0593  1.9136  0.541   4   -85.6732    
## 
## Likelihood ratio: 0.6319 
## P-value (based on X^2): 0.4267 
## 
## R thinks it has found the ML solution.

ratebytree and fitContinuous are completely independent implementations of continuous character model-fitting in R, so the latter could provide a handy way to check the former, and vice versa.

That's all for now.

No comments:

Post a Comment