I was recently asked by a colleague if I could help develop a method to test if the rate of evolution of a continuous character was the same or different in different trees.

This is closely related to the *censored* approach to test for changes in the
rate of evolution developed by Brian O'Meara & colleagues in
2006. (Note that this
should also be related to a method published by Dean Adams in
2013.) Here, though, rather than
supply a single tree & specifying rate regimes within that tree, we can supply multiple
trees (as an object of class `"multiPhylo"`

) and a list of corresponding vectors
of trait values for each tree.

First, here is our function:

```
ratebytree<-function(trees,x,...){
N<-length(trees)
## first, fit multi-rate model
lik.multi<-function(theta,trees,x){
n<-sapply(trees,Ntip)
N<-length(trees)
sig<-theta[1:N]
a<-theta[(N+1):(2*N)]
C<-lapply(trees,vcv)
V<-mapply("*",C,sig)
logL<-0
for(i in 1:N)
logL<-logL-t(x[[i]]-a[i])%*%solve(V[[i]])%*%(x[[i]]-
a[i])/2-n[i]*log(2*pi)/2-determinant(V[[i]])$modulus[1]/2
-logL
}
foo<-function(tree,x){
obj<-phyl.vcv(as.matrix(x),vcv(tree),1)
c(obj$R[1,1],obj$a[1,1])
}
obj<-mapply(foo,trees,x)
p<-as.vector(t(obj))
fit.multi<-optim(p,lik.multi,trees=trees,x=x,method="L-BFGS-B",
lower=c(rep(0,N),rep(-Inf,N)),upper=c(rep(Inf,N),rep(Inf,N)))
## now fit single-rate model
lik.onerate<-function(theta,trees,x){
n<-sapply(trees,Ntip)
N<-length(trees)
sig<-theta[1]
a<-theta[1:N+1]
C<-lapply(trees,vcv)
V<-lapply(C,"*",sig)
logL<-0
for(i in 1:N)
logL<-logL-t(x[[i]]-a[i])%*%solve(V[[i]])%*%(x[[i]]-
a[i])/2-n[i]*log(2*pi)/2-determinant(V[[i]])$modulus[1]/2
-logL
}
p<-c(mean(fit.multi$par[1:N]),fit.multi$par[(N+1):(2*N)])
fit.onerate<-optim(p,lik.onerate,trees=trees,x=x,method="L-BFGS-B",
lower=c(0,rep(-Inf,N)),upper=c(Inf,rep(Inf,N)))
## compare models:
LR<-2*(-fit.multi$value+fit.onerate$value)
P.chisq<-pchisq(LR,df=N-1,lower.tail=FALSE)
obj<-list(
multi.rate.model=list(sig2=fit.multi$par[1:N],
a=fit.multi$par[(N+1):(2*N)],
k=2*N,
logL=-fit.multi$value,
counts=fit.multi$counts,convergence=fit.multi$convergence,
message=fit.multi$message),
common.rate.model=list(sig2=fit.onerate$par[1],
a=fit.onerate$par[1:N+1],
k=N+1,
logL=-fit.onerate$value,
counts=fit.onerate$counts,convergence=fit.onerate$convergence,
message=fit.onerate$message),
N=N,likelihood.ratio=LR,P.chisq=P.chisq)
class(obj)<-"ratebytree"
obj
}
## S3 print method
print.ratebytree<-function(x,...){
if(hasArg(digits)) digits<-list(...)$digits
else digits<-4
N<-x$N
cat("ML common-rate model:\n")
cat(paste("\ts^2\t",paste(paste("a[",1:N,"]",sep=""),collapse="\t")),
"\tk\tlogL\n")
cat(paste("value",round(x$common.rate.model$sig2,digits),
paste(round(x$common.rate.model$a,digits),collapse="\t"),
x$common.rate.model$k,round(x$common.rate.model$logL,digits),
"\n\n",sep="\t"))
cat("ML multi-rate model:\n")
cat(paste("\t",paste(paste("s^2[",1:N,"]",sep=""),collapse="\t"),"\t",
paste(paste("a[",1:N,"]",sep=""),collapse="\t")),
"\tk\tlogL\n")
cat(paste("value",paste(round(x$multi.rate.model$sig2,digits),collapse="\t"),
paste(round(x$multi.rate.model$a,digits),collapse="\t"),
x$multi.rate.model$k,round(x$multi.rate.model$logL,digits),
"\n\n",sep="\t"))
cat(paste("Likelihood ratio:",round(x$likelihood.ratio,digits),"\n"))
cat(paste("P-value (based on X^2):",round(x$P.chisq,digits),"\n\n"))
if(x$multi.rate.model$convergence==0&&x$common.rate.model$convergence==0)
cat("R thinks it has found the ML solution.\n\n")
else cat("One or the other optimization may not have converged.\n\n")
}
```

Now let's try it with some simulated data.

First, here's a visualization of our data:

```
ylim<-range(c(x,y,z))
par(mfrow=c(1,3))
phenogram(t1,x,ylim=ylim,spread.cost=c(1,0))
phenogram(t2,y,ylim=ylim,spread.cost=c(1,0))
```

```
## Optimizing the positions of the tip labels...
```

```
phenogram(t3,z,ylim=ylim,spread.cost=c(1,0))
```

```
## Optimizing the positions of the tip labels...
```

It's hard to tell from this whether the rates of each clade are the same or different - especially because the trees all have different total depths.

Now we're ready to fit our models

```
library(phytools)
fit<-ratebytree(c(t1,t2,t3),list(x,y,z))
fit
```

```
## ML common-rate model:
## s^2 a[1] a[2] a[3] k logL
## value 1.4387 0.3648 1.5135 1.3195 4 -189.3164
##
## ML multi-rate model:
## s^2[1] s^2[2] s^2[3] a[1] a[2] a[3] k logL
## value 1.3265 1.9868 0.8394 0.3648 1.5135 1.3195 6 -184.4735
##
## Likelihood ratio: 9.6859
## P-value (based on X^2): 0.0079
##
## R thinks it has found the ML solution.
```

Note that tree #1 and tree #3 have very comparable estimated rates (actually tree #1 is a little higher), even though the range of values is substantially greater on tree #3. Neat.

For the record, these data were simulated as follows:

```
t1<-pbtree(n=26)
t2<-pbtree(n=60)
t3<-pbtree(n=50)
x<-fastBM(t1,sig2=1)
y<-fastBM(t2,sig2=2)
z<-fastBM(t3,sig2=1)
```

This function is now in phytools. I recommend install phytools from GitHub using devtools rather than using the version posted above.

ReplyDelete