Tuesday, October 28, 2014

Alternative implementations of fitContinuous

A colleague contacted me today about “alternative implementations” of geiger's fitContinuous function that he could use to check parameter estimation, and maybe to speed calculations as fitContinuous can be somewhat slow (although is evidently quite robust now) for some models.

Well, I don't know about speeding things up - but it is pretty straightforward to write a wrapper function that combines phytools brownie.lite single-rate model with geiger rescale.phylo to fit many of the models of fitContinuous. The following is a simple demo:

## load packages
library(phytools)
library(geiger)

## helper function to get the root node number
getRoot<-function(tree) Ntip(tree)+1

## here is our fitContinuous lite function
fitCont<-function(tree,x,model="BM",interval=NULL){
    if(model!="BM"){
        lk<-function(par,tree,x,model) brownie.lite(paintSubTree(rescale(tree,model,par),getRoot(tree),"1"),x)$logL1
        oFit<-optimize(lk,interval,tree=tree,x=x,model=model,maximum=TRUE)
        cFit<-brownie.lite(paintSubTree(rescale(tree,model,oFit$maximum),getRoot(tree),"1"),x)
        obj<-list(par=oFit$maximum,model=model,sig2=cFit$sig2.single,a=cFit$a.single,logLik=oFit$objective)
    } else {
        fit<-brownie.lite(paintSubTree(tree,getRoot(tree),"1"),x)
        obj<-list(model=model,sig2=fit$sig2.single,a=fit$a.single,logLik=fit$logL1)
    }
    obj
}

OK, now let's try it for simulated data:

## simulate tree
tree<-pbtree(n=100,scale=1)
## simulate data under model="BM"
x<-fastBM(tree)
fitContinuous(tree,x)
## GEIGER-fitted comparative model of continuous data
##  fitted 'BM' model parameters:
##  sigsq = 0.940924
##  z0 = -0.186971
## 
##  model summary:
##  log-likelihood = -58.714396
##  AIC = 121.428793
##  AICc = 121.552504
##  free parameters = 2
## 
## Convergence diagnostics:
##  optimization iterations = 100
##  failed iterations = 0
##  frequency of best fit = 1.00
## 
##  object summary:
##  'lik' -- likelihood function
##  'bnd' -- bounds for likelihood search
##  'res' -- optimization iteration summary
##  'opt' -- maximum likelihood parameter estimates
fitCont(tree,x)
## $model
## [1] "BM"
## 
## $sig2
## [1] 0.9409
## 
## $a
## [1] -0.187
## 
## $logLik
## [1] -58.71
## simulate data under model="lambda"
x<-fastBM(rescale(tree,model="lambda",0.5))
fitContinuous(tree,x,model="lambda")
## GEIGER-fitted comparative model of continuous data
##  fitted 'lambda' model parameters:
##  lambda = 0.729756
##  sigsq = 1.217453
##  z0 = 0.254493
## 
##  model summary:
##  log-likelihood = -122.552971
##  AIC = 251.105941
##  AICc = 251.355941
##  free parameters = 3
## 
## Convergence diagnostics:
##  optimization iterations = 100
##  failed iterations = 0
##  frequency of best fit = 0.32
## 
##  object summary:
##  'lik' -- likelihood function
##  'bnd' -- bounds for likelihood search
##  'res' -- optimization iteration summary
##  'opt' -- maximum likelihood parameter estimates
fitCont(tree,x,model="lambda",interval=c(0,1))
## $par
## [1] 0.7298
## 
## $model
## [1] "lambda"
## 
## $sig2
## [1] 1.217
## 
## $a
## [1] 0.2545
## 
## $logLik
## [1] -122.6
## simulate data under model="OU"
x<-fastBM(rescale(tree,model="OU",0.4))
fitContinuous(tree,x,model="OU")
## GEIGER-fitted comparative model of continuous data
##  fitted 'OU' model parameters:
##  alpha = 0.536268
##  sigsq = 1.171551
##  z0 = 0.216378
## 
##  model summary:
##  log-likelihood = -61.980067
##  AIC = 129.960133
##  AICc = 130.210133
##  free parameters = 3
## 
## Convergence diagnostics:
##  optimization iterations = 100
##  failed iterations = 0
##  frequency of best fit = 0.60
## 
##  object summary:
##  'lik' -- likelihood function
##  'bnd' -- bounds for likelihood search
##  'res' -- optimization iteration summary
##  'opt' -- maximum likelihood parameter estimates
fitCont(tree,x,model="OU",interval=c(0,1))
## $par
## [1] 0.5363
## 
## $model
## [1] "OU"
## 
## $sig2
## [1] 1.172
## 
## $a
## [1] 0.2164
## 
## $logLik
## [1] -61.98

Running system.time with any of these examples (except for model="BM") does not show any improvement of this ostensibly lighter implementation - but this is probably due to lots of unnecessary internals in brownie.lite for fitting a multi-rate model (which is not actually done here). Improving on this is thus pretty simple (but I will not do so here, nor now).

That's it.

Monday, October 27, 2014

Optimizing tree transformations for multiple discrete characters

A recent R-sig-phylo subscriber asked:

“I have a tree and discrete data (number of gene copies, for many genes) and would like to use the fitDiscrete function in geiger, or something similar. However, I would like to estimate the parameters given all of the datasets, not just with the data for each gene. For instance, if I was using the "delta” model to vary rates across the tree, I would like this delta value to reflect some sort of summary value across all datasets. Does anyone have an idea as to how this could be accomplished or perhaps point me in the right direction?“

Brian O'Meara pointed out that it would be straighforward to write a likelihood function that just summed the log-likelihoods across characters for different tree transformations and then wrapped the function in an optimizer. The following illustrates this explicitly.

First load packages - for this we need ape, phytools (just for simulation here), and geiger (for our tree rescaling):

## load packages
library(ape)
library(phytools)
## Loading required package: maps
library(geiger)

Now let's simulate some data that would be analogous to what we'd load from file in the empirical case:

## simulate tree & data
Q<-matrix(c(-1,1,1,-1),2,2)
rownames(Q)<-colnames(Q)<-letters[1:2]
tree<-pbtree(n=100,scale=1)
X<-replicate(10,sim.history(rescale(tree,model="lambda",lambda=0.5),Q)$states)
## Done simulation(s).
## Done simulation(s).
## Done simulation(s).
## Done simulation(s).
## Done simulation(s).
## Done simulation(s).
## Done simulation(s).
## Done simulation(s).
## Done simulation(s).
## Done simulation(s).
## this is what our data look like:
X
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## t64  "a"  "a"  "a"  "a"  "b"  "a"  "b"  "a"  "a"  "a"  
## t97  "a"  "b"  "b"  "a"  "a"  "b"  "b"  "b"  "a"  "a"  
## t98  "b"  "a"  "a"  "b"  "b"  "b"  "b"  "b"  "b"  "a"  
## t43  "b"  "b"  "b"  "a"  "b"  "a"  "a"  "b"  "a"  "b"  
## t44  "b"  "a"  "a"  "b"  "b"  "b"  "a"  "a"  "a"  "b"  
## t23  "b"  "a"  "b"  "a"  "b"  "a"  "a"  "a"  "b"  "a"  
## t85  "b"  "a"  "b"  "b"  "a"  "a"  "b"  "b"  "b"  "b"  
## t86  "b"  "a"  "b"  "b"  "b"  "b"  "a"  "b"  "a"  "a"  
## t68  "a"  "a"  "b"  "a"  "a"  "b"  "a"  "b"  "a"  "a"  
## t83  "b"  "a"  "b"  "b"  "b"  "a"  "a"  "b"  "a"  "b"  
## t84  "a"  "a"  "b"  "a"  "b"  "a"  "b"  "a"  "b"  "b"  
## t2   "a"  "a"  "a"  "a"  "b"  "b"  "a"  "b"  "b"  "a"  
## t3   "b"  "a"  "a"  "a"  "b"  "a"  "a"  "a"  "a"  "a"  
## t91  "b"  "b"  "b"  "a"  "a"  "a"  "b"  "a"  "a"  "a"  
## t92  "a"  "a"  "a"  "a"  "a"  "a"  "b"  "a"  "a"  "a"  
## t45  "b"  "a"  "b"  "a"  "a"  "b"  "a"  "a"  "a"  "b"  
## t46  "b"  "a"  "b"  "b"  "b"  "a"  "a"  "b"  "a"  "a"  
## t58  "b"  "b"  "a"  "a"  "a"  "a"  "a"  "a"  "a"  "a"  
## t59  "a"  "a"  "b"  "b"  "b"  "b"  "a"  "b"  "b"  "b"  
## t21  "b"  "b"  "b"  "b"  "b"  "a"  "a"  "a"  "b"  "b"  
## t60  "b"  "a"  "a"  "a"  "b"  "b"  "b"  "a"  "b"  "b"  
## t89  "a"  "a"  "b"  "a"  "b"  "b"  "a"  "a"  "a"  "b"  
## t90  "b"  "b"  "a"  "b"  "a"  "a"  "a"  "a"  "b"  "b"  
## t36  "b"  "a"  "b"  "a"  "b"  "b"  "b"  "b"  "b"  "a"  
## t27  "b"  "b"  "b"  "b"  "b"  "a"  "b"  "a"  "a"  "b"  
## t28  "a"  "b"  "b"  "b"  "b"  "b"  "a"  "a"  "b"  "b"  
## t29  "b"  "b"  "b"  "a"  "a"  "b"  "a"  "a"  "b"  "b"  
## t37  "b"  "b"  "b"  "b"  "b"  "a"  "a"  "a"  "b"  "b"  
## t38  "b"  "a"  "a"  "a"  "a"  "b"  "a"  "a"  "a"  "b"  
## t16  "a"  "a"  "b"  "b"  "a"  "a"  "a"  "b"  "a"  "a"  
## t17  "a"  "a"  "a"  "b"  "a"  "b"  "b"  "a"  "a"  "b"  
## t15  "b"  "a"  "a"  "a"  "a"  "b"  "b"  "a"  "b"  "b"  
## t47  "b"  "a"  "b"  "a"  "b"  "b"  "b"  "a"  "a"  "b"  
## t61  "b"  "a"  "b"  "a"  "b"  "a"  "a"  "b"  "a"  "b"  
## t62  "b"  "a"  "b"  "b"  "a"  "a"  "a"  "a"  "a"  "b"  
## t10  "b"  "a"  "b"  "b"  "b"  "b"  "a"  "b"  "a"  "b"  
## t11  "b"  "a"  "a"  "b"  "b"  "a"  "b"  "b"  "a"  "b"  
## t4   "b"  "b"  "b"  "a"  "a"  "b"  "a"  "b"  "a"  "a"  
## t13  "b"  "a"  "a"  "a"  "b"  "a"  "a"  "a"  "b"  "b"  
## t31  "b"  "a"  "a"  "b"  "a"  "b"  "b"  "a"  "a"  "a"  
## t93  "b"  "a"  "a"  "b"  "b"  "b"  "b"  "a"  "b"  "b"  
## t94  "b"  "a"  "a"  "a"  "b"  "a"  "a"  "a"  "b"  "a"  
## t8   "b"  "a"  "a"  "b"  "b"  "b"  "a"  "a"  "a"  "b"  
## t5   "a"  "a"  "b"  "b"  "b"  "b"  "b"  "b"  "a"  "b"  
## t42  "b"  "b"  "b"  "b"  "b"  "a"  "a"  "a"  "a"  "a"  
## t76  "a"  "b"  "b"  "a"  "b"  "a"  "b"  "a"  "a"  "a"  
## t77  "a"  "b"  "b"  "b"  "a"  "b"  "b"  "b"  "a"  "b"  
## t63  "a"  "a"  "b"  "b"  "b"  "b"  "a"  "a"  "b"  "a"  
## t95  "b"  "b"  "b"  "b"  "a"  "b"  "a"  "a"  "a"  "a"  
## t96  "b"  "a"  "a"  "b"  "a"  "b"  "a"  "a"  "b"  "a"  
## t32  "b"  "b"  "b"  "a"  "a"  "b"  "a"  "b"  "b"  "a"  
## t33  "b"  "b"  "a"  "b"  "b"  "b"  "b"  "b"  "b"  "a"  
## t81  "b"  "a"  "a"  "b"  "b"  "b"  "a"  "a"  "a"  "b"  
## t82  "b"  "a"  "a"  "b"  "b"  "b"  "a"  "a"  "a"  "b"  
## t24  "b"  "b"  "a"  "a"  "b"  "b"  "a"  "b"  "a"  "b"  
## t14  "a"  "a"  "a"  "a"  "a"  "a"  "a"  "b"  "a"  "b"  
## t25  "a"  "a"  "b"  "a"  "a"  "a"  "b"  "a"  "b"  "b"  
## t78  "a"  "a"  "b"  "b"  "a"  "a"  "b"  "b"  "b"  "b"  
## t79  "a"  "b"  "b"  "a"  "a"  "a"  "b"  "a"  "b"  "b"  
## t19  "b"  "b"  "a"  "b"  "b"  "a"  "a"  "b"  "a"  "a"  
## t50  "a"  "b"  "a"  "a"  "b"  "b"  "a"  "a"  "a"  "b"  
## t51  "b"  "a"  "b"  "a"  "a"  "a"  "a"  "a"  "a"  "a"  
## t26  "a"  "b"  "a"  "b"  "b"  "b"  "a"  "a"  "a"  "a"  
## t66  "a"  "a"  "b"  "a"  "b"  "a"  "a"  "b"  "a"  "a"  
## t67  "a"  "a"  "b"  "a"  "a"  "a"  "b"  "a"  "a"  "b"  
## t54  "a"  "b"  "b"  "a"  "b"  "a"  "b"  "a"  "a"  "a"  
## t55  "b"  "a"  "a"  "a"  "a"  "a"  "a"  "a"  "a"  "b"  
## t52  "a"  "a"  "a"  "a"  "b"  "b"  "b"  "b"  "b"  "b"  
## t53  "a"  "a"  "a"  "a"  "a"  "b"  "a"  "b"  "b"  "b"  
## t7   "b"  "a"  "b"  "a"  "b"  "a"  "a"  "a"  "b"  "b"  
## t18  "b"  "a"  "a"  "b"  "a"  "b"  "a"  "a"  "b"  "b"  
## t20  "b"  "b"  "b"  "b"  "a"  "b"  "b"  "b"  "a"  "a"  
## t69  "b"  "a"  "a"  "b"  "a"  "b"  "a"  "a"  "b"  "b"  
## t87  "b"  "b"  "b"  "b"  "b"  "b"  "a"  "b"  "b"  "b"  
## t88  "b"  "b"  "b"  "b"  "b"  "a"  "b"  "b"  "b"  "b"  
## t65  "b"  "b"  "b"  "b"  "a"  "b"  "b"  "b"  "a"  "a"  
## t74  "b"  "b"  "b"  "a"  "b"  "a"  "a"  "b"  "b"  "a"  
## t75  "a"  "a"  "a"  "a"  "a"  "b"  "a"  "b"  "a"  "b"  
## t39  "b"  "a"  "a"  "a"  "b"  "b"  "b"  "a"  "a"  "b"  
## t22  "a"  "a"  "a"  "a"  "a"  "b"  "b"  "a"  "a"  "b"  
## t34  "b"  "b"  "b"  "a"  "b"  "b"  "a"  "b"  "a"  "a"  
## t35  "a"  "a"  "a"  "b"  "b"  "a"  "b"  "a"  "a"  "b"  
## t70  "b"  "a"  "a"  "a"  "b"  "b"  "b"  "b"  "b"  "a"  
## t71  "a"  "a"  "b"  "a"  "b"  "a"  "a"  "a"  "a"  "b"  
## t99  "a"  "b"  "b"  "a"  "b"  "a"  "a"  "b"  "b"  "b"  
## t100 "b"  "a"  "a"  "a"  "b"  "a"  "a"  "b"  "a"  "b"  
## t80  "b"  "a"  "a"  "a"  "b"  "a"  "a"  "b"  "a"  "b"  
## t6   "a"  "a"  "a"  "a"  "a"  "b"  "a"  "b"  "a"  "a"  
## t12  "b"  "a"  "a"  "a"  "a"  "b"  "a"  "a"  "b"  "b"  
## t30  "b"  "a"  "a"  "b"  "a"  "a"  "a"  "b"  "b"  "b"  
## t56  "b"  "b"  "a"  "a"  "b"  "b"  "a"  "a"  "b"  "b"  
## t57  "b"  "a"  "a"  "b"  "a"  "b"  "a"  "b"  "b"  "a"  
## t40  "a"  "b"  "b"  "a"  "a"  "b"  "b"  "b"  "a"  "b"  
## t41  "b"  "a"  "b"  "a"  "b"  "b"  "a"  "b"  "b"  "a"  
## t1   "a"  "b"  "b"  "a"  "b"  "a"  "a"  "b"  "b"  "b"  
## t9   "a"  "b"  "b"  "a"  "a"  "b"  "a"  "b"  "b"  "b"  
## t48  "b"  "a"  "a"  "a"  "b"  "b"  "a"  "b"  "b"  "b"  
## t49  "b"  "a"  "b"  "b"  "b"  "b"  "b"  "b"  "b"  "b"  
## t72  "b"  "a"  "a"  "a"  "a"  "b"  "a"  "b"  "b"  "b"  
## t73  "a"  "a"  "a"  "b"  "b"  "b"  "a"  "b"  "a"  "b"

Note that it is arbitrary, and unnecessary, that I have simulated using the same transition matrix Q for each data column here.

Finally, our tree-transformation fitting function. Here, I've used ace from ape internally to compute the likelihood of each character for each tree transformation. One could instead use fitDiscrete (which is slower, but perhaps more robust) or function in the phangorn package.

## here is our model fitting function
fitTransform<-function(tree,X,model,interval){
    lk<-function(par,tree,X,model){
        sum(apply(X,2,function(x,tree,model,par) 
            logLik(ace(x,rescale(tree,model,par),type="discrete")),
            tree=tree,model=model,par=par))
    }
    fit<-optimize(lk,interval,tree=tree,X=X,model=model,maximum=TRUE)
    return(list(par=fit$maximum,model=model,logLik=fit$objective))
}
obj<-fitTransform(tree,X,model="lambda",interval=c(0,1))
## here is our fitted model
obj
## $par
## [1] 0.472
## 
## $model
## [1] "lambda"
## 
## $logLik
## [1] -666.7

We could use the same function with different values of model and (necessarily, if so) interval. I use model="lambda" merely to illustrate the case.

That's it.

Saturday, October 25, 2014

Reversing the time axis in an ltt95 plot

A phytools user recently asked the following:

“I am currently using the ltt95 function on a "multiPhylo" object. I would like to known if there is a solution to reverse the time axis from the root of the tree to present?”

Indeed this is not too difficult. ltt95 already returns an object of class "ltt95" invisibly. This object is basically a matrix containing the plotted data for x and y in columns, with a few different attributes to tell the S3 plotting method for objects of its class how to handle the matrix. So if we were naive as to the functionality of our plotting method, we could just manipulate the contents of that matrix to get the plot that we wanted. phytools makes things even easier, however, as there is an argument xaxis in the plotting method for objects of class "ltt95" that automates this all for us.

Here's how we can use it to control the direction of x in various ways:

## load phytools
library(phytools)
## Loading required package: ape
## Loading required package: maps
## simulate some trees
trees<-pbtree(n=100,scale=50,nsim=100)
## now create a standard ltt95 plot
obj<-ltt95(trees,log=TRUE)

plot of chunk unnamed-chunk-1

## OK - now if we want to simply flip the values on x
## so that they should "time before present"
plot(obj,xaxis="flipped")

plot of chunk unnamed-chunk-1

## we can also plot time before present as "negative time"
plot(obj,xaxis="negative")

plot of chunk unnamed-chunk-1

Well, that's really it.

Saturday, October 11, 2014

Colors terminal edges in plotTree.wBars by a discrete trait

A phytools user recently wrote in to ask if she could color the terminal edges of a phylogeny plotted using plotTree.wBars different colors depending on a discrete trait. Well, we already know that it's possible to plot a tree with a mapped discrete character. So to accomplish this, we just need to paint the terminal edges of the tree using the phytools function paintSubTree.

Here's a demo using simulated data:

library(phytools)
## let's simulate a tree & some data
tree<-pbtree(n=100,scale=1)
x<-fastBM(tree)
x<-x-min(x) ## set min to zero
th<-setNames(c(max(x)/4*1:3,Inf),LETTERS[1:4])
y<-sapply(x,threshState,th)
y
##  t71  t72  t59   t2  t29  t99 t100  t70  t81  t82  t20  t21  t22  t23  t53 
##  "B"  "B"  "C"  "B"  "C"  "D"  "D"  "A"  "A"  "A"  "B"  "C"  "D"  "D"  "C" 
##  t54  t65  t66  t14  t15  t10   t6  t16  t79  t80  t40   t8   t9   t1  t17 
##  "C"  "B"  "A"  "A"  "B"  "B"  "B"  "C"  "C"  "C"  "B"  "B"  "B"  "B"  "C" 
##  t93  t94  t37  t63  t64  t32   t3  t48  t89  t90  t43  t44  t67  t68  t91 
##  "C"  "C"  "B"  "B"  "B"  "B"  "B"  "B"  "B"  "B"  "D"  "C"  "C"  "B"  "B" 
##  t92  t38  t11  t12  t13  t30  t31   t7  t57  t58  t45  t75  t78  t95  t96 
##  "B"  "B"  "B"  "C"  "C"  "B"  "A"  "C"  "B"  "B"  "B"  "B"  "B"  "B"  "B" 
##  t60  t61  t62  t33  t34  t73  t74  t76  t77  t51  t55  t56  t18  t49  t50 
##  "B"  "C"  "C"  "C"  "C"  "B"  "B"  "C"  "B"  "C"  "C"  "C"  "B"  "C"  "C" 
##  t35  t85  t86  t36  t46  t47  t19   t4  t27  t28  t52  t83  t84  t87  t88 
##  "B"  "B"  "B"  "B"  "A"  "A"  "A"  "A"  "B"  "B"  "B"  "B"  "B"  "B"  "B" 
##  t39  t25  t26  t24  t97  t98  t69  t41  t42   t5 
##  "B"  "B"  "A"  "A"  "B"  "B"  "B"  "C"  "D"  "C"

OK, so now we have data approximating the empirical data and we can attempt to generate our simulation. We will do this by first coloring all the terminal edges of the tree according to our discrete character, and then, having done that, we will run plotTree.wBars using the method="plotSimmap".

## first color the tip edges
for(i in 1:length(y)) tree<-paintSubTree(tree,node=i,state=y[i],stem=TRUE)
colors<-setNames(c("black","red","blue","green","orange"),c(1,LETTERS[1:4]))
plotSimmap(tree,type="fan",colors=colors)
## setEnv=TRUE for this type is experimental. please be patient with bugs
add.simmap.legend(colors=colors[2:5],prompt=FALSE,x=0.95*par()$usr[1],y=0.95*par()$usr[4])

plot of chunk unnamed-chunk-2

## now plot with bars
plotTree.wBars(tree,x,method="plotSimmap",colors=colors,type="fan",scale=0.1)
add.simmap.legend(colors=colors[2:5],prompt=FALSE,x=0.95*par()$usr[1],y=0.95*par()$usr[4])

plot of chunk unnamed-chunk-2

Obviously, the user will have to vary this to get the desired plot for their own data, but this is the general idea!