Thursday, September 28, 2017

Transmitting Science course with Luke Harmon in 2018

Luke Harmon & I just agreed to give a course on phylogenetic comparative methods with the Spanish organization Transmitting Science in October of 2018. That's super far in the future, as far as I'm concerned, but they are already scheduling workshops for that time. The workshop will either be in the catalan village of Els Hostalets de Pierola or in Barcelona itself. Transmitting Science is a small organization based in Barcelona that puts on a variety of workshops in English & Spanish, the majority focused on themes related to evolution, paleontology, morphometrics, and biostatistics.

Soledad asked me to create a small image to represent the course and the following is what I came up with:

set.seed(100)
library(phytools)
library(RColorBrewer)
tree<-pbtree(n=200,scale=1)
par(bg="lightgrey")
tree<-ladderize(tree)
Q<-matrix(c(-1,1,0,0,
    1,-2,1,0,
    0,1,-2,1,
    0,0,1,-1),4,4)
x<-sim.history(tree,Q)$states
## Done simulation(s).
ordered<-matrix(c(0,1,0,0,
    1,0,1,0,
    0,1,0,1,
    0,0,1,0),4,4)
layout(matrix(c(1,2,3),3,1),heights=c(0.45,0.1,0.45))
colors<-setNames(brewer.pal(4,"Paired"),1:4)
plotTree(tree,lwd=5,type="fan",ftype="off",part=0.5,lend=0)
plot(make.simmap(tree,x,model=ordered),colors,type="fan",ftype="off",
    part=0.5,add=TRUE,lend=0)
## make.simmap is sampling character histories conditioned on the transition matrix
## 
## Q =
##           1         2         3         4
## 1 -1.301317  1.301317  0.000000  0.000000
## 2  1.301317 -2.602634  1.301317  0.000000
## 3  0.000000  1.301317 -2.602634  1.301317
## 4  0.000000  0.000000  1.301317 -1.301317
## (estimated using likelihood);
## and (mean) root node prior probabilities
## pi =
##    1    2    3    4 
## 0.25 0.25 0.25 0.25
## Done.
plot.new()
par(mar=rep(0,4))
plot.window(xlim=c(-1,1),ylim=c(-1,1))
h<-strheight("W")
w<-strwidth("W")
text(x=0.15*w,y=2.15*h,cex=2.8,
    expression(paste("Using ",italic("geiger"),", ",italic("phytools"),
    ", and other computational tools",sep="",collapse="")),col="white")
text(x=0.15*w,y=-1.85*h,cex=2.8,
    "to study macroevolution on phylogenies",col="white")
text(x=0,y=2*h,cex=2.8,
    expression(paste("Using ",italic("geiger"),", ",italic("phytools"),
    ", and other computational tools",sep="",collapse="")))
text(x=0,y=-2*h,cex=2.8,
    "to study macroevolution on phylogenies")
plotTree(tree,type="fan",ftype="off",part=0.5,lwd=5,
    xlim=get("last_plot.phylo", envir = .PlotPhyloEnv)$x.lim[2:1],
    ylim=get("last_plot.phylo", envir = .PlotPhyloEnv)$y.lim[2:1],
    lend=0)
plot(make.simmap(tree,x,model=ordered),colors,type="fan",ftype="off",
    part=0.5,add=TRUE,
    xlim=get("last_plot.phylo", envir = .PlotPhyloEnv)$x.lim,
    ylim=get("last_plot.phylo", envir = .PlotPhyloEnv)$y.lim,
    lend=0)
## make.simmap is sampling character histories conditioned on the transition matrix
## 
## Q =
##           1         2         3         4
## 1 -1.301317  1.301317  0.000000  0.000000
## 2  1.301317 -2.602634  1.301317  0.000000
## 3  0.000000  1.301317 -2.602634  1.301317
## 4  0.000000  0.000000  1.301317 -1.301317
## (estimated using likelihood);
## and (mean) root node prior probabilities
## pi =
##    1    2    3    4 
## 0.25 0.25 0.25 0.25
## Done.

plot of chunk unnamed-chunk-1

(A few plotting tricks in there!)

Here's a high quality version:

I hope a few readers of this blog decide to enroll in the course!

Wednesday, September 27, 2017

New bug for ordered, directional character evolution model in fitMk

I just discovered what seems to be a new bug in the phytools function fitMk for fitting the Mk model of discrete trait evolution. fitMk uses code modified from ape::ace, but has been adapted to do a variety of different things - such as accept ambiguous tip states (coded as prior probabilities in a matrix), and a flexible prior density at the global root of the tree. I believe I've tracked the bug down to the internally used function matexpo for matrix exponentiation from the ape package. When I substitute phytools::expm (which actually just wraps msm::MatrixExp the function works again.

I've only gotten the bug to appear for ordered, directional models. That is, for instance, a model in which we permit A→B→C, but not the reverse.

Here's an example using a dataset of squamates & digit number from a study by Matt Brandley et al. (2008). (Actually, the data are simplified because in the original data some species are variable in digit number for a particular limb, which is a nuance I've ignored.)

It might make sense to fit a model in which digit loss occurred 5→4→3→2→1→0, without jumps between non-adjacent digits numbers & without digit readquisition.* (I'm not proposing to imagine that neither is possible, let's just suppose we want to fit this seemingly reasonable model.)

We can do that as follows:

library(phytools)
library(geiger)
sqTree
## 
## Phylogenetic tree with 258 tips and 257 internal nodes.
## 
## Tip labels:
##  Abronia_graminea, Mesaspis_moreleti, Gerrhonotus_liocephalus, Barisia_imbricata, Elgaria_coerulea, Elgaria_kingii, ...
## 
## Rooted; includes branch lengths.
head(toes,n=30)
##     Agamodon_anguliceps        Amphisbaena_alba           Bipes_biporus 
##                       0                       0                       0 
##       Bipes_caniculatus       Bipes_tridactylus         Blanus_cinereus 
##                       0                       0                       0 
##   Chirindia_swimmertoni   Diplometopon_zarudnyi       Geocalamus_acutus 
##                       0                       0                       0 
##     Monopeltis_capensis      Rhineura_floridana   Trogonophis_wiegmanni 
##                       0                       0                       0 
##        Abronia_graminea         Anguis_fragilis   Anniella_geronimensis 
##                       5                       0                       0 
##        Anniella_pulchra       Barisia_imbricata   Celestus_enneagrammus 
##                       0                       5                       5 
##  Diploglossus_bilobatus      Diploglossus_pleei        Elgaria_coerulea 
##                       5                       5                       5 
##          Elgaria_kingii   Elgaria_multicarinata     Elgaria_panamintina 
##                       5                       5                       5 
##   Elgaria_paucicarinata Gerrhonotus_liocephalus       Mesaspis_moreleti 
##                       5                       5                       5 
##       Ophiodes_striatus       Ophisaurus_apodus   Ophisaurus_attenuatus 
##                       1                       1                       0 
## Levels: 0 1 2 3 4 5

I realize that this isn't the best way to visualize our data - but nevertheless:

plotTree.barplot(sqTree,
    setNames(as.numeric(toes)-1,names(toes)),
    args.plotTree=list(fsize=0.3),args.barplot=list(space=0,
    xlab="number of digits in the hindlimb",border="grey"))

plot of chunk unnamed-chunk-2

Now we can set up & fit our model:

directional<-matrix(c(0,0,0,0,0,0,
    1,0,0,0,0,0,
    0,2,0,0,0,0,
    0,0,3,0,0,0,
    0,0,0,4,0,0,
    0,0,0,0,5,0),6,6,byrow=TRUE,
    dimnames=list(levels(toes),levels(toes)))
directional
##   0 1 2 3 4 5
## 0 0 0 0 0 0 0
## 1 1 0 0 0 0 0
## 2 0 2 0 0 0 0
## 3 0 0 3 0 0 0
## 4 0 0 0 4 0 0
## 5 0 0 0 0 5 0
fitDirectional<-fitDiscrete(sqTree,toes,model=directional)
## Warning in treedata(phy, dat): The following tips were not found in 'phy' and were dropped from 'data':
##  Gonatodes_albogularis
##  Lepidophyma_flavimaculatum
##  Trachyboa_boulengeri
## Warning in fitDiscrete(sqTree, toes, model = directional): Parameter estimates appear at bounds:
##  q12
##  q13
##  q14
##  q15
##  q16
##  q23
##  q24
##  q25
##  q26
##  q31
##  q34
##  q35
##  q36
##  q41
##  q42
##  q45
##  q46
##  q51
##  q52
##  q53
##  q56
##  q61
##  q62
##  q63
##  q64
fitDirectional
## GEIGER-fitted comparative model of discrete data
##  fitted Q matrix:
##                0           1          2          3            4
##     0 0.00000000  0.00000000  0.0000000  0.0000000  0.000000000
##     1 0.02314697 -0.02314697  0.0000000  0.0000000  0.000000000
##     2 0.00000000  0.17375162 -0.1737516  0.0000000  0.000000000
##     3 0.00000000  0.00000000  0.2123601 -0.2123601  0.000000000
##     4 0.00000000  0.00000000  0.0000000  0.1118374 -0.111837366
##     5 0.00000000  0.00000000  0.0000000  0.0000000  0.004129334
##                  5
##     0  0.000000000
##     1  0.000000000
##     2  0.000000000
##     3  0.000000000
##     4  0.000000000
##     5 -0.004129334
## 
##  model summary:
##  log-likelihood = -211.796485
##  AIC = 433.592970
##  AICc = 433.831065
##  free parameters = 5
## 
## Convergence diagnostics:
##  optimization iterations = 100
##  failed iterations = 0
##  frequency of best fit = 0.06
## 
##  object summary:
##  'lik' -- likelihood function
##  'bnd' -- bounds for likelihood search
##  'res' -- optimization iteration summary
##  'opt' -- maximum likelihood parameter estimates

We have also been able to fit this model using fitMk, but this appears to be broken:

fit.phytools<-fitMk(sqTree,toes,model=directional)
## Warning in log(comp[1:M + N]): NaNs produced

## Warning in log(comp[1:M + N]): NaNs produced

## Warning in log(comp[1:M + N]): NaNs produced

## Warning in log(comp[1:M + N]): NaNs produced

Let's load the fix and test it as follows:

source("https://raw.githubusercontent.com/liamrevell/phytools/master/R/fitMk.R")
fit.phytools<-fitMk(sqTree,toes,model=directional)
fit.phytools
## Object of class "fitMk".
## 
## Fitted (or set) value of Q:
##          0         1         2         3         4         5
## 0 0.000000  0.000000  0.000000  0.000000  0.000000  0.000000
## 1 0.023147 -0.023147  0.000000  0.000000  0.000000  0.000000
## 2 0.000000  0.173751 -0.173751  0.000000  0.000000  0.000000
## 3 0.000000  0.000000  0.212357 -0.212357  0.000000  0.000000
## 4 0.000000  0.000000  0.000000  0.111838 -0.111838  0.000000
## 5 0.000000  0.000000  0.000000  0.000000  0.004129 -0.004129
## 
## Fitted (or set) value of pi:
##         0         1         2         3         4         5 
## 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667 
## 
## Log-likelihood: -213.588245 
## 
## Optimization method used was "nlminb"

Note that though very similar, fitMk and fitDiscrete are slightly different because they use different assumptions about the prior at the root of the tree. Nonetheless, it seems that fitMk is fixed and working again.

The bug will also affect other functions that use fitMk internally, such as make.simmap, but it can be fixed by updating phytools from GitHub.