Monday, February 5, 2018

How to fit a λ tree transformation for discrete character data with uncertainty on the phylogeny

The following is a small demo of how to fit a λ tree transformation to a discrete character with uncertainty.

This is possible to do in cases without uncertainty using the argument value transform="lambda" in the geiger function fitDiscrete. In the following code I'll use fitMk from phytools, which permits uncertainty in the tip states, and then one of R's built-in optimizers to find the ML value of the tree transformation parameter, λ.

The phylogeny in the following example comes from Gamble et al. (2014), and the data from a study on the evolutionary correlates of urban tolerance by my current graduate student (but soon to be postdoc) Kristin Winchell.

First, let's get our data:

library(phytools)
tree<-read.tree("lizard-tree.tre")
tree
## 
## Phylogenetic tree with 219 tips and 218 internal nodes.
## 
## Tip labels:
##  grahami, conspersus, garmani, opalinus, valencienni, lineatopus, ...
## Node labels:
##  1, 0.999900039984006, 0.999800079968013, 1, 0.289984006397441, 0.880347860855658, ...
## 
## Rooted; includes branch lengths.
X<-read.csv("tip-states.csv",row.names=1)
## this is what our data looks like
head(X)
##         urban tolerant avoid
## acutus   0.27     0.73  0.00
## aeneus   0.52     0.48  0.00
## agueroi  0.33     0.33  0.33
## ahli     0.00     0.24  0.76
## alayoni  0.00     0.41  0.59
## alfaroi  0.33     0.33  0.33
library(geiger)
obj<-name.check(tree,X)
obj
## $tree_not_data
##  [1] "aequatorialis"     "agassizi"          "altae"            
##  [4] "anatoloros"        "annectens"         "anoriensis"       
##  [7] "aquaticus"         "auratus"           "bicaorum"         
## [10] "biporcatus"        "bitectus"          "bonairensis"      
## [13] "breslini"          "calimae"           "capito"           
## [16] "carolinensis"      "carpenteri"        "casildae"         
## [19] "chloris"           "chocorum"          "chrysolepis"      
## [22] "crassulus"         "cupreus"           "danieli"          
## [25] "euskalerriari"     "festae"            "fitchi"           
## [28] "fraseri"           "frenatus"          "fuscoauratus"     
## [31] "garridoi"          "gemmosus"          "heterodermus"     
## [34] "huilae"            "humilis"           "inderanae"        
## [37] "insignis"          "intermedius"       "isthmicus"        
## [40] "jacare"            "koopmani"          "L_carinatus"      
## [43] "laeviventris"      "lemurinus"         "limifrons"        
## [46] "lineatus"          "lionotus"          "longiceps"        
## [49] "loveridgei"        "maculigula"        "meridionalis"     
## [52] "microtus"          "neblininus"        "nebuloides"       
## [55] "nebulosus"         "nicefori"          "nitens"           
## [58] "onca"              "ortonii"           "oscelloscapularis"
## [61] "oxylophus"         "P_acutirostris_1"  "P_acutirostris_2" 
## [64] "pachypus"          "pandoensis"        "peraccae"         
## [67] "podocarpus"        "poecilopus"        "polylepis"        
## [70] "polyrhachis"       "princeps"          "punctatus"        
## [73] "purpurgularis"     "quercorum"         "rejectus"         
## [76] "sericeus"          "sminthus"          "sp_nov_1"         
## [79] "sp_nov_2"          "sp_nov_3"          "tigrinus"         
## [82] "trachyderma"       "tranquillus"       "transversalis"    
## [85] "tropidogaster"     "tropidonotus"      "uniformis"        
## [88] "utilensis"         "vanzolinii"        "ventrimaculatus"  
## [91] "woodi"             "zeus"             
## 
## $data_not_tree
## [1] "agueroi"     "fairchildi"  "litoralis"   "roosevelti"  "terraealtae"
tree<-drop.tip(tree,obj$tree_not_data)
X<-X[tree$tip.label,]
name.check(tree,X)
## [1] "OK"
X<-as.matrix(X)
plotTree(tree,type="fan",lwd=1,ftype="off")
cols<-setNames(c("grey","white","darkgreen"),colnames(X))
tiplabels(pie=X,cex=0.5,piecol=cols)
add.simmap.legend(colors=cols,prompt=FALSE,shape="circle",
    x=-81,y=81)

plot of chunk unnamed-chunk-1

Now let's fit an ordered model the regular way:

ordered<-matrix(c(0,1,0,
    2,0,1,
    0,2,0),3,3,byrow=TRUE,
    dimnames=list(colnames(X),colnames(X)))
ordered
##          urban tolerant avoid
## urban        0        1     0
## tolerant     2        0     1
## avoid        0        2     0
fit<-fitMk(tree,X,model=ordered)

Now let's fit our λ model:

## our likelihood function
lk.lambda<-function(lambda,tree,x,...) 
    -logLik(fitMk(phytools:::lambdaTree(tree,lambda),
        x,...))
opt<-optimize(lk.lambda,c(0,phytools:::maxLambda(tree)),tree=tree,
    x=X,model=ordered)
opt
## $minimum
## [1] 0.8745017
## 
## $objective
## [1] 128.1338
## attr(,"df")
## [1] 2

This tells us that the ML value of λ is:

lam<-opt$minimum
lam
## [1] 0.8745017

Let's visualize the likelihood surface just to make sure we got it right:

lambda<-seq(0,1,by=0.001)
lik<--sapply(lambda,lk.lambda,tree=tree,x=X,model=ordered)
plot(lambda,lik,type="l",xlab=expression(lambda),ylab="log(likelihood)",
    lwd=2,col="darkgrey")
abline(v=lam,lty="dashed")
text(x=lam,y=-1.005*opt$objective,expression(paste("ML(",lambda,")")),
    pos=4)

plot of chunk unnamed-chunk-6

Wow. The super curious thing about this surface is its irregular appearance which we do not generally see for λ with continuous characters. Neat.

Now that we're confident that we have the ML solution, we can even do many of the ordinary things we might do with our fitted model. For instance, let's compare it to a λ = 0 model. One interpretation of this might be as a test for significant 'phylogenetic signal':

fit.lambda<-fitMk(phytools:::lambdaTree(tree,lam),X,model=ordered)
fit.h0<-fitMk(phytools:::lambdaTree(tree,0),X,model=ordered)
LR<--2*(logLik(fit.h0)-logLik(fit.lambda))
P.chisq<-pchisq(LR,df=1,lower.tail=FALSE)
P.chisq
## [1] 4.120873e-06
## attr(,"df")
## [1] 2

This tells us that - at least by this measure - there is phylogenetic signal: our data are more probable on our structured tree than on a star tree.

That's it.

No comments:

Post a Comment