Thursday, August 12, 2021

New message in force.ultrametric: this function is not a formal method to ultrametricize your tree!

I just updated the phytools function force.ultrametric to print the following message (unless explicitly turned off by the user by setting the optional argument message=FALSE):

## ***************************************************************
## *                          Note:                              *
## *    force.ultrametric does not include a formal method to    *
## *  ultrametricize a tree & should only be used to coerce to   *
## *   a phylogeny that fails is.ultramtric due to rounding --   *
## *   not as a substitute for formal rate-smoothing methods.    *
## ***************************************************************

The reason that I decided that this warning was necessary because I've seen force.ultrametric used in some papers in lieu of a formal rate-smoothing method for generating an ultrametric tree.

This is not appropriate.

What is force.ultrametric for then? It's essentially a coercion method that forces a tree that should be ultrametric, but that fails is.ultrametric (perhaps, for example, due to numerical precision or rounding of the edge lengths when it was written to file), to be so.

Here's what I mean.

Let's load the data object anoletree that comes loaded with phytools.

library(phytools)
data(anoletree)
anoletree<-as.phylo(anoletree)
plotTree(anoletree,ftype="i",fsize=0.7,lwd=1)

plot of chunk unnamed-chunk-2

Looks pretty ultrametric, right? It should be.

Think again, though – it's not!

is.ultrametric(anoletree)
## [1] FALSE

What's wrong? We can see if we pull out the total height of all the tips of the tree as follows:

h<-setNames(sapply(1:Ntip(anoletree),
    nodeheight,tree=anoletree),
    anoletree$tip.label)
h
##            ahli         allogus     rubribarbus           imias          sagrei         bremeri quadriocellifer      ophiolepis 
##               6               6               6               6               6               6               6               6 
##         mestrei           jubar      homolechis        confusus           guafe         garmani        opalinus         grahami 
##               6               6               6               6               6               6               6               6 
##     valencienni      lineatopus       evermanni       stratulus           krugi      pulchellus       gundlachi       poncensis 
##               6               6               6               6               6               6               6               6 
##           cooki    cristatellus    brevirostris        caudalis          marron        websteri       distichus         alumina 
##               6               6               6               6               6               6               6               6 
##    semilineatus         olssoni       insolitus       whitemani       haetianus        breslini         armouri         cybotes 
##               6               6               6               6               6               6               6               6 
##         shrevei   longitibialis         strahmi        marcanoi        baleatus       barahonae        ricordii         cuvieri 
##               6               6               6               6               6               6               6               6 
##   altitudinalis        oporinus        isolepis        allisoni        porcatus        loysiana         guazuma        placidus 
##               6               6               6               6               6               6               6               6 
##        sheplani         alayoni     angusticeps        paternus       alutaceus    inexpectatus       clivicola    cupeyalensis 
##               6               6               6               6               6               6               6               6 
##    cyanopleurus         alfaroi      macilentus       vanidicus        baracoae          noblei      smallwoodi    luteogularis 
##               6               6               6               6               6               6               6               6 
##       equestris   bahorucoensis dolichocephalus      hendersoni     darlingtoni        aliniger      singularis    chlorocyanus 
##               6               6               6               6               6               6               6               6 
##     coelestinus        occultus 
##               6               6

Ok…. All the tree heights seem the same. But wait a sec….

options(digits=10)
h
##            ahli         allogus     rubribarbus           imias          sagrei         bremeri quadriocellifer      ophiolepis 
##      5.99999994      5.99999994      5.99999994      5.99999994      5.99999994      5.99999994      5.99999994      5.99999994 
##         mestrei           jubar      homolechis        confusus           guafe         garmani        opalinus         grahami 
##      5.99999994      5.99999994      5.99999994      5.99999994      5.99999994      5.99999994      5.99999994      5.99999994 
##     valencienni      lineatopus       evermanni       stratulus           krugi      pulchellus       gundlachi       poncensis 
##      5.99999994      5.99999994      5.99999994      5.99999994      6.00000000      6.00000000      5.99999994      5.99999994 
##           cooki    cristatellus    brevirostris        caudalis          marron        websteri       distichus         alumina 
##      5.99999994      5.99999994      5.99999994      5.99999988      5.99999988      5.99999994      5.99999994      5.99999994 
##    semilineatus         olssoni       insolitus       whitemani       haetianus        breslini         armouri         cybotes 
##      5.99999994      5.99999994      5.99999994      6.00000000      5.99999994      5.99999994      5.99999994      5.99999994 
##         shrevei   longitibialis         strahmi        marcanoi        baleatus       barahonae        ricordii         cuvieri 
##      5.99999994      5.99999994      5.99999994      5.99999994      5.99999994      5.99999994      5.99999994      5.99999994 
##   altitudinalis        oporinus        isolepis        allisoni        porcatus        loysiana         guazuma        placidus 
##      5.99999994      5.99999994      5.99999994      5.99999994      5.99999994      5.99999994      5.99999994      6.00000000 
##        sheplani         alayoni     angusticeps        paternus       alutaceus    inexpectatus       clivicola    cupeyalensis 
##      6.00000000      6.00000000      6.00000000      6.00000000      5.99999994      5.99999994      5.99999994      5.99999994 
##    cyanopleurus         alfaroi      macilentus       vanidicus        baracoae          noblei      smallwoodi    luteogularis 
##      5.99999994      6.00000000      6.00000000      5.99999994      6.00000000      6.00000000      6.00000000      6.00000000 
##       equestris   bahorucoensis dolichocephalus      hendersoni     darlingtoni        aliniger      singularis    chlorocyanus 
##      5.99999994      6.00000000      5.99999994      5.99999994      5.99999994      5.99999994      5.99999994      6.00000000 
##     coelestinus        occultus 
##      6.00000000      5.99999994

Now we can see that some of the tips have total height of 6.00000000 while others have a total height of 5.99999994. This is probably because one of the internal edges of the tree was rounded downward by about 0.00000006 at some point – perhaps when the phylogeny was written to or read from file.

We can fix this using force.ultrametric as follows:

new.tree<-force.ultrametric(anoletree)
## ***************************************************************
## *                          Note:                              *
## *    force.ultrametric does not include a formal method to    *
## *  ultrametricize a tree & should only be used to coerce to   *
## *   a phylogeny that fails is.ultramtric due to rounding --   *
## *   not as a substitute for formal rate-smoothing methods.    *
## ***************************************************************

Our new tree has different total height as follows:

h<-setNames(sapply(1:Ntip(new.tree),
    nodeheight,tree=new.tree),
    new.tree$tip.label)
h
##            ahli         allogus     rubribarbus           imias          sagrei         bremeri quadriocellifer      ophiolepis 
##     5.999999959     5.999999959     5.999999959     5.999999959     5.999999959     5.999999959     5.999999959     5.999999959 
##         mestrei           jubar      homolechis        confusus           guafe         garmani        opalinus         grahami 
##     5.999999959     5.999999959     5.999999959     5.999999959     5.999999959     5.999999959     5.999999959     5.999999959 
##     valencienni      lineatopus       evermanni       stratulus           krugi      pulchellus       gundlachi       poncensis 
##     5.999999959     5.999999959     5.999999959     5.999999959     5.999999959     5.999999959     5.999999959     5.999999959 
##           cooki    cristatellus    brevirostris        caudalis          marron        websteri       distichus         alumina 
##     5.999999959     5.999999959     5.999999959     5.999999959     5.999999959     5.999999959     5.999999959     5.999999959 
##    semilineatus         olssoni       insolitus       whitemani       haetianus        breslini         armouri         cybotes 
##     5.999999959     5.999999959     5.999999959     5.999999959     5.999999959     5.999999959     5.999999959     5.999999959 
##         shrevei   longitibialis         strahmi        marcanoi        baleatus       barahonae        ricordii         cuvieri 
##     5.999999959     5.999999959     5.999999959     5.999999959     5.999999959     5.999999959     5.999999959     5.999999959 
##   altitudinalis        oporinus        isolepis        allisoni        porcatus        loysiana         guazuma        placidus 
##     5.999999959     5.999999959     5.999999959     5.999999959     5.999999959     5.999999959     5.999999959     5.999999959 
##        sheplani         alayoni     angusticeps        paternus       alutaceus    inexpectatus       clivicola    cupeyalensis 
##     5.999999959     5.999999959     5.999999959     5.999999959     5.999999959     5.999999959     5.999999959     5.999999959 
##    cyanopleurus         alfaroi      macilentus       vanidicus        baracoae          noblei      smallwoodi    luteogularis 
##     5.999999959     5.999999959     5.999999959     5.999999959     5.999999959     5.999999959     5.999999959     5.999999959 
##       equestris   bahorucoensis dolichocephalus      hendersoni     darlingtoni        aliniger      singularis    chlorocyanus 
##     5.999999959     5.999999959     5.999999959     5.999999959     5.999999959     5.999999959     5.999999959     5.999999959 
##     coelestinus        occultus 
##     5.999999959     5.999999959

It also now passes is.ultrametric:

is.ultrametric(new.tree)
## [1] TRUE

In this case, we have a new total height of the tree that is the (weighted) average of our two different previous heights. This is because the default method of force.ultrametric is to use non-negative least squares to minimize the sum of squared difference between the edge lengths of our tree and the edge lengths of an ultrametric tree with the same topology.

The implied patristic distances on these two trees should also be (virtually) identical:

D1<-cophenetic(anoletree)
D2<-cophenetic(new.tree)[rownames(D1),
    colnames(D1)]
plot(D1,D2,bty="n",pch=21,cex=1.2,bg="grey",
    xlab="patristic distances on original tree",
    ylab="patristic distances on NNLS tree")
lines(c(0,12),c(0,12),lty="dotted")
legend("topleft","1:1 line",lty="dotted",bty="n")

plot of chunk unnamed-chunk-9

If our tree comes from a method (like ML estimation using RAxML, for instance) that does not result in an ultrametric tree, my recommendation would be to use a formal method such as penalized likelihood (Sanderson 2002). In R, this is implemented in the ape function chronos.

Wednesday, August 4, 2021

Constraining the root state during stochastic character mapping in phytools

A phytools user asks:

“Question – can make.simmap constrain the root state of a phylogeny to a particular state? As in, if we want to assume the ancestor of primates was a frugivore, can this be done using the current setup given both frugivorous and folivorous species at the tips?”

Answer – yes!

This is done using the argument pi which sets prior probabilities for the global root node of the tree. (Fixing values for other nodes in the tree is also possible, if a tiny bit more complicated.)

Here's an example using piscivory vs. non-piscivory in sunfishes. In this case, the phylogeny is originally from Near et al. (2005).

First, let's load phytools plus our tree & data:

library(phytools)
data(sunfish.tree)
sunfish.tree<-as.phylo(sunfish.tree)
sunfish.tree
## 
## Phylogenetic tree with 28 tips and 27 internal nodes.
## 
## Tip labels:
##   Acantharchus_pomotis, Lepomis_gibbosus, Lepomis_microlophus, Lepomis_punctatus, Lepomis_miniatus, Lepomis_auritus, ...
## 
## Rooted; includes branch lengths.
data(sunfish.data)
head(sunfish.data)
##                      feeding.mode gape.width buccal.length
## Acantharchus_pomotis         pisc      0.114        -0.009
## Lepomis_gibbosus              non     -0.133        -0.009
## Lepomis_microlophus           non     -0.151         0.012
## Lepomis_punctatus             non     -0.103        -0.019
## Lepomis_miniatus              non     -0.134         0.001
## Lepomis_auritus               non     -0.222        -0.039

Let's get our feeding mode trait out:

fmode<-setNames(sunfish.data$feeding.mode,rownames(sunfish.data))

Now let's do our stochastic mapping using the default, flat prior at the root:

flat.prior<-make.simmap(sunfish.tree,fmode,model="ARD",nsim=100)
## make.simmap is sampling character histories conditioned on
## the transition matrix
## 
## Q =
##            non      pisc
## non  -6.087789  6.087789
## pisc  3.048905 -3.048905
## (estimated using likelihood);
## and (mean) root node prior probabilities
## pi =
##  non pisc 
##  0.5  0.5
## Done.
flat.prior
## 100 phylogenetic trees with mapped discrete characters

We can plot the posterior probabilities at all the nodes of the tree using the summary method for the object class:

flat.summary<-summary(flat.prior)
flat.summary
## 100 trees with a mapped discrete character with states:
##  non, pisc 
## 
## trees have 7.32 changes between states on average
## 
## changes are of the following types:
##      non,pisc pisc,non
## x->y     4.45     2.87
## 
## mean total time spent in each state is:
##            non      pisc    total
## raw  0.7443479 0.9507313 1.695079
## prop 0.4391228 0.5608772 1.000000
cols<-setNames(palette()[c(4,2)],levels(fmode))
plot(flat.summary,ftype="i",fsize=0.9,colors=cols)
legend("topleft",c("non-piscivorous","piscivorous"),
    pch=21,pt.cex=2,pt.bg=cols)

plot of chunk unnamed-chunk-4

Now, let's say we want to fix the root state to non-piscivory – for instance, because there's other evidence external to that from our comparative data indicating that ancestral sunfish were non-piscivorous rather than piscivorous. We can fix that using the argument pi as follows:

constrained<-make.simmap(sunfish.tree,fmode,model="ARD",pi=c(1,0),nsim=100)
## make.simmap is sampling character histories conditioned on
## the transition matrix
## 
## Q =
##            non     pisc
## non  -6.011052 6.011052
## pisc  0.000000 0.000000
## (estimated using likelihood);
## and (mean) root node prior probabilities
## pi =
## [1] 1 0
## Done.
constrained
## 100 phylogenetic trees with mapped discrete characters

One thing that the reader might notice is that this new constraint changes our inferred substitution model. (In this case, it is a large change – often the effect is much more subtle.)

constrained.summary<-summary(constrained)
plot(constrained.summary,ftype="i",fsize=0.9,colors=cols)
legend("topleft",c("non-piscivorous","piscivorous"),
    pch=21,pt.cex=2,pt.bg=cols)

plot of chunk unnamed-chunk-6

Neat.

We can also visualize our result using densityMap:

densityMap(constrained,lwd=6,outline=TRUE,fsize=0.9,res=200)
## sorry - this might take a while; please be patient

plot of chunk unnamed-chunk-7

This is very different from when we left the root node unconstrained!

densityMap(flat.prior,lwd=6,outline=TRUE,fsize=0.9,res=200)
## sorry - this might take a while; please be patient

plot of chunk unnamed-chunk-8

That's it.