Phylogenetic Tools for Comparative Biology: September 2025

Thursday, September 25, 2025

New version of phytools on CRAN

This past weekend I was able to get a new version of phytools on CRAN (phytools 2.5-2).

This new phytools version includes various new models & updates previously documented on this blog. Among these (in mostly chronological order, not in order of importance):

Minor improvements in the model objects and optimization diagnostics returned to the user by fitPagel (here). fitPagel in phytools can be used to fit a binary trait correlated evolution model of Pagel (1994).
A new general cladebox function for drawing boxes around a clade on a phylogeny plotted in rectangular, fan, or arc styles (1, 2).
A totally new ‘semi-threshold’ trait evolution model where evolution of the liability behaves as a quantitative trait on some interval, but as a discrete threshold character outside of it (1, 2), fitsemiThresh
A new bounded quantitative trait evolution model with absorbing bounds (1, 2) in bounded_bm.
A brand new multi-σ², multi-μ, discrete-state-dependent trended Brownian model (1, 2, 3), fitmultiTrend.
A new Pagel ’94 type correlation model with one binary trait and one continuous character, in which the state of the continuous trait is allowed to affect the transition rate of the discrete character (1, 2), fitcontMk.
New simulator functions for absorbing bounded Brownian motion (sim.absorbing: 1, 2), Brownian motion with reflective bounds (sim.reflective), and multi-trend trended Brownian evolution (sim.multiTrend: here).

Getting the new version of phytools is easy, just update from CRAN! To get the most recent development phytools version, I recommend first installing the remotes CRAN package and then running:

remotes::install_github("liamrevell/phytools")

Thanks for reading!

Monday, September 8, 2025

Ancestral states for a discrete trait when some nodes are known

Following the Evolution meeting this summer, a grad student reached out to me with a question. Unfortunately, it’s taken me longer than I would’ve hoped to respond to him – but he followed up recently, so I thought I’d give it a quick stab.

The question was as follows:

We met at the Evolution conference last week, because I mentioned having an issue with ancestral reconstruction in my talk. You suggested that I email you after the conference. In short, I want to perform a categorical reconstruction using multiple internal nodes having fixed values.

This is straightforward enough. Unfortunately, there is no “black-box” way to do this – we’re gonna need to code it ourselves.

The trick is that to fix one or any number of internal nodes all we need to do is attach a zero length tip to that node and then assign the known state.

To see how to go about doing that, I’m going to need some data first!

## load phytools
library(phytools)
## simulate a small tree
tree<-pbtree(n=26,tip.label=LETTERS,scale=1)
## set a value of Q
Q<-matrix(c(-1,1,0,1,-2,1,0,1,-1),3,3,
  dimnames=list(letters[1:3],letters[1:3]))
## simulate data
xx<-sim.Mk(tree,Q,internal=TRUE)
xx

##  A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z 27 28 29 30 31 32 33 34 35 36 37 
##  a  b  b  c  c  c  b  c  c  c  c  c  c  c  b  b  b  b  c  c  c  c  c  c  c  b  b  b  b  b  b  c  c  c  c  c  c 
## 38 39 40 41 42 43 44 45 46 47 48 49 50 51 
##  c  c  c  c  c  c  b  b  b  c  c  c  c  c 
## Levels: a b c

Here’s our observed data and all of the known states at internal states graphed onto the tree.

plotTree(tree,offset=1,direction="upwards",
  lwd=1)
pp<-get("last_plot.phylo",envir=.PlotPhyloEnv)
cols<-setNames(palette.colors(3,"Polychrome 36"),
  letters[1:3])
points(pp$xx,pp$yy,pch=16,col=cols[xx],cex=1.5)

plot of chunk unnamed-chunk-97

Our data include the states at all nodes. This is an unrealistic approximation of our scenario in which it’s more likely that we have a sample of nodes. Since our tree contains 25 internal nodes, let’s imagine that we have values for just 5 of these, as follows.

nn<-xx[sort(sample(1:tree$Nnode+Ntip(tree),10))]
nn

## 29 32 34 35 37 38 42 44 48 49 
##  b  c  c  c  c  c  c  b  c  c 
## Levels: a b c

These data are labeled according to the node indices of our "phylo" object, as follows.

plotTree(tree,offset=1,direction="upwards",
  lwd=1)
nodelabels(cex=0.6,bg="white")

plot of chunk unnamed-chunk-99

Our tip data, on the other hand, can be subsampled to only the species in our tree.

x<-xx[tree$tip.label]
x

## A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 
## a b b c c c b c c c c c c c b b b b c c c c c c c b 
## Levels: a b c

For our next step, we’re going to go through and add zero length tips to each of the internal nodes from nn. The only problem is that our node indices change with every new tip that we add! The trick here is to use phytools::matchNodes to update the node we’re adding the tip to after every new taxon is added.

nntree<-tree
for(i in 1:length(nn)){
  M<-matchNodes(tree,nntree,method="distances")
  nntree<-bind.tip(nntree,names(nn)[i],edge.length=0,
    where=M[which(M[,1]==as.numeric(names(nn)[i])),2])
}

Let’s check our result.

par(mfrow=c(1,2))
plotTree(tree,fsize=0.8,lwd=1)
nodelabels(cex=0.6)
plotTree(nntree,fsize=0.8,lwd=1)

plot of chunk unnamed-chunk-102

Really, that looks exactly right.

Next, we can fit our desired Mk model, and we should do so to the full tree & internal states – since, genuinely, known internal states provide us with information about the evolutionary process of our trait. I’m going to fit a custom model that matches our known generating process for the trait.

MODEL<-matrix(c(0,1,0,1,0,1,0,1,0),3,3)
mk_fit<-fitMk(nntree,c(x,nn),model=MODEL)
mk_fit

## Object of class "fitMk".
## 
## Fitted (or set) value of Q:
##           a         b         c
## a -0.513978  0.513978  0.000000
## b  0.513978 -1.027956  0.513978
## c  0.000000  0.513978 -0.513978
## 
## Fitted (or set) value of pi:
##        a        b        c 
## 0.333333 0.333333 0.333333 
## due to treating the root prior as (a) flat.
## 
## Log-likelihood: -15.088024 
## 
## Optimization method used was "nlminb"
## 
## R thinks it has found the ML solution.

Now to get our ancestral states, we just need to pass this fitted model object to phytools::ancr.

anc_mk<-ancr(mk_fit)
anc_mk

## Marginal ancestral state estimates:
##           a        b        c
## 37 0.150557 0.629894 0.219549
## 38 0.019778 0.705542 0.274680
## 39 0.000000 1.000000 0.000000
## 40 0.000199 0.953008 0.046794
## 41 0.002164 0.981934 0.015902
## 42 0.000000 0.000000 1.000000
## ...
## 
## Log-likelihood = -15.088024

We can plot our results.

plot(anc_mk,
  args.plotTree=list(direction="upwards"),
  args.tiplabels=list(cex=0.4))

plot of chunk unnamed-chunk-105

For fun, let’s compare this result to the true values and to those that we would’ve obtained without the known internal states.

par(mfrow=c(1,3))
plotTree(tree,offset=0.5,lwd=1,ftype="i",
  mar=c(0.1,1.1,2.1,0.1))
pp<-get("last_plot.phylo",envir=.PlotPhyloEnv)
cols<-setNames(palette.colors(3,"Polychrome 36"),
  letters[1:3])
points(pp$xx,pp$yy,pch=16,col=cols[xx],cex=1.5)
mtext("a) true states",line=0,adj=0,cex=0.7)
plot(anc_mk,
  args.plotTree=list(mar=c(0.1,1.1,2.1,0.1),fsize=0.8),
  args.tiplabels=list(cex=0.8),
  args.nodelabels=list(cex=0.8),
  legend=FALSE)
mtext("b) estimates with some known nodes",line=0,adj=0,
  cex=0.7)
plot(ancr(fitMk(tree,x,model=MODEL)),
  args.plotTree=list(mar=c(0.1,1.1,2.1,0.1),fsize=0.8),
  args.tiplabels=list(cex=0.8),
  args.nodelabels=list(cex=0.8),
  legend=FALSE)
mtext("c) estimates based only on tip data",line=0,adj=0,
  cex=0.7)

plot of chunk unnamed-chunk-106

In general, we should find that our internal nodes are more accurate, and more confident – particularly when close to known nodes of the tree – than when only tip data is used.

That’s pretty much it!

Pages

Thursday, September 25, 2025

New version of phytools on CRAN

Monday, September 8, 2025

Ancestral states for a discrete trait when some nodes are known