Friday, December 23, 2022

Fix to bug in phytools force.ultrametric function to coerce a tree to be ultrametric

R phylogenetics power-user & comparative methods star Jacob Berv contacted me this afternoon with a bug in the phytools method force.ultrametric in which the function appeared to be creating a basal polytomy where none had existed in the input tree.

For context, phytools::force.ultrametric is a not a formal method for ultrametricizing a phylogenetic tree (like non-parametric rate smoothing, for example). Instead, it is designed to implement a couple of different simple procedures to coerce a tree to pass ape::is.ultrametric even if it should fail it due to numerical precision (e.g., rounding) issues with the edge lengths.

This is done using two main algorithms. The default (denominated method="nnls") uses non-negative least squares to find the ultrametric set of edge lengths that results in the set of patristic distances with the minimum sum of squared difference to the patristic distances implied by the original tree. (The other just extends all the tips – but this method was unaffected by the bug discovered by Jacob.)

Here is a demonstration of the bug, using the example tree supplied by Jacob.

library(phytools)
library(phangorn)
tree<-read.tree(file="test.tre")
par(mfrow=c(1,2))
plotTree(tree,ftype="off")
plotTree(phytools::force.ultrametric(tree),ftype="off")
## ***************************************************************
## *                          Note:                              *
## *    force.ultrametric does not include a formal method to    *
## *    ultrametricize a tree & should only be used to coerce    *
## *   a phylogeny that fails is.ultramtric due to rounding --   *
## *    not as a substitute for formal rate-smoothing methods.   *
## ***************************************************************

plot of chunk unnamed-chunk-1

OK. Something is obviously wrong here.

Since method="nnls" actually uses phangorn::nnls.tree internally, I was able to quickly trace the issue to this phangorn update by Klaus Schliep.

Here is the new, fixed version of force.ultrametric which I have already pushed to GitHub.

force.ultrametric<-function(tree,method=c("nnls","extend"),...){
	if(hasArg(message)) message<-list(...)$message
	else message<-TRUE
	if(message){
		cat("***************************************************************\n")
		cat("*                          Note:                              *\n")
		cat("*    force.ultrametric does not include a formal method to    *\n")
		cat("*    ultrametricize a tree & should only be used to coerce    *\n")
		cat("*   a phylogeny that fails is.ultramtric due to rounding --   *\n")
		cat("*    not as a substitute for formal rate-smoothing methods.   *\n")
		cat("***************************************************************\n")
	}
	method<-method[1]
	if(method=="nnls") tree<-nnls.tree(cophenetic(tree),tree,
		method="ultrametric",rooted=is.rooted(tree),trace=0)
	else if(method=="extend"){
		h<-diag(vcv(tree))
		d<-max(h)-h
		ii<-sapply(1:Ntip(tree),function(x,y) which(y==x),
			y=tree$edge[,2])
		tree$edge.length[ii]<-tree$edge.length[ii]+d
	} else 
		cat("method not recognized: returning input tree\n\n")
	tree
}

Let’s make sure it works by repeating our plot from above.

par(mfrow=c(1,3))
plotTree(tree,ftype="off",mar=c(0.1,1.1,2.1,1.1))
mtext("a) input tree",line=0,adj=0.1)
plotTree(phytools::force.ultrametric(tree),ftype="off",
	mar=c(0.1,1.1,2.1,1.1))
## ***************************************************************
## *                          Note:                              *
## *    force.ultrametric does not include a formal method to    *
## *    ultrametricize a tree & should only be used to coerce    *
## *   a phylogeny that fails is.ultramtric due to rounding --   *
## *    not as a substitute for formal rate-smoothing methods.   *
## ***************************************************************
mtext("b) CRAN force.ultrametric",line=0,adj=0.1)
plotTree(force.ultrametric(tree),ftype="off",
	mar=c(0.1,1.1,2.1,1.1))
## ***************************************************************
## *                          Note:                              *
## *    force.ultrametric does not include a formal method to    *
## *    ultrametricize a tree & should only be used to coerce    *
## *   a phylogeny that fails is.ultramtric due to rounding --   *
## *    not as a substitute for formal rate-smoothing methods.   *
## ***************************************************************
mtext("c) GitHub force.ultrametric",line=0,adj=0.1)

plot of chunk unnamed-chunk-3

Obviously, this bug will be fixed in the next CRAN release of phytools but can already be obtained by updating phytools from GitHub.

Happy holidays.

Wednesday, December 14, 2022

Generating a ggplot phylomorphospace plot with phytools

Today a colleague & phytools user, Werner de Gier, tweeted the following about using phytools to graph a phylomorphospace projection with ggplot2.

My reaction was the following.

I nonetheless promised to share his solution on this blog – assuming that I could get it to work without the Microsoft Excel step!

Here it is.

First, let’s load our libraries.

library(phytools)
library(ggplot2)

Now let’s pull in our data.

These come from Revell & Collar (2009).

data(sunfish.tree)
tree<-as.phylo(sunfish.tree)
plotTree(tree,ftype="i",fsize=0.8)

plot of chunk unnamed-chunk-2

data(sunfish.data)
head(sunfish.data)
##                      feeding.mode gape.width buccal.length
## Acantharchus_pomotis         pisc      0.114        -0.009
## Lepomis_gibbosus              non     -0.133        -0.009
## Lepomis_microlophus           non     -0.151         0.012
## Lepomis_punctatus             non     -0.103        -0.019
## Lepomis_miniatus              non     -0.134         0.001
## Lepomis_auritus               non     -0.222        -0.039
X<-sunfish.data[,2:3]

Now let’s create a phylomorphospace projection using phytools.

obj<-phylomorphospace(tree,X,ftype="off",bty="n",node.size=c(0,1.2))
grid()

plot of chunk unnamed-chunk-3

As Werner pointed out, phylomorphospace returns the phylomorphospace coordinates to the user invisibly. Let’s use them to create a data frame to be passed to ggplot2::geom_segment.

Data<-data.frame(
	xstart=obj$xx[obj$edge[,1]],
	ystart=obj$yy[obj$edge[,1]],
	xstop=obj$xx[obj$edge[,2]],
	ystop=obj$yy[obj$edge[,2]],
	nodestart=obj$edge[,1],
	nodestop=obj$edge[,2])

Here’s what that looks like:

head(Data)
##         xstart       ystart       xstop        ystop nodestart nodestop
## 1   0.03520284 -0.002225660  0.11400000 -0.009000000        29        1
## 30  0.03520284 -0.002225660  0.02429463 -0.001287860        29       30
## 31  0.02429463 -0.001287860  0.01151996 -0.003347729        30       31
## 32  0.01151996 -0.003347729 -0.04908695 -0.001822143        31       32
## 33 -0.04908695 -0.001822143 -0.10899316 -0.015630236        32       33
## 34 -0.10899316 -0.015630236 -0.11776881 -0.011263323        33       34

Finally, let’s create out plot!

ggplot()+
	geom_point(data= X,aes(x=gape.width,y=buccal.length),
	size=3,color="darkblue")+
	geom_segment(data=Data,aes(x=xstart,y=ystart,xend=xstop,yend=ystop), 
	size = 1,color="darkblue")

plot of chunk unnamed-chunk-6

Neat. We’re done.