Today a colleague & phytools user, Werner de Gier, tweeted the following about using phytools to graph a phylomorphospace projection with ggplot2.
Hey! Maybe it's common knowledge already, but I found that the "path" the phylomorphospace made is stored in the function in phytools. So if obj <- phylomorphospace(), than the coordinates are saved as $xx and $yy, and the path from node to node is saved as $edge. (½)
— Werner de Gier (@Werner_deGier) December 14, 2022
My reaction was the following.
Why, though?
— Liam Revell (@phytools_liam) December 14, 2022
JK. If you want to send me your ggplot::geom_segment R code, I'd be happy to throw it up on my blog as a demo (assuming I can replicate it without Excel).
FWIW, something similar is also possible for ANY plotted tree from phytools....
I nonetheless promised to share his solution on this blog – assuming that I could get it to work without the Microsoft Excel step!
Here it is.
First, let’s load our libraries.
library(phytools)
library(ggplot2)
Now let’s pull in our data.
These come from Revell & Collar (2009).
data(sunfish.tree)
tree<-as.phylo(sunfish.tree)
plotTree(tree,ftype="i",fsize=0.8)
data(sunfish.data)
head(sunfish.data)
## feeding.mode gape.width buccal.length
## Acantharchus_pomotis pisc 0.114 -0.009
## Lepomis_gibbosus non -0.133 -0.009
## Lepomis_microlophus non -0.151 0.012
## Lepomis_punctatus non -0.103 -0.019
## Lepomis_miniatus non -0.134 0.001
## Lepomis_auritus non -0.222 -0.039
X<-sunfish.data[,2:3]
Now let’s create a phylomorphospace projection using phytools.
obj<-phylomorphospace(tree,X,ftype="off",bty="n",node.size=c(0,1.2))
grid()
As Werner pointed out, phylomorphospace
returns the phylomorphospace coordinates to the user
invisibly. Let’s use them to create a data frame to be passed to ggplot2::geom_segment
.
Data<-data.frame(
xstart=obj$xx[obj$edge[,1]],
ystart=obj$yy[obj$edge[,1]],
xstop=obj$xx[obj$edge[,2]],
ystop=obj$yy[obj$edge[,2]],
nodestart=obj$edge[,1],
nodestop=obj$edge[,2])
Here’s what that looks like:
head(Data)
## xstart ystart xstop ystop nodestart nodestop
## 1 0.03520284 -0.002225660 0.11400000 -0.009000000 29 1
## 30 0.03520284 -0.002225660 0.02429463 -0.001287860 29 30
## 31 0.02429463 -0.001287860 0.01151996 -0.003347729 30 31
## 32 0.01151996 -0.003347729 -0.04908695 -0.001822143 31 32
## 33 -0.04908695 -0.001822143 -0.10899316 -0.015630236 32 33
## 34 -0.10899316 -0.015630236 -0.11776881 -0.011263323 33 34
Finally, let’s create out plot!
ggplot()+
geom_point(data= X,aes(x=gape.width,y=buccal.length),
size=3,color="darkblue")+
geom_segment(data=Data,aes(x=xstart,y=ystart,xend=xstop,yend=ystop),
size = 1,color="darkblue")
Neat. We’re done.
Hi Liam, my student Jonathan Rask Licht and I ran into a problem, wherein the phylomorphospace code for some reason doesn't get the coordinates of the root in the plot, and consequently the current code skips it, but duplicates the last value in the obj$xx and obj$yy elements. Somehow, phylomorphospace itself has no problem with this, but because the data.frame you are generating here allows the coordinates to be pulled based on the number entry, and not based on the name of the element of the vector (i.e. it pulls the 31st entry from obj$xx, and not the entry named "31"), it was giving a nonsensical set of segments to ggplot. Jonathan overcame the problem by forcing the edges to be read as characters and not numbers. But this still did not solve the problem of a node being missing from the $xx and $yy tables, nor does it explain why the default phylomorphospace plot works despite this missing value. Thoughts?
ReplyDelete