Monday, July 14, 2014

New user controls in phylo.to.map

I just posted a version of the phytools function phylo.to.map, which can be used to project a tree onto a geographic map. Based on some user feedback, including at the Latin American Macroevolutionary Workshop we just held at the Universidad de los Andes in Bogotá, Colombia, I have added a couple of new features & attributes to the function.

First, the function now automatically removes the underscore character ("_") from within species names. Note that this is only for type="phylogram", not type="direct"; although in the latter case it can easily be done manually.

Second, again for type="phylogram", the lines connecting the tree to locations on a geographic map now (by default) arise from the end of the tip label - rather than from the tip itself.

Finally, third, the function now allows user control of the color of both the dashed lines & points connecting species labels to their geographic locations (for type="phylogram"); and of the both the lines and tip dot colors (for type="direct".

These updates are in the latest non-CRAN phytools build. Here's a quick demo:

> packageVersion("phytools") [1] ‘0.4.24’
> ## simulate some realistic looking data
> foo<-function() paste(c(sample(LETTERS,1),
sample(letters,round(runif(n=1,min=3,max=5))),"_",
sample(letters,round(runif(n=1,min=4,max=6)))),
collapse="")
> tree<-pbtree(n=40,tip.label=replicate(40,foo()))
> tree$edge.length<-tree$edge.length*
100/max(nodeHeights(tree))
> lat<-fastBM(tree,sig2=10,bounds=c(-90,90))
> long<-fastBM(tree,sig2=20,bounds=c(-180,180))
> X<-cbind(lat,long)
> ## ok, first the (new) standard plot:
> obj<-phylo.to.map(tree,X,ftype="i",fsize=0.7,
asp=1.2,split=c(0.5,0.5))
objective: 302
...
objective: 220
objective: 218
objective: 218
> ## now let's change the colors:
> plot(obj,colors=c("blue","red"),ftype="i",fsize=0.7,
asp=1.2,split=c(0.5,0.5))
> ## direct projection with different colors
> plot(obj,type="direct",colors=c("red","black"),
fsize=0.7)
That's it.

Wednesday, July 2, 2014

Principal components rotation of ancestral states; or ancestral state reconstruction of PC scores?

I recently fielded the following query:

From what I understand, the phylomorphospace function allows the mapping of the phylogeny on a morphospace defined by 2 characters/traits. However, I'm using multivariate data from a morphometric analysis, and the shape variation is explained by a lot of axes. The first 2 PCs can approximate the shape variation, but my question is: how do you calculate ancestral nodes using all the PC scores, or all of your fourier coefficients if using this method? Is it possible? Do you calculate the nodes on the PCs you are interested in? Or do you calculate the nodes on all your variables and then project them in the desired shape space?

So, in essence, should we perform ancestral state reconstruction on our original characters and then rotate them using the PC loadings; or should we first run phylogenetic PCA and then perform ancestral state reconstruction on the PC scores for species?

Well, the answer is - it doesn't matter. We will obtain the same ancestral state reconstructions for our PCs regardless of our order of operations. This could no doubt be proved analytically, but it is also straightforward to demonstrate via simulation as follows.

First, simulate tree & data:

> ## stochastic pure-birth tree
> tree<-pbtree(n=26,tip.label=LETTERS)
> ## random covariance matrix
> L<-matrix(rnorm(16),4)
> L[upper.tri(L)]<-0
> V<-L%*%t(L)
> ## simulated correlated character evolution on the tree
> X<-sim.corrs(tree,vcv=V)

Now, we can do our two analyses - first compute ancestral states on the original data & rotate using the phylogenetic PCA loadings; and then rotate our tip data, and reconstruct ancestral states for the PC scores:

> ## perform phylogenetic PCA
> pca<-phyl.pca(tree,X)
> ## reconstruct ancestral states for original data
> A<-apply(X,2,fastAnc,tree=tree)
> ## rotate in PC space
> ## compute phylogenetic means for calculation
> M<-matrix(rep(A[1,],nrow(A)),nrow(A),ncol(A),byrow=TRUE)
> rotatedA<-(A-M)%*%pca$Evec
> ## now compute ancestral states for PCA scores
> S<-apply(pca$S,2,fastAnc,tree=tree)

Now we can compare them:

> plot(rotatedA,S)

That's it.

Tuesday, July 1, 2014

New version of read.newick that can read root node labels

The phytools function read.newick is a slower, but slightly more robust Newick tree reader than read.tree in the ape package - specifically in that it permits certain types of 'badly conformed' Newick strings, such as those containing singleton nodes.

It's Achilles heel, however, has been that it does not permit a node label for the root node of the tree. The fix for this was simple and was supplied to me by Joseph Brown. The updated function code is here & also in the latest non-CRAN phytools version (phytools >= 0.4-23) here.

Bug fix in make.simmap for multi-state data and non-reversible model of evolution

In response to an R-sig-phylo query about stochastic mapping in phytools I wrote:

I can answer the three questions that pertain to phytools:

(1) The function make.simmap has a bug for non-reversible models when the input character has more than 2 states. This has to do with the algorithm for simulating changes along edges where the correct waiting time is simulated, but then the states at the end of the waiting time are chosen with the incorrect probabilities. This will sometimes cause make.simmap to hang for a long time. This does not affect the states simulated at nodes (which occurs first) and should not affect stochastic character mapping at all for binary characters or any reversible model of character evolution. I have posted a fixed version here - but you can also install a version of phytools containing this update here. Use phytools >= 0.4-22.

(2) You should look into the function describe.simmap to summarize the time spent in each state, etc., from stochastic character mapping using make.simmap..

(3) To simulate stochastic character histories you can use the phytools function sim.history. Note, though, that the matrix Q is the transpose of the fitted value of Q from make.simmap. Sorry about this. That means to simulate stochastic character histories on your tree you can do:

## stochastic mapping:
mtrees<-make.simmap(tree,x,model="ARD",nsim=100)
## summarize results
obj<-describe.simmap(mtrees)
obj
plot(obj) ## PP at nodes from stochastic mapping
fitted.Q<-mtrees[[1]]$Q
## simulate under fitted model
simhistories<-sim.history(tree,t(fitted.Q),nsim=100,anc=rstate(obj$ace[1,]))

The last part (anc=rstate(obj$ace[1,])) is necessary if you have non-reversibility, because this way you are picking from the posterior distribution at the root from your real data.

In addition, sim.history does not permit any columns (rows in the matrix from make.simmap) to be equal to 0.0 (which we will have if we have a truly non-reversible character). To resolve this, you can do something like:

ii<-which(diag(fitted.Q)==0)
fitted.Q[ii,-ii]<-max(nodeHeights(tree))*1e-12
fitted.Q[ii,ii]<--sum(fitted.Q[ii,])
simhistories<-sim.history(tree,t(fitted.Q),nsim=100,anc=rstate(obj$ace[1,]))

Friday, June 27, 2014

Updates to phylomorphospace3d & locate.yeti; new version of phytools

I just posted a new version of phytools. This can be downloaded from the phytools page & installed from source; however I have also submitted this version (phytools 0.4-20) to CRAN, so hopefully it will be accepted by the CRAN gatekeepers and spread out onto all the CRAN mirrors within a few days.

Some important updates in this version relative to the latest minor version include:

1. An update to locate.yeti to permit the use of a set of edge constraints to search for the ML position of the leaf to be added to the tree.

2. An update to phylomorphospace3d so that it now checks internally to see if the package rgl has been installed, and then loads it if available. This permits me to change phytools relationship with rgl from Depends to Suggests, which is designed to allow users to install & load phytools on systems where rgl cannot be installed.

This version of phytools also has a wide range of other new functions & updates relative to the last CRAN version, for instance:

1. A new version of contMap that permits some trait values for tip taxa to be missing from the dataset.

2. A new function, plotTree.wBars, that permits users to plot square or circular phylograms with bars instead of tip labels (e.g., 1, 2, 3).

3. A new function, setMap, to set the color-ramp for an object of class "contMap" or "densityMap"..

4. Some other useful updates to contMap and densityMap.

5. A new version of fancyTree that fixes a small bug for type = "scattergram".

6. The aforementioned new function (locate.yeti) for locating the position of a cryptic or recently extinct taxon on an ultrametric phylogeny using continuous character data.

7. An S3 generic method rep for objects of class "phylo" and "multiPhylo" (here).

8. A new option in phylomorphospace3d to color internal edges.

Finally, 9. New options for node placement in square phylograms plotted with plotSimmap and plotTree.

That's it!

Wednesday, June 18, 2014

Alternative node placements in plotted trees using plotTree & plotSimmap

Felsenstein (2004; pp. 574-576) gives four different node placements for square phylograms. The most commonly used node placement what Felsenstein calls intermediate, i.e., the node is placed at the midpoint of the uppermost & lowermost edges immediately descended from that node. This is what that looks like for a stochastic 13 taxon tree:

> packageVersion("phytools")
[1] ‘0.4.16’
> plotTree(tree)

Centered node placement puts the node - instead of at the vertical point intermediate between the upper & lowermost immediate daughters - at the midpoint of all tips descended from a node. This is just the average of the upper and lowermost of this set. Here's what that looks like for the same tree:

> plotTree(tree,nodes="centered")

Like intermediate node placement, weighted node placement uses only the immediate descendants - but it weights their vertical position (inversely) by the edge length leading to each daughter node. E.g.:

> plotTree(tree,nodes="weighted")

(Note that Figure 34.1c in Felsenstein appears to use a slightly different algorithm than that described in the text - which I have followed here - in which internal and tip edges are treated differently.)

Finally, inner node placement, puts each node in the 'innermost' position of all its descendants. E.g.:

> plotTree(tree,nodes="inner")

(In this case the algorithm described by Felsenstein (2004; p. 575) seems to be incorrect unless I've misread. I achieved the desired effect not by using the of the y values of the descendant edges; but the vertical position closest to the median vertical position of all the tips.)

All of these methods can also be used with stochastic mapped trees (since plotSimmap is the engine that runs plotTree). For instance:

> data(anoletree)
> plotSimmap(anoletree,nodes="inner",fsize=0.7,ftype="i")
no colors provided. using the following legend:
     CG    GB       TC     TG     Tr        Tw
"black" "red" "green3" "blue" "cyan" "magenta"
> add.simmap.legend(colors=setNames(palette()[1:6],
  sort(unique(getStates(anoletree,"tips")))))
Click where you want to draw the legend
\

That's it. See some of you at 'Evolution' in a few days.

Monday, June 16, 2014

Computing the average trait value for the set of taxa descended from each node

A recent comment asked if there was an easy way to substitute the raw average value of the tips descended from a node for ancestral states obtained via ancestral state reconstruction when using the phytools function plotBranchbyTrait. This is pretty easy. Here's a demo (although it could be done a dozen different ways, including using a simple for loop).

(This assumes only that our tree is an object of class "phylo" called tree; and, importantly, that our trait data is a named vector x, in which the names correspond to the tip labels of tree.)

getDescendants<-phytools:::getDescendants
nn<-1:tree$Nnode+Ntip(tree)
a<-setNames(sapply(nn,function(n,x,t){
  d<-getDescendants(t,n)
  mean(x[t$tip.label[d[d<=Ntip(t)]]])
  },x=x,t=tree),nn)

Now we can use it with plotBranchbyTrait as follows. First, if we want each edge to be colored as the average of the parent & daughter nodes (or tip):

plotBranchbyTrait(tree,c(x,a),mode="nodes")

Alternatively, we can have the branch color determined only from the daughter nodes or tips (in which case we throw out the root value):

y<-c(setNames(x[tree$tip.label],1:Ntip(tree)),a)
plotBranchbyTrait(tree,y[tree$edge[,2]],mode="edges")

That's pretty much it.