Tuesday, May 13, 2014

contMap with missing data

Today a phytools user emailed with the following question:

"I am still having difficulty running an analysis on the evolution of a continuous character with a partial data set (I have character states for about 50% of the taxa). I would like to display the results using contMap, but since fastAnc is built into the code, it won't map the character because the # of taxa in the data set doesn't match the # of taxa in the tree. I attempted to replace fastAnc with the modified version of anc.ML from one of your blog posts (link), but I couldn't get that to work either. I received the same error message as with fastAnc:
Error in ace(x, a, method = "pic") : length of phenotypic and of phylogenetic data do not match
Is there any way to use
contMap with missing data? Any help would be greatly appreciated. Thank you for your time."

They are totally on the right track, but things get a little trickier than they might have anticipated because whereas fastAnc (when used in it's default mode) just returns a vector of estimated character values at nodes, anc.ML (when some taxa are present in the tree but missing from the input data) returns a list with the vector missing.x containing the MLE phenotypic trait values of taxa with missing data. It also happens to be the case that I never got around to adding this updated version of anc.ML to the phytools package.

Well, I have now made both updates so it is possible to (1) specify an inference method (method="fastAnc" or method="anc.ML"); and (2) phytools function anc.ML now permits missing data, in which case the states for the tip taxa in the tree missing from the data vector will be estimated using ML. The code for these updates are here: anc.ML, contMap, but since both use internal functions not in the phytools namespace it is probably wise to download the latest phytools build and install from source.

Here's a quick demo:

> packageVersion("phytools")
[1] ‘0.4.7’
> ## simulate tree & data
> tree<-pbtree(n=26,tip.label=LETTERS)
> x<-fastBM(tree)
> ## compute limits on x to use in both plots
> lims<-c(floor(min(x)*10)/10,ceiling(max(x)*10)/10)
> ## simulate missing tip values
> y<-sample(x,20)
> ## which taxa are missing
> setdiff(tree$tip.label,names(y))
[1] "H" "I" "O" "T" "W" "Z"
> ## contMap from full dataset
> contMap(tree,x,lims=lims,legend=1.5)
> ## contMap from reduced dataset
> contMap(tree,y,lims=lims,method="anc.ML",legend=1.5)

OK - that's it.

7 comments:

  1. Do you think it might be worthwhile to annotate the tips with missing data so which are missing is immediately apparent? Maybe give them a white bubble or something at the very tip?

    ReplyDelete
    Replies
    1. This can be done easily already by the user because contMap works with ape tiplabels and nodelabels. Good suggestion.

      Delete
  2. Amazing work! Beautiful.

    A general question:
    Would it be possible in contMap to specify our own colorramp instead of stuck with the default red-to-blue?

    ReplyDelete
    Replies
    1. This is not too hard. I just blogged about it here. Let me know if you have any difficulties. - Liam

      Delete
  3. Hello Dr. Revell,

    I am very new to R and have been using it for a few weeks. I have a dataset of continuous characters which has missing data. When I used the example above, I could not load the data file(which in .csv format) onto the tree. and also when i ran the code, I got an error which said Error in ace(x, a, method = "pic") :
    length of phenotypic and of phylogenetic data do not match.
    In addition: Warning message:
    In mean.default(x) : argument is not numeric or logical: returning NA.

    Also when I use the setdiff function, the missing value taxa are not shown correctly, i.e the ones which have missing values arent shown, instead its random all the time.
    Here is the code I ran:
    library(ape)
    library(phytools)
    tree<- read.nexus("data1.nexus")
    data<- read.csv("lhpfam.csv")
    a<-extract.clade(tree, node=91)
    plot(a)
    x <- fastBM(a)
    col2=2
    y<- data
    contMap(a,y,method="anc.ML",legend=0.5)

    Here is the dataset : https://onedrive.live.com/redir?resid=6d3c647fc5b8119c%21138

    ReplyDelete
  4. (Sorry if this is a duplicate; my other comment didn't go through)

    I have just started using contMap too! I just got it to work for missing data and here were the pitfalls that I fell into...

    (1) Continuous character data should be formatted as row #1 as species names, and row #2 as continuous character data. I think you might need to transpose your data matrix.

    (2) If you read in your data file as read.csv, it's read in as a data frame and that throws an error "Error in A[i, 1]/YY[j] : non-numeric argument to binary operator" So I used the unlist() function to convert it into a vector... which seemed to make the function happy!

    Overall, here is my code:

    tree <- read.tree("Bayes_rooted.tre")
    protodata <- read.csv("protodata.csv", header=TRUE)
    protovector <- unlist(protodata)
    lims<-c(floor(min(protovector)*10)/10,ceiling(max(protovector)*10)/10)
    contMap(tree, protovector, lims=lims, legend=1.5, method="anc.ML")

    Hopefully that helps and isn't secretly misleading! (Please correct me if I'm wrong!) Thanks Dr. Revell for writing such an awesome function!

    ReplyDelete
    Replies
    1. Hello Stephanie,

      I did try the code you've posted after transposing my data, but I am still getting the same error .

      Delete