Comments

This page is for general comments about the blog.

Please feel free to post suggestions, questions, or comments about the blog material, my R functions page, or phylogenetic comparative methods generally.

Thanks!

57 comments:

  1. So far as I can tell, the above comment is spam - but since it is the only one I've got (as of 10/3/2011) on a whole page dedicated to comments, I have chosen to leave it up.

    ReplyDelete
    Replies
    1. Hi Liam,
      I've been applying evol.rate.mcmc to some data and am trying to get my head around the analysis. From what I understand it finds a single rate shift in rates of phenotypic evolution within a tree. Is there any way to test the 'significance' of any identified shifts (i.e. especially for more subtle identified shifts)? I hope that question makes sense, thanks for any advice.
      regards

      Delete
  2. Ha! Here's a real comment for you: I am having a hard time using the phyl.pca function in phytools. No matter that I am passing the mode "cov", I get an error.
    > pca<-phyl.pca(C, Y, method="BM", mode="corr")
    "Error in colSums(invC %*% X) : error in evaluating the argument 'x' in selecting a method for function 'colSums': Error in invC %*% X : requires numeric/complex matrix/vector arguments. Any ideas?

    ReplyDelete
  3. Hi Anonymous.

    Is C the vcv matrix from the tree? phyl.pca (unlike the function phyl_pca published in the appendix to Revell 2009) takes the tree not the vcv matrix as input, so you would do:

    > pca<-phyl.pca(tree,Y,mode="corr")

    If this is not your problem, I would be happy to troubleshoot any issues. If it is, then sorry for the confusion!

    - Liam

    ReplyDelete
  4. That was it! Thanks! BTW, I like phytools! Very useful so far!

    ReplyDelete
  5. Dear Liam,
    Your blog is very interesting. As you work with Simmap, I would have a very basic question: I want to do stochastic character mapping on 10000 Bayesian trees in Simmap and display the results on the consensus tree. What output files do I have to use - is it the mutational maps, which can be saved optionally? Do I have to summarize all individual maps to display them on the consensus? I had no problem to make ASR in Simmap, but after checking the manual I still don't understand how to get the output for stochastic character mapping and how to display the results. Do you have any hint for me?
    Thanks,
    David

    ReplyDelete
  6. Hi David.

    I have nothing to do with the program SIMMAP which is maintained by Jonathan Bollback. The phytools function make.simmap can do basic stochastic mapping. Other functions (read.simmap, write.simmap) in the phytools package read & write trees in the SIMMAP format.

    In the future I hope to add some of the functionality you are talking about to the phytools package, but I am not there yet.

    Thanks for checking out phytools.

    - Liam

    ReplyDelete
  7. Hi Liam,

    I am a PhD student of Vladimir Minin's at UW, and we have noticed some potentially undesirable behavior with the optim.phylo.ls function.

    What we observed is that there appears to be some stochasticity when using the "fixed=TRUE" option; that is, it would sometimes return different answers when run repeatedly with the same input tree and sequence alignment.

    The culprit appears to be the multi2di function within optim.phylo.ls. The multi2di function purports to resolve polytomies by explicitly denoting them as having branches of length 0, as opposed to a lack of a branch whatsoever. However, it is self-admittedly a stochastic procedure, in terms of exactly where it adds branches -- it seems that it should not matter in principle since it is merely adding branches of length 0. However, there appear to be some undesired consequences when each of these resolved trees are used in optim.phylo.ls

    As a demonstration, consider the following multifurcating tree:
    http://i16.photobucket.com/albums/b37/peterchi/T1.png

    Now, suppose we want to use this as the fixed topology on which to estimate the least squares branch lengths for a given sequence alignment. Let's say that we have saved the matrix of pairwise distances (according to JC69) as dist2.stan. Then, we would use the following R code:

    T2<-optim.phylo.ls(dist2.stan,stree=T1,fixed=TRUE)

    However, if we run this repeatedly, we will observe that we obtain different answers between iterations:
    > T2<-optim.phylo.ls(dist2.stan,stree=T1,fixed=TRUE)
    best Q score of 0.0040827015 found after 0 nearest neighber interchange(s).
    > T2<-optim.phylo.ls(dist2.stan,stree=T1,fixed=TRUE)
    best Q score of 3.27601e-05 found after 0 nearest neighber interchange(s).
    > T2<-optim.phylo.ls(dist2.stan,stree=T1,fixed=TRUE)
    best Q score of 0.0040827015 found after 0 nearest neighber interchange(s).
    > T2<-optim.phylo.ls(dist2.stan,stree=T1,fixed=TRUE)
    best Q score of 0.0033840255 found after 0 nearest neighber interchange(s).
    > T2<-optim.phylo.ls(dist2.stan,stree=T1,fixed=TRUE)
    best Q score of 3.27601e-05 found after 0 nearest neighber interchange(s).

    As mentioned above, the cause of these discrepancies appears to be the multi2di function. Observe what happens when we repeatedly plot the multi2di(T1) result; that is:

    pdf("multitrees.pdf",height=4,width=6)
    par(mfrow=c(2,3))
    plot(multi2di(T1))
    plot(multi2di(T1))
    plot(multi2di(T1))
    plot(multi2di(T1))
    plot(multi2di(T1))
    plot(multi2di(T1))
    dev.off()

    We then get the following trees:
    http://i16.photobucket.com/albums/b37/peterchi/multitrees.png

    And, although they clearly are all the same unrooted tree, we can examine the structure of how each tree is stored (i.e. its edges), as follows:
    http://i16.photobucket.com/albums/b37/peterchi/edges.png

    It thus seems to be this randomness that is causing the different resulting Q-scores and optimized branch lengths from a fixed topology.

    We have been able to come up with a solution for our own purposes, but we just thought we would bring this observation to your attention.

    ReplyDelete
  8. Hi Peter.

    Thanks for bringing this to my attention. Sorry I have not responded sooner. I'll look into it as promptly as I can!

    All the best, Liam

    ReplyDelete
  9. Hi Liam,

    do you know any function that could allow finding the deepest node ancestral to a group of tips? (I mean the oldest MRCA of a group of tips)

    thanks in advance!

    ReplyDelete
  10. Yes, I can help you with this. I think that, by convention in "phylo" objects, nodes closer to the root node on a given path from the root have smaller node numbers then do nodes on the same path that are farther from the root. That is to say, node 13 is not necessarily closer to the root than node 19; but, if for a set of four taxa, the pairwise MRCAs (excluding duplicates) are: 16, 15, & 14, then we know that MRCA of all 4 is 14.

    Just in case this is not guaranteed to be the case, I have written a function that uses ape::mrca & phytools::nodeHeights to compare the pairwise MRCAs of a list of species to the heights of the MRCAs above the root, and then picks the one with the lowest height. I will post it shortly.

    ReplyDelete
  11. Hi Liam.

    a simple question, how in trees mapped with "make.simmap", you can get the heights in the tree in which there are transition points between mapped states?

    thank you very much!

    ReplyDelete
  12. Hi Liam,

    I am using your anc.Bayes function in phytools with a nexus tree and continuous data. When I run the line "anc.Bayes(tree,data,ngen=10000,control=list())" I get an error message:

    "Error in if (post.odds > runif(n = 1)) { : missing value where TRUE/FALSE needed"

    Do you know what this might be? I've had no problem with my tree or data in geiger or ape.

    Thanks.

    ReplyDelete
  13. My guess is that in your during the MCMC one of: 1) the log-value of the likelihood for the current state of the chain; 2) the log-value of the likelihood for the proposed state; or 3) either of the log-priors; are numerically indistinguishable from -Inf or Inf. One thing you could do to check this is rescale your data by multiplying by 10 or 1/10. In addition, if you send me your dataset I would happy to try and diagnose the issue - and it would also be helpful in improving phytools.

    - Liam

    ReplyDelete
  14. I have sent an email to liam.revell@umb.edu with the relevant files attached (subject: phytools query).
    Thanks.

    ReplyDelete
  15. Hi Liam,
    Suggestion for you. How about adding % variance explained for the PC axes as part of the output of phyl.pca? Not hard for someone to do separately, but it would be useful output to add. I'm digging phytools.

    D

    ReplyDelete
  16. Hi, could you help me out with this? I'll just post my entire script-run.

    > rm(list = ls(all = TRUE))
    > setwd("M:\\pc\\Desktop\\Syntese MANUS\\Nye MHC verdier")
    > library(ape)
    > tree<-read.nexus("tree.nex")
    > exon<-read.table("exon.txt")
    > attach(exon)
    > X<-as.matrix(exon)
    > class(tree)
    [1] "phylo"
    > str(tree)
    List of 4
    $ edge : int [1:14, 1:2] 9 10 10 11 11 12 12 9 13 13 ...
    $ edge.length: num [1:14] 1 1 1 1 1 1 1 1 1 1 ...
    $ Nnode : int 7
    $ tip.label : chr [1:8] "Ficedula_hypoleuca" "Hylocichla_mustelina" "Luscinia_svecica" "Oenanthe_oenanthe" ...
    - attr(*, "class")= chr "phylo"
    - attr(*, "origin")= chr "tree.nex"
    > class(exon)
    [1] "data.frame"
    > str(exon)
    'data.frame': 8 obs. of 6 variables:
    $ V1: Factor w/ 8 levels "Ficedula_hypoleuca",..: 1 2 3 4 5 6 7 8
    $ V2: int 112 175 154 149 114 160 125 162
    $ V3: int 159 259 241 228 167 243 183 257
    $ V4: num 0.163 0.208 0.18 0.175 0.154 0.205 0.176 0.154
    $ V5: num 0.134 0.227 0.179 0.146 0.136 0.164 0.18 0.13
    $ V6: num 0.202 0.257 0.219 0.22 0.184 0.194 0.209 0.188
    > class(X)
    [1] "matrix"
    > str(X)
    chr [1:8, 1:6] "Ficedula_hypoleuca" "Hylocichla_mustelina" "Luscinia_svecica" ...
    - attr(*, "dimnames")=List of 2
    ..$ : NULL
    ..$ : chr [1:6] "V1" "V2" "V3" "V4" ...
    > source("phyl.pca.R")
    > pca.results<-phyl.pca(tree,X,method="lambda",mode="corr")
    [1] "Y has no names. function will assume that the row order of Y matches tree$tip.label"
    Error in invC %*% X : requires numeric/complex matrix/vector arguments
    > pca.results
    Error: object 'pca.results' not found

    If you could help me out, I would be forever grateful.

    PS, I saw the similar issue in the other comments, but seeing as "tree" is simply that, a phylogenetic tree and not a distance matrix, I don't get why the script returns this error.

    ReplyDelete
  17. Try:

    X<-read.table(...,row.names=1)
    X<-as.matrix(X)

    Let me know if that works. Liam

    ReplyDelete
  18. Dear Liam
    I am new to this field and have a very question. I want to extract the species names (tip.label) of a tree. I downloaded this tree from one on the articles. It seems that this tree is a combine tree. If you help me how can I extract the species names.
    The str of the tree is:
    Class "multiPhylo"
    List of 3
    $ tree mammalST_MSW05_bestDates :List of 4
    ..$ edge : int [1:7522, 1:2] 5021 5022 5023 5023 5024 5024 5022 5021 5025 5026 ...
    ..$ Nnode : int 2503
    ..$ tip.label : chr [1:5020] "Tachyglossus_aculeatus" "Zaglossus_bruijni" "Zaglossus_bartoni" "Ornithorhynchus_anatinus" ...
    ..$ edge.length: num [1:7522] 102.6 49.9 13.7 0.2 13.5 ...
    ..- attr(*, "class")= chr "phylo"
    $ tree mammalST_MSW05_lowerDates:List of 4
    ..$ edge : int [1:7522, 1:2] 5021 5022 5023 5023 5024 5024 5022 5021 5025 5026 ...
    ..$ Nnode : int 2503
    ..$ tip.label : chr [1:5020] "Tachyglossus_aculeatus" "Zaglossus_bruijni" "Zaglossus_bartoni" "Ornithorhynchus_anatinus" ...
    ..$ edge.length: num [1:7522] 114.2 40.5 11.5 0.2 11.3 ...
    ..- attr(*, "class")= chr "phylo"
    $ tree mammalST_MSW05_upperDates:List of 4
    ..$ edge : int [1:7522, 1:2] 5021 5022 5023 5023 5024 5024 5022 5021 5025 5026 ...
    ..$ Nnode : int 2503
    ..$ tip.label : chr [1:5020] "Tachyglossus_aculeatus" "Zaglossus_bruijni" "Zaglossus_bartoni" "Ornithorhynchus_anatinus" ...
    ..$ edge.length: num [1:7522] 91 59.3 15.9 0.2 15.7 15.7 75.2 13.1 44.4 8.4 ...
    ..- attr(*, "class")= chr "phylo".

    ReplyDelete
    Replies
    1. The tip names are in the vector $tip.label. You can pull these out from each tree in the 'multiPhylo' object using (say) tree[[1]]$tip.label (for the 1st tree, for example).

      Delete
  19. Dear Liam,
    I am trying your phyl.cca function but I keep getting the following error:
    Error in ccs > 0 : invalid comparison with complex values
    is this a format problem with the data matrices or something else is going on?
    thanks for your reply and for developing this awesome 'phytools'
    cheers
    Arley

    ReplyDelete
  20. Hi, Liam,
    Like the above individual I'm also trying to use the phyl.cca function, but have been encountering three main issues:
    1. Although I have the latest versions of R and phytools, R cannot find the function. It's definitely spelled right...
    2. When I source the R code instead, the lambda estimate statement doesn't seem to register (the exact error: unused argument(s) (fixed = F)). I'm guessing this is because the code posted 9/7/2011 doesn't include updates to use lambda.
    3. If I take out the lambda part, I get a "subscript out of bounds error," which I know must have something to do with matrices. I've tested the format with is.matrix and the rownames are identified (furthermore, I've successfully used
    phyl.pca, which has similar format requirements).
    Any suggestions? I'd be happy to send my code and files if needed.
    Thanks for taking a look at this. I enjoyed the comparative methods symposium at Evolution in Ottawa!
    Mercedes Burns

    ReplyDelete
    Replies
    1. I have actually fixed the problem I had with phyl.cca
      I just had to re-order the rownames in the data matrices to match the order of the taxa in the tree, and it worked like a charm!
      For the record, I'm using version 0.1-8.
      arley

      Delete
    2. Hi Mercedes.

      The first thing I would do is install the latest version of phytools. You should first detach phytools or (better yet), start a new R session. You can then update the package using update.packages() or install.packages("phytools"). Try to run it again and if you are not able, please let me know.

      - Liam

      Delete
    3. @Arley -

      I'm glad you got this to run, but the report that the order matters concerns me. Did the rows have names beforehand? If they are named (in theory) it should not matter - so I should investigate this!

      - Liam

      Delete
  21. Thanks for your response, Liam! Issues 1 and 2 are resolved, but I'm still getting the subscript out of bounds error. I'll try Arley's solution of reordering the rownames, but yes, let us know if there's a fix!
    Mercedes

    ReplyDelete
    Replies
    1. Hmm, the reordering rows trick didn't work for me. I get a subscript out of bounds error when I run:
      data<-phyl.cca(tree,X,Y,fixed=F)
      X is a matrix with 5 columns, Y is a matrix of 2 columns. The rows are identical in number and name. Any thoughts? Thanks again.
      Mercedes

      Delete
    2. Hi Mercedes.
      Hard for me to diagnose, but if you send me the input files, I can try and troubleshoot. This will also help improve phytools if there is a bug.
      liam.revell@umb.edu
      - Liam

      Delete
  22. Hi Liam,

    Using brownieREML(), I am testing whether the evolutionary rate of a continuous trait (y) is contingent on a discrete multistate trait x <-c("a","b","c").

    The evolutionary history of the multistate trait was reconstructed with the function make.simmap(x, model=c("SYM")). However make.simmap assumes that transitions are symmetric (model="SYM"), such that transitions from "a" to "b" are as likely than from "b" to "a". However i don't have a biological justification to assume this a priori. So, I wonder if there is a statistical justification for this assumption or should i try other models (e.g. model=c("ARD"). Unfortunately, results change when i do the latter.

    Many thanks for you time and help, and congratulations for this helpful Blog.

    Best wishes, Gabriel

    ReplyDelete
    Replies
    1. Hi Gabriel.

      If your results seem sensitive to the mapping method, you might also consider running stochastic mapping in the stand-alone program SIMMAP. It can do things not presently implemented in phytools, including sampling across model parameter values. You can export your data & tree using the phytools function export.as.xml (see ?export.as.xml for details) which will create an input file for SIMMAP.

      Good luck.

      Liam

      Delete
    2. Thanks Liam. I modified the ace() function to incorporate a designed matrix, for which i have some biological support. This seems to work well with make.simmap(). Cheers, Gabriel

      Delete
  23. Dear Liam, I am wondering, is it possible to give different colours to the taxa after pruning by using the drop.tip. I just want to show the locations of the used species in the whole tree. All the taxa in tree retains but with different colours? (e.g. used species blue or red and pruned ones grey)

    ReplyDelete
    Replies
    1. I think this is what you're going for: http://phytools.blogspot.com/2012/09/plotting-with-leaves-and-edges-to-be.html

      Delete
    2. Dear Liam
      That is really cool... Thumbs up..

      Delete
  24. Dear Liam Your fancyTree command is really cool and just what I needed. I am very new to this field and please excuse me for my silly question.
    I want to use this command but I have no idea how I can select the species that I want to prune. I have a super tree with thousands of species and there is my data set with few hundreds species. Now, I am not able to figured it out how to tell which species to retain and which to prune.. Could you help me in this regard.

    ReplyDelete
    Replies
    1. Hi. If just want to, say, prune your tree to contain only the species represented in your dataset you can just do (assuming your data are in matrix X, and the row names contain species names):
      tips<-setdiff(tree$tip.label,rownames(X))
      ptree<-drop.tip(tree,tips)
      Or, if you want to visualize it using fancyTree:
      tips<-setdiff(tree$tip.label,rownames(X))
      ptree<-fancyTree(tree,type="droptip",tip=tips)
      - Liam

      Delete
    2. Dear Liam, thanks for the reply. I tried the code but the last line of the code, ptree<-fancyTree(tree,type="droptip",tip=tips)
      is not working. it gives the following message
      Error in plot.window(...) : need finite 'xlim' values
      I don't know why is not working.

      Delete
    3. Is there any possibility that the vector, tips, is either empty or contains all the tips in the tree (or, say, all but one). You may also want to update to the latest version of phytools to make sure that you have the newest version of the function.

      Delete
  25. Hey Liam, Do you have, or do you know of, a function that will do a phylogenetic MANOVA in which the canonical axes are evaluated while taking covariance due to phylogeny into account? I'm looking for something analogous to phyl.pca. I am aware that GEIGER has a function (phy.manova) to do a standard MANOVA and compare a test-statistic to a null distribution based on simulation under an empirically determined single rate matrix (ala Garland et al. 1993), but this approach does not provide phylogenetically informed canonical axes. I would like to know what variables are evolving differently in two groups of taxa. Also, neither group is monophyletic.

    Thanks for your help.

    ReplyDelete
    Replies
    1. Unfortunately, I don't have a good answer. nlme::gls does not allow multiple response variables. Also, so far as I know, this has not been done with phylogenetic data - which makes me suspicious that simply extending generalized regression/ANOVA to multiple continuous dependent variables might be more complicated than it sounds. You could try the SIG-phylo list....

      Delete
  26. Dear Liam,
    Is there a way to perform non-parametric regression analysis like pgls?

    ReplyDelete
  27. Dear Liam

    I would like to run a pgls analysis where I account for variation in the estimation of my species means. In other words, my response variable is a mean value calculated for each species in my phylogenetic tree. However, the number of samples used to calculate these means differs among the species in my tree. Is there any way to include a weighting factor that accounts for this variation (something like the square root of the sample size) in a pgls context? Do you have any suggestions for how to code this in R?

    Many thanks

    Nick

    ReplyDelete
    Replies
    1. Hi Nick.

      Have you tried the phytools function pgls.Ives? For this function, you need sampling variances on your estimates of the means for each species - which in this case would be the within-species variance divided by the sample size for each species.

      If you do not have within-species variances, just the mean & sample size, please let me know as I'm pretty sure we could fit the Ives et al. (2007) model, but assuming that within-species variance was fixed across species and then estimating it. Obviously, it's better if you do know within species variances.

      Note that with pgls.Ives you have two options for input: you can give the function the species means, variances of the means in x & y, and sampling covariance between x & y; or you can just feed the function with the raw data for all individuals - as long as the ordering is the same for x & y.

      Unfortunately, so far I've only implemented the bivariate phylogenetic regression in this function.

      All the best, Liam

      Delete
    2. Hi Liam

      Many thanks for your reply. I can calculate the within species variance, but have the problem that for some species I only have one observation (n per species ranges from 1 to 32, and I have 14 species in my analysis). I can try and implement the solutions that you suggest in an earlier blog (http://blog.phytools.org/2012/06/blombergs-k-phylogenetic-signal-with.html), but what do you consider to be a suitably homogeneous intraspecific variance to make this option a possibility? In my case it ranges from less than 1 to nearly 40.

      Best

      Nick

      Delete
    3. Hi Nick. The within-species variances for a trait vary 40-fold? A couple of quick questions leap to mind:

      (1) This is the variance, not the sampling variance, correct? The relationship is sampling variance = variance divided by sample size. phylosig() takes the sampling variances as input, but when we only have one (or very few) observation(s) per species, I suggest using the mean within-species variance as the sampling variance for those species (or the mean/pooled variance divided by the species-specific sample size). When we have only 1 sample, of course, the within-species variance and the sampling variance of the mean are the same.

      (2) Are the data on a linear or log scale? Usually log-transformation will homogenize the variance. If your within-species variances (not sampling variances, remember) differ 40-fold, then it is probably because the trait varies widely among species as well and there is a relationship between the variance and the mean. Log-transformation will usually get rid of this.

      I look forward to hearing more details about this and I will post the solution to the blog if it is likely to be useful for or informative to others.

      Thanks! Liam

      Delete
  28. Dear Liam,
    Just a small question. To you it might be a very simple question but I am not able to sorted it out and also I may not ask the question correctly but still asking it. I am trying to show the distribution of contineous trait data over the phylogenetics tree in colours. something like,
    'tiplabels(pch=14, col=mycol(Trait data), cex=0.8)'
    but this gives me colours only at the tips as it should but if I want to colour the entire branch, how can I do it. Something like the background of your webpage with a lot of coloured trees.
    Thans

    ReplyDelete
    Replies
    1. Check out the phytools function contMap and let me know if this is what you have in mind. If you have any difficulty figuring out how to use it, please let me know and I would be happy to help. - Liam

      Delete
    2. Dear Liam, Thanks for the reply. Yes, I am looking for this kind of function. I tried this function but it is giving me an error, "Error in while (x > trans[i]) { : missing value where TRUE/FALSE needed"
      Though "tiplabels" function is working fine. I must add , before using this function, I am also using the "drop.tip", function as well.
      Thanks

      Delete
    3. Unfortunately, I can't troubleshoot that error unless you send me your input data & tree. In that case, though, I would be happy to tree. - Liam

      Delete
    4. Dear Liam, thank you very much for this help. Ok, first I will try myself to sort it out if I am not able to plot tree then will surely send my data and tree to you.
      Thank you very much once again.
      In both cases, I will let you know.

      Delete
  29. Hey Liam,

    Have you written any functions, or are you planning to, to simulate not just trait means, but trait variances for species? E..g, add trait variation as a output value for each tip along with the mean in fastBM.

    Thanks! Scott

    ReplyDelete
  30. Hi All,

    I have a quick question: how can you assign states to nodes in a phylogeny, based on reconstructions. So lets say I have the data from my reconstruction, I want to say:

    Node 16 <- 0
    Node 17 <- 1
    Node 18 <- 1

    Such that I can list the descendants from any given node, and see if they differ from the node which I started with, hope that makes sense!

    ReplyDelete
  31. Dear Liam,
    I have used make.simmap to obtain 1000 stochastic maps for a larger clade. It contains two sister-clades. My main interest is to compare transitions for these two sister -clades. describe.simmap gives average values for the entire clade, while, countSimmap provides number of transitions for 1000 trees. Is there a way to obatin these values individually for sub-clades within phytools? or Do i have to run the analyses separately for each sister-clade to obtain these values?

    Thank you,
    VJ

    ReplyDelete
    Replies
    1. Hi Liam,
      Thought inform you. Have run the analyses on the clades separately to obtain mean of the transitions.
      VJ

      Delete