Wednesday, November 4, 2015

New function to simulate species trees from genus-level tree

I just added a function to phytools to automate the task of attaching species-level subtrees simulated under a Yule process to a backbone genus-level tree. I described this here last week.

Here is how it works:

## first update phytools
library(devtools)
install_github("liamrevell/phytools")
## load
library(phytools)
## here's our hypothetical genus tree:
genus.tree
## 
## Phylogenetic tree with 16 tips and 15 internal nodes.
## 
## Tip labels:
##  Vkfns, Ikfxnyht, Qkyca, Bnlpd, Wkpws, Bkhq, ...
## 
## Rooted; includes branch lengths.
plotTree(genus.tree,ftype="i")

plot of chunk unnamed-chunk-2

## here are our tips to add
tips
##  [1] "Ikfxnyht_opqcfh" "Ikfxnyht_nvaocw" "Ikfxnyht_knzcyf"
##  [4] "Qkyca_pzdukv"    "Qkyca_jukdfi"    "Qkyca_ritdjl"   
##  [7] "Qkyca_agrjou"    "Qkyca_qoyafl"    "Qkyca_qrfsbu"   
## [10] "Bnlpd_hidqnw"    "Bnlpd_wlthqi"    "Bnlpd_gayokd"   
## [13] "Bnlpd_lqjspu"    "Wkpws_xbahqo"    "Wkpws_pwdnuv"   
## [16] "Bkhq_hqdkbx"     "Bkhq_ojrbaw"     "Bkhq_pzwyfq"    
## [19] "Bkhq_dejpxs"     "Bkhq_efnyzp"     "Bkhq_kymfvj"    
## [22] "Bgln_tfuizj"     "Bgln_tpdsuf"     "Bgln_ycaqrs"    
## [25] "Bgln_havrzp"     "Bgln_bonyms"     "Bgln_nhquvm"    
## [28] "Nrztfp_blunwa"   "Iakywem_jqhxdr"  "Lhkrtyl_usqjhl" 
## [31] "Lhkrtyl_nbjhyc"  "Lhkrtyl_wmhkcd"  "Lhkrtyl_wmdkqp" 
## [34] "Lhkrtyl_dfhlkj"  "Lhkrtyl_kpadxb"  "Javcgbph_qvgmrh"
## [37] "Javcgbph_vzqfir" "Ctfioy_yjuxhp"   "Ctfioy_bcgonv"  
## [40] "Yrvp_prchgs"     "Yrvp_zcjuqx"     "Yrvp_usjbyc"    
## [43] "Hmvlk_nymqow"    "Hmvlk_ktjxnq"    "Hmvlk_crtdau"   
## [46] "Hmvlk_hemxwi"    "Hmvlk_xygpmk"    "Hmvlk_hxofgw"   
## [49] "Fioxrcyl_pecahk" "Fioxrcyl_uifsdv" "Okmfyg_ahxljq"
## add them:
species.tree<-genus.to.species.tree(genus.tree,tips)
plotTree(species.tree,ftype="i",fsize=0.7)

plot of chunk unnamed-chunk-2

That's it.

Obviously, the data & tree above were simulated. Here is code used in simulation

## first let's simulate our genus tree:
foo<-function() paste(sample(LETTERS,1),paste(sample(letters,
    round(runif(1,min=3,max=7))),collapse=""),sep="")
genera<-replicate(16,foo())
genus.tree<-pbtree(n=length(genera),tip.label=genera,scale=80)
genus.tree$edge.length[which(genus.tree$edge[,2]<=Ntip(genus.tree))]<-
    genus.tree$edge.length[which(genus.tree$edge[,2]<=Ntip(genus.tree))]+20
tips<-c()
for(i in 1:Ntip(genus.tree)){
    n.genus<-sample(0:6,1)
    if(n.genus>0) for(j in 1:n.genus) 
        tips<-c(tips,paste(genus.tree$tip.label[i],
        paste(sample(letters,6),collapse=""),sep="_"))
}

2 comments:

  1. It may be worth pointing out to readers Abouheif 1998 "Random trees and the comparative method: A cautionary tale" Evolution 52(4): 1197-1204.

    "One of the toughest problems facing comparative biology is the paucity of robust phylogenetic hypotheses for many taxonomic groups. Martins (1996) proposed a method to analyze comparative data in the absence of a known phylogeny using randomly generated trees. Before applying this method, however, researchers should be aware that (1) parameter estimates derived from this method essentially assume a star phylogeny, and thus, estimate the same evolutionary regression or correlation coefficient as traditional cross-species analyses; and (2) statistical conclusions derived from this method may be so conservative as to mask evolutionary patterns, such as Rensch's rule, and should be interpreted with caution."

    The PDF is at http://biology.mcgill.ca/faculty/abouheif/articles/Abouheif%201998.pdf .

    This paper is about making up entire trees, not just most of the tree, as with genus.to.species.tree(). However, to me, it suggests we should be cautious: having a mostly or entirely made up tree is not the same as having a tree based on data and may lead to problems in many analyses. Nothing wrong with the genus.to.species.tree() function, mind you, but researchers need to consider whether it is safe or not to use a tree based on this or similar approaches for their particular question. Sometimes (many times, I'd wager) it's far better to wait for data or work on the available subset of data.

    ReplyDelete
    Replies
    1. Yes, I would also not recommend using this function in the way you describe. - Liam

      Delete

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.