Saturday, August 27, 2016

Finding the association between two trees when some taxa are missing or when each host lineage matches with a clade of parasites

I've been playing testing the optimization method for tip matching in the cophylo method for co-phylogenetic plotting. Although a formal proof is lacking (& not forthcoming!), I have shown via simulation that when trees match precisely in their topology and labels, cophylo can successfully find this matching - even for large trees that are completely scrambled a priori.

This also seems to be true when some lineages are missing from one or the other tree - for instance when cospeciation is perfect, but some taxa have been lost due to extinction or missing taxa; and when cospeciation is perfect, but host lineages might possess multiple, divergent parasite races, themselves forming a monophyletic group - such as when 'speciation' for the parasite lineages occurs more rapidly than in the hosts.

Below, I demonstrate both of these cases.

First, with some missing lineages. Let's start by simulating:

library(phytools)
## simulate matching trees
t1<-t2<-rtree(n=100)
## rotate nodes randomly in one of the trees
for(i in 1:100) t2<-rotateNodes(t2,sample(1:t2$Nnode+Ntip(t2),1))
## drop lineages from each tree at random
t1<-drop.tip(t1,sample(t1$tip.label,40))
t2<-drop.tip(t2,sample(t2$tip.label,30))
## these trees are fully scrambled!
plot(cophylo(t1,t2,rotate=FALSE),fsize=0.7)

plot of chunk unnamed-chunk-1

## match them by rotating nodes
obj<-cophylo(t1,t2)
## Rotating nodes to optimize matching...
## Done.
obj
## Object of class "cophylo" containing:
## 
## (1) 2 (possibly rotated) phylogenetic trees in an object of class "multiPhylo".
## 
## (2) A table of associations between the tips of both trees.
plot(obj,fsize=0.7)

plot of chunk unnamed-chunk-1

Next, let's imagine a case in which cospeciation is perfect, but in which each host species matches with a clade of parasites (for instance). We can simulate this by first generating host & parasite trees that match, then slicing the host tree at a time point (say, when 26 lineages are extant). Next we compute all the descendants from that point in each subtree. These are our parasite clades.

## simulate trees
host<-parasite<-pbtree(n=100,scale=10)
## slice to get all subtrees
LTT<-ltt(host,plot=FALSE)
t<-LTT$times[which(LTT$ltt==26)]
subs<-treeSlice(parasite,t,trivial=TRUE)
hosts<-sapply(subs,function(x) x$tip.label[1])
host<-drop.tip(host,setdiff(host$tip.label,hosts))
host$tip.label<-LETTERS
for(i in 1:Ntip(host)){
    if(i==1) assoc<-cbind(rep(host$tip.label[i],Ntip(subs[[i]])),
            subs[[i]]$tip.label)
    else {
        assoc<-rbind(assoc,
            cbind(rep(host$tip.label[i],Ntip(subs[[i]])),
            subs[[i]]$tip.label))
    }
}
host
## 
## Phylogenetic tree with 26 tips and 25 internal nodes.
## 
## Tip labels:
##  A, B, C, D, E, F, ...
## 
## Rooted; includes branch lengths.
parasite
## 
## Phylogenetic tree with 100 tips and 99 internal nodes.
## 
## Tip labels:
##  t1, t4, t50, t51, t23, t91, ...
## 
## Rooted; includes branch lengths.
assoc
##        [,1] [,2]  
##   [1,] "A"  "t1"  
##   [2,] "B"  "t4"  
##   [3,] "C"  "t50" 
##   [4,] "C"  "t51" 
##   [5,] "D"  "t23" 
##   [6,] "D"  "t91" 
##   [7,] "D"  "t92" 
##   [8,] "E"  "t20" 
##   [9,] "E"  "t21" 
##  [10,] "F"  "t61" 
##  [11,] "F"  "t62" 
##  [12,] "F"  "t15" 
##  [13,] "G"  "t14" 
##  [14,] "G"  "t16" 
##  [15,] "G"  "t38" 
##  [16,] "G"  "t39" 
##  [17,] "G"  "t30" 
##  [18,] "H"  "t2"  
##  [19,] "I"  "t36" 
##  [20,] "I"  "t37" 
##  [21,] "I"  "t33" 
##  [22,] "I"  "t34" 
##  [23,] "J"  "t8"  
##  [24,] "J"  "t22" 
##  [25,] "J"  "t73" 
##  [26,] "J"  "t74" 
##  [27,] "J"  "t48" 
##  [28,] "K"  "t5"  
##  [29,] "L"  "t6"  
##  [30,] "M"  "t55" 
##  [31,] "M"  "t56" 
##  [32,] "N"  "t99" 
##  [33,] "N"  "t100"
##  [34,] "N"  "t64" 
##  [35,] "N"  "t52" 
##  [36,] "N"  "t17" 
##  [37,] "N"  "t40" 
##  [38,] "N"  "t41" 
##  [39,] "O"  "t19" 
##  [40,] "O"  "t68" 
##  [41,] "O"  "t69" 
##  [42,] "O"  "t18" 
##  [43,] "P"  "t25" 
##  [44,] "P"  "t43" 
##  [45,] "P"  "t44" 
##  [46,] "P"  "t70" 
##  [47,] "P"  "t71" 
##  [48,] "P"  "t63" 
##  [49,] "P"  "t11" 
##  [50,] "P"  "t10" 
##  [51,] "P"  "t9"  
##  [52,] "Q"  "t13" 
##  [53,] "Q"  "t46" 
##  [54,] "Q"  "t47" 
##  [55,] "Q"  "t42" 
##  [56,] "R"  "t28" 
##  [57,] "R"  "t29" 
##  [58,] "S"  "t49" 
##  [59,] "S"  "t89" 
##  [60,] "S"  "t90" 
##  [61,] "T"  "t26" 
##  [62,] "T"  "t27" 
##  [63,] "U"  "t45" 
##  [64,] "U"  "t79" 
##  [65,] "U"  "t80" 
##  [66,] "U"  "t97" 
##  [67,] "U"  "t98" 
##  [68,] "U"  "t12" 
##  [69,] "V"  "t7"  
##  [70,] "W"  "t57" 
##  [71,] "W"  "t58" 
##  [72,] "W"  "t59" 
##  [73,] "W"  "t81" 
##  [74,] "W"  "t82" 
##  [75,] "W"  "t60" 
##  [76,] "W"  "t32" 
##  [77,] "X"  "t24" 
##  [78,] "X"  "t75" 
##  [79,] "X"  "t76" 
##  [80,] "X"  "t77" 
##  [81,] "X"  "t78" 
##  [82,] "X"  "t65" 
##  [83,] "X"  "t66" 
##  [84,] "X"  "t67" 
##  [85,] "X"  "t95" 
##  [86,] "X"  "t96" 
##  [87,] "Y"  "t53" 
##  [88,] "Y"  "t54" 
##  [89,] "Y"  "t35" 
##  [90,] "Y"  "t93" 
##  [91,] "Y"  "t94" 
##  [92,] "Y"  "t85" 
##  [93,] "Y"  "t86" 
##  [94,] "Y"  "t72" 
##  [95,] "Y"  "t83" 
##  [96,] "Y"  "t84" 
##  [97,] "Y"  "t87" 
##  [98,] "Y"  "t88" 
##  [99,] "Y"  "t31" 
## [100,] "Z"  "t3"
## scramble order of parasites
for(i in 1:100) parasite<-rotateNodes(parasite,sample(1:parasite$Nnode+
    Ntip(parasite),1))
## they are fully scrambled
plot(cophylo(host,parasite,assoc=assoc,rotate=FALSE),
    fsize=c(1,0.5))

plot of chunk unnamed-chunk-2

## rotate to optimize matching
obj<-cophylo(host,parasite,assoc=assoc)
## Rotating nodes to optimize matching...
## Done.
plot(obj,fsize=c(1,0.5),link.type="curved")

plot of chunk unnamed-chunk-2

Pretty neat, right?

2 comments:

  1. Very useful indeed. Is there a way to display the association lines in a different color? I tried plot(obj, link.type="curved", col="red"), but it didn't work.

    ReplyDelete
    Replies
    1. I believe the argument is link.col. Sorry it's not better documented.

      Delete