Wednesday, June 17, 2015

Calculating the times spent with each of number of lineages on the tree

Yesterday a phytools user emailed me the following inquiry:

“Is there an easy way in phytools to infer the times while there are exactly k ancestral lineages in the tree? I would need to calculate these based on a newick tree input. However, I havent found a function in phytools for it. Somthing like k=2 from x to y k=3 from z to v”

This can be done pretty easily using the phytools function ltt. Of course, one might allow for the possilibity that - if lineages both increase & decrease through time, there could be more than one period of time during which there is 2 lineages, 3 lineages etc.

The following gives a quick demo - first using a tree in which the number of lineages both increases & descreases in time since the root:

library(phytools)
tree<-rtree(26,tip.label=LETTERS)
plotTree(tree)

plot of chunk unnamed-chunk-1

## create an object with the number of lineages through time:
obj<-ltt(tree,log=FALSE)

plot of chunk unnamed-chunk-2

Now let's go through all the number of possible lineages and tabulate all the time intervals in which the tree has each number of lineages:

foo<-function(n,obj){
    ii<-which(obj$ltt==n)
    ii<-ii[ii!=length(obj$ltt)]
    sapply(ii,function(i,x) c(x[i],x[i+1]),x=obj$times)
}
Nlineages<-sapply(1:max(obj$ltt),foo,obj=obj)
if(is.matrix(Nlineages)){
    colnames(Nlineages)<-1:max(obj$ltt)
    rownames(Nlineages)<-c("start","end")
} else if(is.list(Nlineages)){
    names(Nlineages)<-1:max(obj$ltt)
    for(i in 1:length(Nlineages)) 
        rownames(Nlineages[[i]])<-c("start","end")
}

Our object is a list, with a matrix for each number of lineages showing the time intervals spent with that number of lineages.

Nlineages
## $`1`
##               16
## start 0 4.128545
## end   0 4.198241
## 
## $`2`
##               27       15
## start 0.00000000 4.049935
## end   0.07445996 4.128545
## 
## $`3`
##               28       13       11
## start 0.07445996 1.501722 3.962453
## end   0.15032225 1.711749 4.049935
## 
## $`4`
##              30         1         4        2       41       22
## start 0.1503222 0.3991084 0.5425447 1.499231 1.711749 3.655242
## end   0.2915629 0.4658409 0.8433361 1.501722 2.151097 3.962453
## 
## $`5`
##              31        50        29       24       35       21
## start 0.2915629 0.4658409 0.8433361 1.464400 2.151097 3.651133
## end   0.3991084 0.5425447 0.8714423 1.499231 2.194213 3.655242
## 
## $`6`
##              51        3       37        5        9       18
## start 0.8714423 1.458014 2.194213 2.331990 3.612215 3.642776
## end   0.9738692 1.464400 2.274517 2.353893 3.618789 3.651133
## 
## $`7`
##              32       26       25       45       42       14       12
## start 0.9738692 1.096267 1.325357 2.274517 2.353893 2.607869 3.509315
## end   1.0088520 1.263415 1.458014 2.331990 2.573532 2.785814 3.612215
##             44
## start 3.618789
## end   3.642776
## 
## $`8`
##             33       34       46       38        7
## start 1.008852 1.263415 2.573532 2.785814 3.484318
## end   1.096267 1.325357 2.607869 2.810022 3.509315
## 
## $`9`
##             47       20        6
## start 2.810022 3.318845 3.382049
## end   2.953015 3.380557 3.484318
## 
## $`10`
##             39       23       19        8       40
## start 2.953015 3.103372 3.199463 3.277445 3.380557
## end   2.988247 3.114297 3.252606 3.318845 3.382049
## 
## $`11`
##             43       36       17       49
## start 2.988247 3.114297 3.179681 3.252606
## end   3.103372 3.149976 3.199463 3.277445
## 
## $`12`
##             48
## start 3.149976
## end   3.179681

Next, we can try and ultrametric trees with all lineages terminating in the present:

tree<-pbtree(n=26,tip.label=LETTERS)
plotTree(tree)

plot of chunk unnamed-chunk-5

obj<-ltt(tree,log=FALSE)

plot of chunk unnamed-chunk-6

foo<-function(n,obj){
    ii<-which(obj$ltt==n)
    ii<-ii[ii!=length(obj$ltt)]
    sapply(ii,function(i,x) c(x[i],x[i+1]),x=obj$times)
}
Nlineages<-sapply(1:max(obj$ltt),foo,obj=obj)
if(is.matrix(Nlineages)){
    colnames(Nlineages)<-1:max(obj$ltt)
    rownames(Nlineages)<-c("start","end")
} else if(is.list(Nlineages)){
    names(Nlineages)<-1:max(obj$ltt)
    for(i in 1:length(Nlineages)) 
        rownames(Nlineages[[i]])<-c("start","end")
}

This time we have a matrix in which each column is a number of lineages and the rows show the start and end times. When no lineages are lost through time, there will be only one time interval for each number of lineages:

Nlineages
##           1            2         3         4        5        6        7
## start 0e+00 2.000000e-11 0.4418836 0.5968292 1.356948 1.752875 1.756543
## end   2e-11 4.418836e-01 0.5968292 1.3569481 1.752875 1.756543 1.876857
##              8        9       10       11       12       13       14
## start 1.876857 1.931771 2.000830 2.022528 2.046486 2.218467 2.283606
## end   1.931771 2.000830 2.022528 2.046486 2.218467 2.283606 2.445454
##             15       16      17       18       19       20       21
## start 2.445454 2.506024 2.57934 2.586070 2.596797 2.668859 2.671522
## end   2.506024 2.579340 2.58607 2.596797 2.668859 2.671522 2.735799
##             22       23       24       25       26
## start 2.735799 2.773744 2.779681 2.823757 2.865986
## end   2.773744 2.779681 2.823757 2.865986 2.866205

Note that the tiny zeroeth timeslice with one lineage at the start of each tree is an idiosyncratic feature of phytools::ltt and could easily be ignored.

That's it.

No comments:

Post a Comment