An R-sig-phylo user asks:
Is there a way to detect and list all tips of a tree with 0 length in their terminal branches (essentially duplicated sequences)? Would also be great if it's possible to output them in groups so that it's clear which tips are identical to which.
I answered the first part as follows:
I would advise setting some "tolerance" value, below which you consider
a terminal edge to be zero. Then try:
tol<-1e-8
x<-na.omit(tree$tip.label[tree$edge[tree$edge.length<= tol,2]])
attr(x,"class")<-NULL
attr(x,"na.action")<-NULL
x<-na.omit(tree$tip.label[tree$edge[tree$edge.length<= tol,2]])
attr(x,"class")<-NULL
attr(x,"na.action")<-NULL
This works great for the first part - the only problem is that it ignores the second part by not telling us what tips are separated by tol or less distance from which other tips.
Here is a solution that addresses that:
D<-cophenetic(tree)
tol<-1e-12 # say
x<-apply(D,1,function(x) names(which(x<=tol)))
x<-unique(x)
x<-x[sapply(x,length)>1]
tol<-1e-12 # say
x<-apply(D,1,function(x) names(which(x<=tol)))
x<-unique(x)
x<-x[sapply(x,length)>1]
Let's take the following tree:
and try it out:> require(ape)
Loading required package: ape
> D<-cophenetic(tree)
> tol<-1e-12 # say
> x<-apply(D,1,function(x) names(which(x<=tol)))
> x<-unique(x)
> x<-x[sapply(x,length)>1]
> x
[[1]]
[1] "t9" "t11"
[[2]]
[1] "t10" "t12" "t13"
[[3]]
[1] "t3" "t14"
[[4]]
[1] "t6" "t17"
[[5]]
[1] "t1" "t18" "t19"
Loading required package: ape
> D<-cophenetic(tree)
> tol<-1e-12 # say
> x<-apply(D,1,function(x) names(which(x<=tol)))
> x<-unique(x)
> x<-x[sapply(x,length)>1]
> x
[[1]]
[1] "t9" "t11"
[[2]]
[1] "t10" "t12" "t13"
[[3]]
[1] "t3" "t14"
[[4]]
[1] "t6" "t17"
[[5]]
[1] "t1" "t18" "t19"
Cool - it works.
No comments:
Post a Comment
Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.