I just added this to the (fairly maligned) function genSeq, for simulating DNA sequences by (inefficiently) wrapping the phytools function sim.history. Here's what the function looks like now:
if(is.null(Q)){
Q<-matrix(1,4,4)
rownames(Q)<-colnames(Q)<-c("a","c","g","t")
diag(Q)<-0
diag(Q)<--colSums(Q)
}
if(length(rate)!=l){
if(length(rate)==1) rate<-rep(rate,l)
else {
cat("warning: length(rate) & l should match for length(rate)>1\n")
cat(" rate will be recycled.\n")
rate<-rep(rate,ceiling(l/length(rate)))[1:l]
}
}
cat("simulating sequences....\n")
X<-sapply(rate,function(a,b,c) sim.history(b,a*c)$states,b=tree,c=Q)
if(format=="DNAbin") return(as.DNAbin(X))
else if(format=="phyDat") return(as.phyDat(X))
else if(format=="matrix") return(X)
}
One can easily see that the way rate heterogeneity is handled is by allowing the user to supply a site-specific vector of rates. Here's what this would look like if we wanted to simulate sequences under a continuous Γ rate heterogeneity model with a shape parameter, α = 0.25 and a mean rate of 1.0 substitution per site per unit of branch length:
> tree<-pbtree(n=26,tip.label=LETTERS,scale=0.1)
> gg<-rgamma(n=1000,shape=0.25,rate=0.25)
> X<-genSeq(tree,rate=gg)
simulating sequences....
> X
26 DNA sequences in binary format stored in a matrix.
All sequences of same length: 1000
Labels: A B C D E F ...
Base composition:
a c g t
0.264 0.245 0.242 0.249
We can do invariant sites the same way - although in this case it will break the function if we set any of the rates to zero, so we need to set them very slightly > 0:
> tol<-1e-12
> ii[sample(1:1000,100)]<-tol
> X<-genSeq(tree,rate=ii)
simulating sequences....
> X
26 DNA sequences in binary format stored in a matrix.
All sequences of same length: 1000
Labels: A B C D E F ...
Base composition:
a c g t
0.244 0.261 0.238 0.257
As Klaus pointed out, this method is very slow - so you probably have better options. But it's still a fun exercise to see how it works - and offers the interesting possibility that we could store and return all character histories along with the states for tips.
No comments:
Post a Comment
Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.