I just added this to the (fairly maligned) function genSeq, for simulating DNA sequences by (inefficiently) wrapping the phytools function sim.history. Here's what the function looks like now:

if(is.null(Q)){

Q<-matrix(1,4,4)

rownames(Q)<-colnames(Q)<-c("a","c","g","t")

diag(Q)<-0

diag(Q)<--colSums(Q)

}

if(length(rate)!=l){

if(length(rate)==1) rate<-rep(rate,l)

else {

cat("warning: length(rate) & l should match for length(rate)>1\n")

cat(" rate will be recycled.\n")

rate<-rep(rate,ceiling(l/length(rate)))[1:l]

}

}

cat("simulating sequences....\n")

X<-sapply(rate,function(a,b,c) sim.history(b,a*c)$states,b=tree,c=Q)

if(format=="DNAbin") return(as.DNAbin(X))

else if(format=="phyDat") return(as.phyDat(X))

else if(format=="matrix") return(X)

}

One can easily see that the way rate heterogeneity is handled is by allowing the user to supply a site-specific vector of rates. Here's what this would look like if we wanted to simulate sequences under a continuous Γ rate heterogeneity model with a shape parameter, α = 0.25 and a mean rate of 1.0 substitution per site per unit of branch length:

> tree<-pbtree(n=26,tip.label=LETTERS,scale=0.1)

> gg<-rgamma(n=1000,shape=0.25,rate=0.25)

> X<-genSeq(tree,rate=gg)

simulating sequences....

> X

26 DNA sequences in binary format stored in a matrix.

All sequences of same length: 1000

Labels: A B C D E F ...

Base composition:

a c g t

0.264 0.245 0.242 0.249

We can do invariant sites the same way - although in this case it will break the function if we set any of the rates to zero, so we need to set them very slightly > 0:

> tol<-1e-12

> ii[sample(1:1000,100)]<-tol

> X<-genSeq(tree,rate=ii)

simulating sequences....

> X

26 DNA sequences in binary format stored in a matrix.

All sequences of same length: 1000

Labels: A B C D E F ...

Base composition:

a c g t

0.244 0.261 0.238 0.257

As Klaus pointed out, this method is very slow - so you probably have better options. But it's still a fun exercise to see how it works - and offers the interesting possibility that we could store and return all character histories along with the states for tips.

## No comments:

## Post a Comment

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.