Friday, June 29, 2012

R trick 1: get the frequencies of factors in a vector

Here's a quick R hint. (I had briefly forgotten how to do this, and the solution wasn't totally obvious online.) Say I have a factor in memory in R and I want the frequency or relative frequency of each of its levels. I can get this using the base generic function summary. To see how this works, consider a factor containing the best-fitting quantitative trait evolution model for each of a set of 100 trees:

> best.fit
 [1] BM     BM     OU     lambda BM     BM     OU     BM
 [9] BM     BM     BM     lambda lambda BM     OU     BM
[17] BM     BM     OU     BM     BM     BM     lambda BM
[25] BM     BM     BM     OU     lambda BM     BM     BM
[33] BM     OU     BM     BM     lambda lambda lambda BM
[41] BM     BM     BM     OU     BM     BM     BM     BM
[49] BM     OU     BM     BM     BM     BM     BM     BM
[57] lambda lambda OU     OU     BM     BM     lambda BM
[65] BM     BM     BM     BM     lambda BM     BM     BM
[73] BM     OU     OU     BM     lambda BM     lambda BM
[81] BM     lambda BM     BM     BM     OU     BM     BM
[89] BM     BM     OU     OU     lambda BM     BM     BM
[97] BM     BM     OU     BM
Levels: BM lambda OU


We can count up the number or relative frequency of trees with each best-fitting model as follows:

> summary(best.fit)
   BM lambda     OU
   68     16     16
> summary(best.fit)/sum(summary(best.fit))
   BM lambda     OU
 0.68   0.16   0.16
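
For comparison, base R's table and prop.table give the same counts and proportions, and table does not lump levels together the way summary does for a factor with more levels than its maxsum argument (100 by default). Here is a minimal sketch; fits is a small made-up factor used only for illustration, not the 100-tree best.fit vector above:

```r
# 'fits' is a hypothetical five-element stand-in for best.fit
fits <- factor(c("BM", "BM", "OU", "lambda", "BM"))

# counts of each level
counts <- table(fits)
counts

# relative frequencies (these sum to 1)
props <- prop.table(counts)
props
```

Both results are named by level, so a single value can be pulled out with, for example, counts[["BM"]].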


That's it.

4 comments:

  1. Neat! I never use factors much, now I might!

  2. But say I have a vector of numbers. summary doesn't give me the frequencies; it gives me the median, mean, min, max, etc.
    How do I get the frequencies?

    Say, for simplicity, I have this vector: c(1,2,1,4,1,5,6,5,6,5,7)
    The frequencies are: [1] 3, [2] 1, [4] 1, [5] 3, [6] 2, [7] 1
    But how do I get R to tell me this?

    Replies
    1. This is a hack:

      x<-c(1,2,1,4,1,5,6,5,6,5,7)
      summary(as.factor(sort(x)))

      Alternatively:

      obj<-hist(x,breaks=min(x):(max(x)+1)-0.5)
      f<-setNames(obj$counts,obj$mids)
      f

      The latter option also gives you zero counts for any integers within the range of x that do not occur in it (here, 3).

      There may (of course) be some better way to do this that I'm not aware of.

      - Liam
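
For the numeric-vector question in the comments above, base R's table is another option that needs neither the factor conversion nor hist (which also draws a plot as a side effect). This is a sketch using the commenter's own vector; the names f and f_all are just illustrative, and the second call assumes x holds integer values:

```r
x <- c(1, 2, 1, 4, 1, 5, 6, 5, 6, 5, 7)

# counts of each distinct value that occurs in x
f <- table(x)
f

# to also report zero counts for integers missing from the range,
# convert to a factor with every integer in the range as a level
f_all <- table(factor(x, levels = min(x):max(x)))
f_all
```

Here f has entries for 1, 2, 4, 5, 6 and 7 only, while f_all adds an entry for 3 with a count of zero.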
