## Friday, June 29, 2012

### R trick 1: get the frequencies of factors in a vector

Here's a quick R hint. (I had briefly forgotten how to do this, and the solution wasn't totally obvious online.) Say I have a vector of factors in memory in R and I want the frequency or relative frequency of each level of the factor. I can get this with the base generic function `summary`, which, when applied to a factor, counts the observations at each level. To see how this works, consider a vector containing the best-fitting quantitative trait evolution model for each of a set of 100 trees:

```
> best.fit
[1] BM     BM     OU     lambda BM     BM     OU     BM
[9] BM     BM     BM     lambda lambda BM     OU     BM
[17] BM     BM     OU     BM     BM     BM     lambda BM
[25] BM     BM     BM     OU     lambda BM     BM     BM
[33] BM     OU     BM     BM     lambda lambda lambda BM
[41] BM     BM     BM     OU     BM     BM     BM     BM
[49] BM     OU     BM     BM     BM     BM     BM     BM
[57] lambda lambda OU     OU     BM     BM     lambda BM
[65] BM     BM     BM     BM     lambda BM     BM     BM
[73] BM     OU     OU     BM     lambda BM     lambda BM
[81] BM     lambda BM     BM     BM     OU     BM     BM
[89] BM     BM     OU     OU     lambda BM     BM     BM
[97] BM     BM     OU     BM
Levels: BM lambda OU
```

We can count the number or relative frequency of trees with each best-fit model as follows:

```
> summary(best.fit)
BM lambda     OU
68     16     16
> summary(best.fit)/sum(summary(best.fit))
BM lambda     OU
0.68   0.16   0.16
```
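As an aside, base R also has `table()`, which is dedicated to counting, and `prop.table()`, which turns counts into proportions. The sketch below uses a small hypothetical factor, `fits`, standing in for `best.fit`; on a factor without missing values it should give the same numbers as the `summary` approach.

```r
# Small hypothetical factor standing in for best.fit
fits <- factor(c("BM", "BM", "OU", "lambda", "BM"))

counts <- table(fits)        # number of observations at each level
rel    <- prop.table(counts) # counts divided by their total, so they sum to 1

counts
rel
```

One design difference worth knowing: `summary()` on a factor shows at most `maxsum = 100` entries and lumps the rest into "(Other)", whereas `table()` reports every level.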

That's it.

1. R trick 2 to follow.

2. Neat! I never use factors much, now I might!

3. But say I have a numeric vector: `summary` doesn't give me the frequencies, it gives me the median, mean, min, max, etc.
How do I get the frequencies?

Say, for simplicity, I have this vector: `c(1,2,1,4,1,5,6,5,6,5,7)`
The frequencies are: 1 occurs 3 times, 2 once, 4 once, 5 three times, 6 twice, and 7 once.
But how do I get R to tell me this?

1. This is a hack:

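```r
x<-c(1,2,1,4,1,5,6,5,6,5,7)
# converting to a factor makes summary() count each distinct value;
# note summary() shows at most 100 levels and lumps the rest into "(Other)"
summary(as.factor(sort(x)))
```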

Alternatively:

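```r
# breaks at half-integers put each integer at the middle of its own bin;
# plot=FALSE returns the counts without drawing the histogram
obj<-hist(x,breaks=min(x):(max(x)+1)-0.5,plot=FALSE)
f<-setNames(obj$counts,obj$mids)
f
```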

The latter option also reports a zero count for any integer in the range of your data that never occurs (here, 3).
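For positive integers specifically, base R's `tabulate()` gives the same zero-filled counts without going through `hist()`. A minimal sketch, assuming `x` holds only positive whole numbers: `tabulate()` counts occurrences of 1 through `max(x)` and ignores anything outside that range, so it is not suitable for zeros or negative values.

```r
x <- c(1, 2, 1, 4, 1, 5, 6, 5, 6, 5, 7)

# Count occurrences of each integer 1..max(x); unused integers get 0.
# Values below 1 would be silently ignored, so this suits positive integers only.
counts <- tabulate(x)

# tabulate() returns an unnamed vector, so label each count with its integer
names(counts) <- seq_along(counts)
counts
```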

There may (of course) be some better way to do this that I'm not aware of.

- Liam
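For what it's worth, base R's `table()` counts the distinct values of a numeric vector directly. The sketch below shows it on the vector from the question, plus one way (fixing the factor levels to the full integer range first) to also report zeros, as the `hist()` approach does.

```r
x <- c(1, 2, 1, 4, 1, 5, 6, 5, 6, 5, 7)

# Counts of each value that actually occurs in x
f1 <- table(x)
f1

# Declaring every integer from min(x) to max(x) as a level
# makes table() report zero counts for values that never occur
f2 <- table(factor(x, levels = min(x):max(x)))
f2
```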