topic Re: How to subset / filter data based on frequency of unique observations within a variable in New SAS User

How to subset / filter data based on frequency of unique observations within a variable

bretthouston — Thu, 18 Apr 2019 00:58:59 GMT

I am hoping to subset / filter a dataset by limiting based on the frequency of unique observations within a (character) variable although I'm not sure what procedure to use.

In the following example dataset:

Obs Number Letter

1 36 aaa

2 54 aaa

3 34 bbb

4 27 aaa

5 58 ccc

6 60 aaa

7 39 ddd

8 16 bbb

9 12 bbb

10 86 eee

Is there a way to create a new dataset which includes the rows where the frequency of unique observation in 'Letter' is >/=2? (Ie - this would include rows with aaa (freq = 4) and bbb (freq = 3), which works out to be rows 1-4, 6, 8 & 9. I suppose I could use 'proc freq' to determine the frequency of each unique observation and manually 'drop' the values with <2, but in a large dataset this becomes cumbersome. Thanks in advance - I appreciate any advice 🙂

Re: How to subset / filter data based on frequency of unique observations within a variable

novinosrin — Thu, 18 Apr 2019 01:06:17 GMT

No to proc freq eh?

So what next?

SQL?

or Datastep?

Re: How to subset / filter data based on frequency of unique observations within a variable

novinosrin — Thu, 18 Apr 2019 01:12:18 GMT

SQL is very convenient in my opinion


data have;
input  Obs        Number              Letter $;
cards;
1                36                      aaa
2                54                      aaa
3                34                      bbb
4                27                      aaa
5                58                      ccc
6                60                      aaa
7                39                      ddd
8                16                      bbb
9               12                       bbb
10              86                      eee
;

proc sql;
create table want as
select *
from have
group by letter
having n(letter)>=2
order by obs;
quit;

Re: How to subset / filter data based on frequency of unique observations within a variable

bretthouston — Thu, 18 Apr 2019 01:23:27 GMT

Thank-you! I just tried it and it worked perfectly. I really appreciate the help!