I am hoping to subset / filter a dataset by limiting based on the frequency of unique observations within a (character) variable although I'm not sure what procedure to use.
In the following example dataset:
Obs Number Letter
1 36 aaa
2 54 aaa
3 34 bbb
4 27 aaa
5 58 ccc
6 60 aaa
7 39 ddd
8 16 bbb
9 12 bbb
10 86 eee
Is there a way to create a new dataset which includes the rows where the frequency of unique observation in 'Letter' is >/=2? (Ie - this would include rows with aaa (freq = 4) and bbb (freq = 3), which works out to be rows 1-4, 6, 8 & 9. I suppose I could use 'proc freq' to determine the frequency of each unique observation and manually 'drop' the values with <2, but in a large dataset this becomes cumbersome. Thanks in advance - I appreciate any advice 🙂
Thank-you! I just tried it and it worked perfectly. I really appreciate the help!
No to proc freq eh?
So what next?
SQL?
or Datastep?
SQL is very convenient in my opinion
data have;
input Obs Number Letter $;
cards;
1 36 aaa
2 54 aaa
3 34 bbb
4 27 aaa
5 58 ccc
6 60 aaa
7 39 ddd
8 16 bbb
9 12 bbb
10 86 eee
;
proc sql;
create table want as
select *
from have
group by letter
having n(letter)>=2
order by obs;
quit;
Thank-you! I just tried it and it worked perfectly. I really appreciate the help!
Nearly 200 sessions are now available on demand with the SAS Innovate Digital Pass.
Explore Now →SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.