BookmarkSubscribeRSS Feed
OwnerOfTwoCats
Calcite | Level 5

Hello all, I am trying to write a code where I take my data, put it into a table, and then just make a table for the top 5 and bottom 5 values by frequency.

 

My code is:

Proc Freq data=db1;
	table statefips / nocum;
run;

Proc Sort data=db1;
	by statefips;
run;

 

My Table

 StateCode Frequency 

263571.47
43260.24
351391.97
158700.89
20535711.49
312401.75
220061.23
51150.29
38710.22
1067305.97
547263.06
81870.46
90650.51
751414.20
386832.16
190391.06
171540.96
244581.37
248951.39
79440.44
357112.00
409912.29
564793.16
336791.88
154390.86
357932.00
60460.34
113150.63
159390.89
85660.48
518962.90
106740.60
1101026.16
555703.11
43960.25
689053.85
212461.19
228791.28
745774.17
63830.36
269271.51
51000.29
365522.04
1391587.78
158250.89
42370.24
483242.70
403372.26
96580.54
364362.04
35790.20

 

I really never learned how to pull specific values out of a table to keep, so I'm a little stumped. If anyone can help me with this, I would greatly appreciate it.

2 REPLIES 2
Patrick
Opal | Level 21

If your table is sorted by FREQ then just picking the top 3 and bottom 3 rows is not that hard. But.... how do you intend to deal with ties?

 

In below data would you want to pick the row where category="D"?

data have;
  input category $ freq;
  datalines;
A 10
B 9
C 9
D 8
E 7
F 6
G 5
H 4
I 3
;

 

 

Using above data the code to just pick the first and last 3 rows could look like below.

proc sort data=have;
  by descending freq;
run;

data want;
  set have nobs=nrows;
  by descending freq;
  if _n_ <=3 or _n_>nrows-3;
run;

proc print data=want;
run;

 As you can see the row with the 3rd highest value hasn't been selected because it was row 4. ...and if you would have 3 rows with a 2nd highest value then even one of these wouldn't get selected.

Patrick_0-1708739959804.png

Proc Rank can help you dealing with ties. In below code code the rows with the three highest and three lowest values for Freq will get selected. 

proc rank data=have out=inter ties=dense descending;
  var freq;
  ranks ranking;
run;
proc sql;
/*  create table want as*/
  select *
  from inter
  having ranking<=3 or ranking>=max(ranking)-2
  ;
quit;

 

 

mkeintz
PROC Star

You might have ties, so this code keeps tied values at each extreme, as long as at least the 3 lowest and 3 highest observations are output:

 


proc sort data=have;
  by descending freq;
run;

%let min_needed=3;  /* At least 3 highest and 3 lowest, including ties */

data want;
  set have nobs=nrows;
  by descending freq;
  output;

  /* If first 3 have been output then stop when ties are exhausted,
     then do the same from the bottom up */
  if _n_>=&min_needed and last.freq=1 then do p=nrows by -1;
    set have point=p;
    if p<nrows-&min_needed and freq^=lag(freq) then stop;
    output;
  end;
run;

proc print data=want;
run;

In the case of a very large dataset, this can be much faster because it doesn't pass through the middle observations.  

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sas-innovate-white.png

🚨 Early Bird Rate Extended!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.

 

Lock in the best rate now before the price increases on April 1.

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 986 views
  • 0 likes
  • 3 in conversation