BookmarkSubscribeRSS Feed
Malathi13
Obsidian | Level 7

Hi,

I have a dataset with county and complaint variables. I want to get the top 5 complaints by county.

 

E.g.   County       Complaint                         Count

          ABC          sick                                     100

          DEF          Sick                                   200

          ABC          Chest Pain                          200

          DEF          Chest Pain                          300

          ABC         Heart problem                    500

 

I want to get 10 5 complaints for each county  E.g for County ABC SICK, CHESTPAIN, HEARTPROBLEM, BREATHING PROBLEM and EYE PROBLEM are top 5 then I want the list of the county name with top 5 complaints and the count of it.

 

Can I get any help. I did proc freq and got the list of all complaint by County but not sure how to get the top 5 only for each county.

 

Thank you

M

11 REPLIES 11
ballardw
Super User

Here is one way using a SAS supplied dataset so you should be able to test the code.

proc sort data=sashelp.cars out=carsort;
  by make descending msrp;
run;

data want;
  set carsort;
  retain makecount;
  by make;
  if first.make then makecount=1;
  else makecount+1;
  if makecount le 5;
run;

You need not create a new set with your sort, I just don't want to mess with the supplied data set.

 

The sort in your case would be COUNTY instead of Make, and what every your count variable is in the second place. This groups the data with the largest count first for each county.

Then the data step using by group processing with First. to set a counter and select the records with desired order of appearance.

Malathi13
Obsidian | Level 7

Hi ballardw,

Thank you for the quick response, I tried your code, it is giving me the 5 records for each county but they are not the top five. For e.g. for ABC county I have breathing (391), Unknown problem (342),sick person (317),  Falls (220), chest pain (213) as top five but the output is giving me Unknown Problem (342), Unconscious (199), Trauma (15), Traffic (142) and Stroke (81). I want the top highest complaints and not just the top 5 in order.

 

 

Thank you

M

 

 

 

PaigeMiller
Diamond | Level 26

Don't write your own data step code to do this.

 

Use built-in code, that SAS has already created (for your benefit) and tested, and handles ties if they exist. For example

 

proc sort data=have;
by county;
run;
proc rank data=have out=want descending; by county; var count;
ranks count_rank;
run;

In the output data set, you want the observations with count_rank values 1 to 5.

--
Paige Miller
ballardw
Super User

@Malathi13 wrote:

Hi ballardw,

Thank you for the quick response, I tried your code, it is giving me the 5 records for each county but they are not the top five. For e.g. for ABC county I have breathing (391), Unknown problem (342),sick person (317),  Falls (220), chest pain (213) as top five but the output is giving me Unknown Problem (342), Unconscious (199), Trauma (15), Traffic (142) and Stroke (81). I want the top highest complaints and not just the top 5 in order.

 

 

Thank you

M

 

 


Your original post said "I want to get 10 5 complaints for each county". That is what the code does if "top" means the largest count.

If you have a different requirement than be much more explicit, provide better examples of starting data and what the final result is supposed to look like.

Your list in this post is pretty much not helpful as we do not have any county identified, or what any of the counts you show.

Also, show the code you used. It looks almost like you sorted on "COMPLAINT" not the COUNT indicated you should.

Reeza
Super User

Please ensure your sample data matches what you're looking for, in this case you didn't provide enough sample data. 

 

I believe your problem is relatively straightforward though, see the code below.

 

Screen Shot 2017-12-22 at 12.52.14 PM.png

PaigeMiller
Diamond | Level 26

Why, @Reeza, in a post earlier this week, you recommend PROC RANK and then in this post you write your own data step code to do the same thing?

 

Why, when you point out earlier this week that PROC RANK handles ties, do you write data step code that does not handle ties?

--
Paige Miller
Reeza
Super User

@PaigeMiller wrote:

Why, @Reeza, in a post earlier this week, you recommend PROC RANK and then in this post you write your own data step code to do the same thing?

 

Why, when you point out earlier this week that PROC RANK handles ties, do you write data step code that does not handle ties?


I recognized them as slightly different problems. At the end of the day though, the OP has multiple options here and needs to decide what's best. Because you don't know how many are in each groups, this is slightly harder - but not impossible to use PROC RANK. 

 

When looking for Top N -> That's not a percentile measure, whereas PROC RANK is. 

Top 5 to me, implies only 5 results whereas RANK would give you possibly more. You are correct in that it doesn't handle ties, which should be dealt with differently.

 

 

 

 

PGStats
Opal | Level 21

Ties will generate fractional ranks that can mess up your logic. For instance:

 

data test;
input x;
datalines;
1
2
2
2
2
2
2
2
2
2
2
3
4
5
;

proc rank data=test out=ranks;
var x; ranks rank;
run;


proc print data=ranks noobs; run;
                                    x    rank

                                     1     1.0
                                     2     6.5
                                     2     6.5
                                     2     6.5
                                     2     6.5
                                     2     6.5
                                     2     6.5
                                     2     6.5
                                     2     6.5
                                     2     6.5
                                     2     6.5
                                     3    12.0
                                     4    13.0
                                     5    14.0
PG
PaigeMiller
Diamond | Level 26

@PGStats wrote:

Ties will generate fractional ranks that can mess up your logic. For instance:


But handling ties is a good thing when trying to figure out the top 5!!

 

And contrary to the example by @PGStats, there is an option in PROC RANK to say TIES=LOW, and then the logic doesn't get messed up, because in the example provided by PGStats, all the observations that have a value of 2 get a rank of 5.

--
Paige Miller
PGStats
Opal | Level 21

When the requirement is to get 5 low (or top) values in the presence of ties, the tie handling options offered by proc rank (MEAN, LOW, HIGH) are not really useful. You can't say where rank <= 5 and get exactly 5 low (or top) values in every case.

 

                                      mean     low    high
                          Obs    x    Rank    Rank    Rank

                            1    1     1.0      1       1
                            2    2     6.5      2      11
                            3    2     6.5      2      11
                            4    2     6.5      2      11
                            5    2     6.5      2      11
                            6    2     6.5      2      11
                            7    2     6.5      2      11
                            8    2     6.5      2      11
                            9    2     6.5      2      11
                           10    2     6.5      2      11
                           11    2     6.5      2      11
                           12    3    12.0     12      12
                           13    4    13.0     13      13
                           14    5    14.0     14      14

We would need options TIES=RANDOM or TIES=DATA to get the expected sample size.

 

PG
PaigeMiller
Diamond | Level 26

If there are ties when you find the fifth highest, you don't want EXACTLY 5 results, you want more than 5 results.

--
Paige Miller

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 2121 views
  • 2 likes
  • 5 in conversation