Selecting 5 max observations per category

deleted_user · Posted 05-15-2009 11:34 AM

Hi,

I'd like to write some code that selects the 5 biggest firms in a given category. In my mind, it should go something like:

proc sort data=source;
by category size;
run;

proc sql;
create table newtable as select
avg(var1) as var1, avg(var2) as var2, sum(var3) as var3
from source
where (size is in the top 5)
group by category;
quit;

However, I'm not sure what the code to determine if it's one of the biggest 5 observations is. Any ideas?

Thanks!

statsplank · Posted 05-15-2009 03:29 PM

Hi tbalsky,

Please look at the following example. I explained steps in the comments.

/* original data set */
data firm; input category size @@;
datalines;
1 93458934 1 8467489 1 8679 2 56757 3 57850
1 456794 2 697058 2 59876 3 687957 3 698879
3 598679 1 8756356 1 356396 1 65365 2 56385
2 4686 2 469674967 2 49679674 3 4686 3 47696
3 76946794 3 64868
;

/* sort by category and then by size within each category in descending order */
proc sort data=firm;
by category DESCENDING size;
run;

/* create rank variable within each category */
data firm1; set firm;
by category;
rank+1;
if first.category then rank=1;
run;

/* data set that contains 5 biggest firm within each category */
data firm2; set firm1(where=(rank<6));
run;

deleted_user · Posted 05-18-2009 10:43 AM

Thank you so much! This worked perfectly.

Selecting 5 max observations per category

Re: Selecting 5 max observations per category

Re: Selecting 5 max observations per category

Selecting 5 max observations per category

Re: Selecting 5 max observations per category

Re: Selecting 5 max observations per category

Registration is open