BookmarkSubscribeRSS Feed
anweinbe
Quartz | Level 8

I have data that I sorted by Company Year and Salary Descending.

I want to extract only the top 2 salary records for each company and year to a new table.

 

My data looks like:

Company     Year         Salary

1                 2020          10,000

1                 2020           5,000

1                 2020           3,000

1                 2020           2,000

1                 2019           9,000

1                 2019           5,000

1                 2019           3,000

2                 2020           8,000

2                 2020           5,000

2                 2020           3,000

 

My desired outcome is my new table will only have the records for 10,000 and 5,000 for company 1 2020, 9,000 and 5,000 for company 1 for 2019, and 8,000 and 5,000 for company 2 in 2020

 

I tried the following:

proc sort data=temp.own2_for_sample3; by company Year descending salary;
run;

data temp.own2_for_sample3; set temp.own2_for_sample3;
by company Year salary;
if first.year then rnk=0;
rnk +1;
run;

 

It appears to work.... however I am afraid that since I'm only doing first.year and not including the first.company it could mess up. Any easy modifications?

 

Thank you!

7 REPLIES 7
novinosrin
Tourmaline | Level 20
data want;
 do _n_=1 by 1 until(last.year);
  set have;
  by company  year;
  if _n_<=2 then output;
 end;
run;
novinosrin
Tourmaline | Level 20

Actually your solution is neat and simple

anweinbe
Quartz | Level 8

To solve for my issue... do you think if I do If last.year OR if last.company then rnk=0?

novinosrin
Tourmaline | Level 20

 I modified yours-

proc sort data=temp.own2_for_sample3; 
by company Year descending salary;
run;

data temp.own2_for_sample3; 
set temp.own2_for_sample3;
by company Year ;
if first.year then rnk=0;
rnk +1;
if rnk<=2;
run;
anweinbe
Quartz | Level 8

My worry is that I could have a situation where the company changes but not the year.

 

Would this fix that worry? I think it does....

if first._year or first.company then rnk=0;

 

 

novinosrin
Tourmaline | Level 20

Basically you could have-

 

1. N companies

2. You need to understand the SORT mechanics

3. when you sort by Company year , you end up having unique combinations of company and year forming its own group

4. Your descending Salary would order the salaries for each unique group of company-year.

5. Now, You want to write the 1st 2 records for each of these unique Company-year combination.

6. Your rnk counter safely increments for each record of the company-year combination

7. All you need is to apply a filter to restrict the resuts to a count of 2 or less i.e. <=2 and then you should have your results?

Does the above make sense?

mkeintz
PROC Star

You know about first.year, so consider also using lag(first.year):

 

proc sort data=temp.own2_for_sample3;
  by company Year descending salary;
run;

data temp.own2_for_sample3; 
   set temp.own2_for_sample3;
   by company Year ;
   if first.year or lag(first.year);
   rnk=first.year+2*lag(first.year);  /* Editted in the "2*" factor */
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 879 views
  • 0 likes
  • 3 in conversation