I have data that I sorted by Company Year and Salary Descending.
I want to extract only the top 2 salary records for each company and year to a new table.
My data looks like:
Company Year Salary
1 2020 10,000
1 2020 5,000
1 2020 3,000
1 2020 2,000
1 2019 9,000
1 2019 5,000
1 2019 3,000
2 2020 8,000
2 2020 5,000
2 2020 3,000
My desired outcome is my new table will only have the records for 10,000 and 5,000 for company 1 2020, 9,000 and 5,000 for company 1 for 2019, and 8,000 and 5,000 for company 2 in 2020
I tried the following:
proc sort data=temp.own2_for_sample3; by company Year descending salary;
run;
data temp.own2_for_sample3; set temp.own2_for_sample3;
by company Year salary;
if first.year then rnk=0;
rnk +1;
run;
It appears to work.... however I am afraid that since I'm only doing first.year and not including the first.company it could mess up. Any easy modifications?
Thank you!
data want;
do _n_=1 by 1 until(last.year);
set have;
by company year;
if _n_<=2 then output;
end;
run;
Actually your solution is neat and simple
To solve for my issue... do you think if I do If last.year OR if last.company then rnk=0?
I modified yours-
proc sort data=temp.own2_for_sample3;
by company Year descending salary;
run;
data temp.own2_for_sample3;
set temp.own2_for_sample3;
by company Year ;
if first.year then rnk=0;
rnk +1;
if rnk<=2;
run;
My worry is that I could have a situation where the company changes but not the year.
Would this fix that worry? I think it does....
if first._year or first.company then rnk=0;
Basically you could have-
1. N companies
2. You need to understand the SORT mechanics
3. when you sort by Company year , you end up having unique combinations of company and year forming its own group
4. Your descending Salary would order the salaries for each unique group of company-year.
5. Now, You want to write the 1st 2 records for each of these unique Company-year combination.
6. Your rnk counter safely increments for each record of the company-year combination
7. All you need is to apply a filter to restrict the resuts to a count of 2 or less i.e. <=2 and then you should have your results?
Does the above make sense?
You know about first.year, so consider also using lag(first.year):
proc sort data=temp.own2_for_sample3;
by company Year descending salary;
run;
data temp.own2_for_sample3;
set temp.own2_for_sample3;
by company Year ;
if first.year or lag(first.year);
rnk=first.year+2*lag(first.year); /* Editted in the "2*" factor */
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.