- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I have a dataset EMP
EMPNO Salary empcode
111 4000 MGR
112 6000 MGR
114 2000 MGR
115 8000 MGR
223 2000 clerk
226 1000 clerk
228 3000 clerk
300 500 peon
333 700 peon
345 300 peon
356 200 peon
320 700 peon
I want a dataset which contain 3rd highest salary in group wise empcode
EMPNO Salary empcode
111 4000 MGR
223 2000 clerk
345 300 peon
Please help me how to write a program in sas proc sql and data sep and using sas macro.
Thanks & Regards,
Ashwini
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Here is a Data Step way.
PG may give you the SQL version.
data x; input EMPNO Salary empcode $; cards; 111 4000 MGR 112 6000 MGR 114 2000 MGR 115 8000 MGR 223 2000 clerk 226 1000 clerk 228 3000 clerk 300 500 peon 333 700 peon 345 300 peon 356 200 peon 320 700 peon ; run; proc sort data=x nodupkey; by empcode descending salary;run; data want; set x; by empcode; if first.empcode then n=0; n+1; if n=3 then output; run;
Ksharp
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
What about if there are less than 3 representatives of a group?
Would you want the lowest salary out of 2 or the only salary represented in your dataset?
If so, no problem, modify Ksharp's code slightly
data want;
set x;
by empcode;
if first.empcode then n=0;
n+1;
if n=3
or n<3 and last.empcode
then output;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks a lot for your kind information.
Could you pls explain me how it done in proc sql and sas macro
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
If a group has five people, and the top 3 all have the same salary, which one do you want to select?
I imagine the SQL gurus out there could select/remove the max, select/remove the max from the remainder, then select the max.
Why would you want to use macro language?
Good luck.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
What happens if there is a tie for the 3rd highest? If you want to return both records, then you need a different solution to that proposed. One option is to run Proc Rank after the dedupe, keep the 3rd highest rank and then merge this back to the original table. Example below with modified original data (record 3 has had salary amended to 4,000).
With regards to sas macro, the only reason I can see a use for this is if you needed to change the ranking number (e.g. 4th highest instead of 3rd).
Note also that the solution from @KSharp deletes duplicate records from the source data as there is no 'out=' option!
data x;
input empno salary empcode $;
cards;
111 4000 MGR
112 6000 MGR
114 4000 MGR
115 8000 MGR
223 2000 clerk
226 1000 clerk
228 3000 clerk
300 700 peon
333 700 peon
345 300 peon
356 200 peon
320 700 peon
;
run;
proc sort data=x nodupkey out=dedupe_x;
by empcode descending salary;
run;
proc rank data=dedupe_x out=ranked_x (where=(salary_rank=3)) descending;
by empcode;
var salary;
ranks salary_rank;
run;
proc sql;
create table want as select
a.*
from x as a inner join
ranked_x as b on a.empcode=b.empcode and a.salary=b.salary;
quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Keith,
As you noted, there is no valid solution possible unless there is a decision made about how to handle ties. Even PROC RANK can't necessarily get around that fact, as there will be some ranking methods that produce 0 records having salary_rank=3. A double DOW would be most flexible, since it allows different counting/ranking methods:
data want;
do until (last.empcode);
set have;
by empcode descending salary;
* Various flexible counting/ranking methods are possible here to select the target salary;
end;
do until (last.empcode);
set have;
by empcode descending salary;
* Various outputting methods are available here, depending on whether tied records should be output;
end;
run;
But the OP has some decisions to make before the programming can be done.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Here is a hard-coding low efficient SQL approach:
data have;
input EMPNO$ Salary empcode$;
cards;
111 4000 MGR
112 6000 MGR
114 2000 MGR
115 8000 MGR
223 2000 clerk
226 1000 clerk
228 3000 clerk
300 500 peon
333 700 peon
345 300 peon
356 200 peon
320 700 peon
;
proc sql;
select * from
(select * from (select * from have group by empcode having salary ne max(salary))
group by empcode having salary ne max(salary))
group by empcode having salary eq max(salary);
quit;
Haikuo
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Here is a not so hard-coding SQL approach:
data have;
input EMPNO$ Salary empcode$;
cards;
111 4000 MGR
112 6000 MGR
114 2000 MGR
115 8000 MGR
223 2000 clerk
226 1000 clerk
228 3000 clerk
300 500 peon
333 700 peon
345 300 peon
356 200 peon
320 700 peon
;
proc sql;
select * from have a1 ,
(select empcode,salary from
(select distinct b.empcode,b.salary, a.salary as _salary from have a
left join have b
on a.empcode=b.empcode)
group by empcode,salary
having sum(_salary>=salary)=3) b1
where a1.empcode=b1.empcode and a1.salary=b1.salary;
quit;
Haikuo
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The SQL solution, though probably not the most efficient one, is quite elegant in my opinion:
proc sql;
select
*
from
EMP as e1
where
2 = (select
count(distinct e2.Salary)
from
EMP as e2
where
e1.empcode = e2.empcode and e2.Salary > e1.Salary);
quit;