- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
How to find Nth highest salary using Data Step function?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
proc sort data=have;
by Salary;
run;
data highestNthSalary;
set have firstobs=N obs=N;
run;
data highestSalaries;
set have;
if _n_=1 then counter=1;
else counter+1;
if counter < N;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
If all the values are on a single record use the Largest function:
The transpose is to get all the values of a variable on a single record. The default names will be Col1, Col2, ... ,Col19 (19 records in the set).
Use the "of Col:" to have all variables in a list to use in the function call in this case.
proc transpose data=sashelp.class out=trans; var weight; run; data want; set trans; /* replace 5 with n of interest*/ v = largest(5, of col:); keep v; run;
More of an "exercise for the interested reader" but may be easier if the N is to be per some combination of other variables. Then sort and transpose by those may be simpler than trying to code through a bunch of counters.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@animesh123 wrote:
How to find Nth highest salary using Data Step function?
Why insist on doing it in a DATA step? Why not use the easiest method, which is PROC RANK, and this also has features for handling ties, which the data step method does not have (unless you do additional programming).
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Typical SAS interview question one comes across.
While there could be better and more elegant ways of doing it, a simple solution for interview purpose is given below.
I am using second highest. That should serve the purpose.
Change as needed.
data pay;
input id name $ salary;
datalines;
101 Sonu 120000
102 Monu 150000
104 Tanu 100000
105 Manu 101000
run;
proc sort data=pay;
by descending Salary ;
run;
data _null_;
set pay;
if _n_=2 then do;
call symput("Second_highest_salary",Salary);
end;
run;
%put &=Second_highest_salary;
The log has the value of second highest salary
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks for pointing out the case of ties. Updated the code to take care of the ties.
The question is about the salaries that can be taken care of by removing duplicate values.
The code and data are self explanatory.
I repeat what I have said earlier, there may be more elegant and better ways to solve this question, but to me this appears to be a simple approach..
%let nth=2;
data pay;
input id name $ salary;
datalines;
100 Teetu 150000
101 Sonu 120000
102 Monu 150000
104 Tanu 100000
105 Manu 101000
106 Bablu 150000
run;
proc sort data=pay nodupkey;
by descending Salary ;
run;
data _null_;
set pay;
if _n_= &nth.then do;
call symput("Nth_highest_salary",Salary);
end;
run;
%put &=Nth_highest_salary where &=nth.;
The log will have the
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data want;
if _N_ = 1 then do;
declare hash h (dataset : 'sashelp.baseball(obs=0)', ordered : 'd', multidata : 'Y');
h.definekey('Salary');
h.definedata(all : 'y');
h.definedone();
declare hiter i('h');
end;
do until (z);
set sashelp.baseball end = z;
h.add();
end;
do _N_ = 1 by 1 while (i.next() = 0 & _N_ <= 5);
output;
end;
_N_ = i.first();
_N_ = i.prev();
h.clear();
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data have;
set sashelp.heart(keep=weight rename=(weight=salary));
run;
%let n=4;
data _null_;
set have end=last;
array x{999999} _temporary_;
x{_n_}=salary;
if last then do;
largest_&n=largest(&n,of x{*}) ;
put largest_&n=;
end;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
%let n=4;
data _null_;
array x[&n] _temporary_ (&n*0);
retain m 0;
set have end=last;
if salary>m then do;
x[whichn(m, of x[*])]=salary;
m=min(of x[*]);
end;
if last then put "&n.th largest = " m;
run;