How to find Nth highest salary using Data Step function?
proc sort data=have;
by Salary;
run;
data highestNthSalary;
set have firstobs=N obs=N;
run;
data highestSalaries;
set have;
if _n_=1 then counter=1;
else counter+1;
if counter < N;
run;
If all the values are on a single record use the Largest function:
The transpose is to get all the values of a variable on a single record. The default names will be Col1, Col2, ... ,Col19 (19 records in the set).
Use the "of Col:" to have all variables in a list to use in the function call in this case.
proc transpose data=sashelp.class out=trans; var weight; run; data want; set trans; /* replace 5 with n of interest*/ v = largest(5, of col:); keep v; run;
More of an "exercise for the interested reader" but may be easier if the N is to be per some combination of other variables. Then sort and transpose by those may be simpler than trying to code through a bunch of counters.
@animesh123 wrote:
How to find Nth highest salary using Data Step function?
Why insist on doing it in a DATA step? Why not use the easiest method, which is PROC RANK, and this also has features for handling ties, which the data step method does not have (unless you do additional programming).
Typical SAS interview question one comes across.
While there could be better and more elegant ways of doing it, a simple solution for interview purpose is given below.
I am using second highest. That should serve the purpose.
Change as needed.
data pay;
input id name $ salary;
datalines;
101 Sonu 120000
102 Monu 150000
104 Tanu 100000
105 Manu 101000
run;
proc sort data=pay;
by descending Salary ;
run;
data _null_;
set pay;
if _n_=2 then do;
call symput("Second_highest_salary",Salary);
end;
run;
%put &=Second_highest_salary;
The log has the value of second highest salary
Thanks for pointing out the case of ties. Updated the code to take care of the ties.
The question is about the salaries that can be taken care of by removing duplicate values.
The code and data are self explanatory.
I repeat what I have said earlier, there may be more elegant and better ways to solve this question, but to me this appears to be a simple approach..
%let nth=2;
data pay;
input id name $ salary;
datalines;
100 Teetu 150000
101 Sonu 120000
102 Monu 150000
104 Tanu 100000
105 Manu 101000
106 Bablu 150000
run;
proc sort data=pay nodupkey;
by descending Salary ;
run;
data _null_;
set pay;
if _n_= &nth.then do;
call symput("Nth_highest_salary",Salary);
end;
run;
%put &=Nth_highest_salary where &=nth.;
The log will have the
data want;
if _N_ = 1 then do;
declare hash h (dataset : 'sashelp.baseball(obs=0)', ordered : 'd', multidata : 'Y');
h.definekey('Salary');
h.definedata(all : 'y');
h.definedone();
declare hiter i('h');
end;
do until (z);
set sashelp.baseball end = z;
h.add();
end;
do _N_ = 1 by 1 while (i.next() = 0 & _N_ <= 5);
output;
end;
_N_ = i.first();
_N_ = i.prev();
h.clear();
run;
data have;
set sashelp.heart(keep=weight rename=(weight=salary));
run;
%let n=4;
data _null_;
set have end=last;
array x{999999} _temporary_;
x{_n_}=salary;
if last then do;
largest_&n=largest(&n,of x{*}) ;
put largest_&n=;
end;
run;
%let n=4;
data _null_;
array x[&n] _temporary_ (&n*0);
retain m 0;
set have end=last;
if salary>m then do;
x[whichn(m, of x[*])]=salary;
m=min(of x[*]);
end;
if last then put "&n.th largest = " m;
run;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.