Suppose I having a table it contains Lakhs of records. I have a variable called salary. I want 10th max and 10th min salary. Salary column having duplicate values. Should not sort the dataset twice.
You just need value of salary ?
data have; set sashelp.class; run; proc univariate data=have nextrobs=2 nextrval=2 ; var weight; run;
Since the salary columns has duplicate values, neither of the above handles ties properly.
You need PROC RANK, and then you can choose how to handle ties.
@Ksharp wrote:
nextrval=2
can handle ties.
Yes, it will tell you if there are more than one of a particular value, but that's not the same as finding the 10th lowest or 10th largest value.
PROC RANK also gives you options on what to do when the 10th lowest or 10th highest is tied.
A hash can easily handle the distinct values amongst 100,000s of salary values.
Read the salaries into an ordered hash and output the data from the first 10 and last 10 keys.
Example:
data have; do id = 1 to 500000; salary = rand('integer', 1e5, 1e6); output; end; run; data salary_bookends(keep=salary index); declare hash salaries(dataset:'have(keep=salary)', ordered: 'A'); salaries.defineKey('salary'); salaries.defineDone(); call missing(salary); declare hiter hi('salaries'); top = salaries.num_items; do index = 1 by 1 while (hi.next()=0); if index < 11 or index > top-10 then output; end; run;
Result
Or rather...
data have;
do id = 1 to 500000;
/* Lots of duplicates */
salary = rand('integer', 1, 100);
output;
end;
run;
data salary_bookends(keep=salary kind index);
declare hash salaries(dataset:'have(keep=salary)', ordered: 'A');
salaries.defineKey('salary');
salaries.defineDone();
call missing(salary);
declare hiter hi('salaries');
kind = "Min";
hi.first();
do index = 1 to 10 until (hi.next());
output;
end;
kind = "Max";
hi.last();
do index = 1 to 10 until (hi.prev());
output;
end;
run;
proc print noobs data=salary_bookends; run;
Do you want the 10 min/max salary values? Or the 10'th min/max salary values? If it is the first, take one of the solutions above. Here is the hash version for the latter.
data have;
do id = 1 to 500000;
salary = ceil(rand('uniform') * 100);
output;
end;
run;
data want;
dcl hash h (dataset : 'have', ordered : 'Y');
h.definekey('salary');
h.definedata(all : 'Y');
h.definedone();
dcl hiter i ('h');
if 0 then set have;
do _N_ = 1 by 1 while (i.next() = 0);
if _N_ = 10 then leave;
end;
output;
_N_ = i.last();
do _N_ = 1 by 1 while (i.prev() = 0);
if _N_ = 9 then leave;
end;
output;
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.