BookmarkSubscribeRSS Feed
MSK4
Obsidian | Level 7

Suppose I having a table it contains Lakhs of records. I have a variable called salary. I want 10th max  and 10th min salary. Salary column having duplicate values. Should not sort the dataset twice. 

8 REPLIES 8
Ksharp
Super User

You just need value of salary ?

 

data have;
 set sashelp.class;
run;

proc univariate data=have nextrobs=2 nextrval=2 ;
var weight;
run;

Kurt_Bremser
Super User
Sort once. Then, in a data step, use the NOBS= option to retrieve the number of observations, and keep all observations (subsetting IF) where _N_ le 10 or _N_ ge nobs - 9.
PaigeMiller
Diamond | Level 26

Since the salary columns has duplicate values, neither of the above handles ties properly.

 

You need PROC RANK, and then you can choose how to handle ties.

--
Paige Miller
Ksharp
Super User
nextrval=2
can handle ties.
PaigeMiller
Diamond | Level 26

@Ksharp wrote:
nextrval=2
can handle ties.

Yes, it will tell you if there are more than one of a particular value, but that's not the same as finding the 10th lowest or 10th largest value.

 

PROC RANK also gives you options on what to do when the 10th lowest or 10th highest is tied.

--
Paige Miller
RichardDeVen
Barite | Level 11

A hash can easily handle the distinct values amongst 100,000s of salary values.

 

Read the salaries into an ordered hash and output the data from the first 10 and last 10 keys.

 

Example:

 

data have;
  do id = 1 to 500000;
    salary = rand('integer', 1e5, 1e6);
    output;
  end;
run;

data salary_bookends(keep=salary index);
  declare hash salaries(dataset:'have(keep=salary)', ordered: 'A');
  salaries.defineKey('salary');
  salaries.defineDone();

  call missing(salary);

  declare hiter hi('salaries');

  top = salaries.num_items;
  do index = 1 by 1 while (hi.next()=0);
    if index < 11 or index > top-10 then output;
  end;

run;

Result

RichardADeVenezia_1-1606050092852.png

 

 

PGStats
Opal | Level 21

Or rather...

 

data have;
  do id = 1 to 500000;
    /* Lots of duplicates */
    salary = rand('integer', 1, 100);
    output;
  end;
run;

data salary_bookends(keep=salary kind index);
  declare hash salaries(dataset:'have(keep=salary)', ordered: 'A');
  salaries.defineKey('salary');
  salaries.defineDone();

  call missing(salary);

  declare hiter hi('salaries');

  kind = "Min";
  hi.first();
  do index = 1 to 10 until (hi.next());
    output;
  end;
  
  kind = "Max";
  hi.last();
  do index = 1 to 10 until (hi.prev());
    output;
  end;

run;

proc print noobs data=salary_bookends; run;

image.png

PG
PeterClemmensen
Tourmaline | Level 20

Do you want the 10 min/max salary values? Or the 10'th min/max salary values? If it is the first, take one of the solutions above. Here is the hash version for the latter.

 

data have;
  do id = 1 to 500000;
    salary = ceil(rand('uniform') * 100);
    output;
  end;
run;

data want;
   dcl hash h (dataset : 'have', ordered : 'Y');
   h.definekey('salary');
   h.definedata(all : 'Y');
   h.definedone();
   dcl hiter i ('h');

   if 0 then set have;

   do _N_ = 1 by 1 while (i.next() = 0);
      if _N_ = 10 then leave;
   end;
   output;

   _N_ = i.last();

   do _N_ = 1 by 1 while (i.prev() = 0);
      if _N_ = 9 then leave;
   end;
   output;
run;

 

 

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 1037 views
  • 2 likes
  • 7 in conversation