BookmarkSubscribeRSS Feed
MSK4
Obsidian | Level 7

Suppose I having a table it contains Lakhs of records. I have a variable called salary. I want 10th max  and 10th min salary. Salary column having duplicate values. Should not sort the dataset twice. 

8 REPLIES 8
Ksharp
Super User

You just need value of salary ?

 

data have;
 set sashelp.class;
run;

proc univariate data=have nextrobs=2 nextrval=2 ;
var weight;
run;

Kurt_Bremser
Super User
Sort once. Then, in a data step, use the NOBS= option to retrieve the number of observations, and keep all observations (subsetting IF) where _N_ le 10 or _N_ ge nobs - 9.
PaigeMiller
Diamond | Level 26

Since the salary columns has duplicate values, neither of the above handles ties properly.

 

You need PROC RANK, and then you can choose how to handle ties.

--
Paige Miller
Ksharp
Super User
nextrval=2
can handle ties.
PaigeMiller
Diamond | Level 26

@Ksharp wrote:
nextrval=2
can handle ties.

Yes, it will tell you if there are more than one of a particular value, but that's not the same as finding the 10th lowest or 10th largest value.

 

PROC RANK also gives you options on what to do when the 10th lowest or 10th highest is tied.

--
Paige Miller
RichardDeVen
Barite | Level 11

A hash can easily handle the distinct values amongst 100,000s of salary values.

 

Read the salaries into an ordered hash and output the data from the first 10 and last 10 keys.

 

Example:

 

data have;
  do id = 1 to 500000;
    salary = rand('integer', 1e5, 1e6);
    output;
  end;
run;

data salary_bookends(keep=salary index);
  declare hash salaries(dataset:'have(keep=salary)', ordered: 'A');
  salaries.defineKey('salary');
  salaries.defineDone();

  call missing(salary);

  declare hiter hi('salaries');

  top = salaries.num_items;
  do index = 1 by 1 while (hi.next()=0);
    if index < 11 or index > top-10 then output;
  end;

run;

Result

RichardADeVenezia_1-1606050092852.png

 

 

PGStats
Opal | Level 21

Or rather...

 

data have;
  do id = 1 to 500000;
    /* Lots of duplicates */
    salary = rand('integer', 1, 100);
    output;
  end;
run;

data salary_bookends(keep=salary kind index);
  declare hash salaries(dataset:'have(keep=salary)', ordered: 'A');
  salaries.defineKey('salary');
  salaries.defineDone();

  call missing(salary);

  declare hiter hi('salaries');

  kind = "Min";
  hi.first();
  do index = 1 to 10 until (hi.next());
    output;
  end;
  
  kind = "Max";
  hi.last();
  do index = 1 to 10 until (hi.prev());
    output;
  end;

run;

proc print noobs data=salary_bookends; run;

image.png

PG
PeterClemmensen
Tourmaline | Level 20

Do you want the 10 min/max salary values? Or the 10'th min/max salary values? If it is the first, take one of the solutions above. Here is the hash version for the latter.

 

data have;
  do id = 1 to 500000;
    salary = ceil(rand('uniform') * 100);
    output;
  end;
run;

data want;
   dcl hash h (dataset : 'have', ordered : 'Y');
   h.definekey('salary');
   h.definedata(all : 'Y');
   h.definedone();
   dcl hiter i ('h');

   if 0 then set have;

   do _N_ = 1 by 1 while (i.next() = 0);
      if _N_ = 10 then leave;
   end;
   output;

   _N_ = i.last();

   do _N_ = 1 by 1 while (i.prev() = 0);
      if _N_ = 9 then leave;
   end;
   output;
run;

 

 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 820 views
  • 2 likes
  • 7 in conversation