DATA Step, Macro, Functions and more

Selecting minimum & maximum values of each group top-n

Reply
New Contributor
Posts: 4

Selecting minimum & maximum values of each group top-n

Hi, I am trying to extract top 3 minimum and maximum actual sales of each country (by group)  from the dataset sashelp.prdsal2. My code is not dynamic, If any one could suggest how to incorporate changes within a single datastep, that would be nice. The program should be dynamic (if instead of top-3 we need top-n then it should work by rplacing 3 by n). the output should look like the attached screenshot. My code is -

 

 

 

proc sort data= sashelp.prdsal2 out=orig;
by country actual;
run;

Data stage1;
set orig;
by country;
retain min ;
if first.country=1 then min=actual ;
if last.country =1 then max=actual ;
if last.country=1;
keep country min max;
run;

Data orig_6;
set orig;
by country;
if first.country=1 then delete;
if last.country =1 then delete; 
run;

data stage2;
set orig_6;
by country;
retain min ;
if first.country=1 then min=actual ;
if last.country =1 then max=actual ;
if last.country=1;
keep country min max;
run;

Data orig_12;
set orig_6;
by country;
if first.country=1 then delete;
if last.country =1 then delete; 
run;

data stage3;
set orig_12;
by country;
retain min ;
if first.country=1 then min=actual ;
if last.country =1 then max=actual ;
if last.country=1;
keep country min max;
run;

Data finalstage;
set stage1 stage2 stage3;
proc sort data=finalstage ;
by country min max;
run;

output.png

Super User
Posts: 19,815

Re: Selecting minimum & maximum values of each group top-n

[ Edited ]

Look at PROC UNIVARIATE. The second example on the documentation details how to extract these numbers and the NEXTRVAL & NEXTROBS control how many are displayed. 

 

Heres a link to the documentation:

http://support.sas.com/documentation/cdl/en/procstat/66703/HTML/default/viewer.htm#procstat_univaria...

 


avnsas wrote:

Hi, I am trying to extract top 3 minimum and maximum actual sales of each country (by group)  from the dataset sashelp.prdsal2. My code is not dynamic, If any one could suggest how to incorporate changes within a single datastep, that would be nice. The program should be dynamic (if instead of top-3 we need top-n then it should work by rplacing 3 by n). the output should look like the attached screenshot. My code is -

 

 

 

proc sort data= sashelp.prdsal2 out=orig;
by country actual;
run;

Data stage1;
set orig;
by country;
retain min ;
if first.country=1 then min=actual ;
if last.country =1 then max=actual ;
if last.country=1;
keep country min max;
run;

Data orig_6;
set orig;
by country;
if first.country=1 then delete;
if last.country =1 then delete; 
run;

data stage2;
set orig_6;
by country;
retain min ;
if first.country=1 then min=actual ;
if last.country =1 then max=actual ;
if last.country=1;
keep country min max;
run;

Data orig_12;
set orig_6;
by country;
if first.country=1 then delete;
if last.country =1 then delete; 
run;

data stage3;
set orig_12;
by country;
retain min ;
if first.country=1 then min=actual ;
if last.country =1 then max=actual ;
if last.country=1;
keep country min max;
run;

Data finalstage;
set stage1 stage2 stage3;
proc sort data=finalstage ;
by country min max;
run;

output.png


 

New Contributor
Posts: 4

Re: Selecting minimum & maximum values of each group top-n

Hi, thanks for the reply but I am looking for a solution via datastep only, no PROC FREQ / PROC SUMMARY / PROC MEANS etc. Can something be done using DO-Loop  or point=  ? 

Super User
Posts: 11,343

Re: Selecting minimum & maximum values of each group top-n

Why do you want to repeatedly use a hammer (datastep) when Proc Means will do this with one pass through the data (the tool that fits the job)?

Super User
Posts: 19,815

Re: Selecting minimum & maximum values of each group top-n


ballardw wrote:

Why do you want to repeatedly use a hammer (datastep) when Proc Means will do this with one pass through the data (the tool that fits the job)?


Because it's homework. 

New Contributor
Posts: 4

Re: Selecting minimum & maximum values of each group top-n

I agree it is much easier to do via any of the proc steps mentioned, but I was curious to know how it can be done via datastep only for the purpose of learning [and yes it is homework, you caught me].

Respected Advisor
Posts: 4,925

Re: Selecting minimum & maximum values of each group top-n

[ Edited ]

Note: This task can be done in two steps: a sort and a data step. For extra bonus points.

 

Hint: you will need arrays.

Hint: consider using the remainder (MOD) operation.

PG
Super Contributor
Posts: 298

Re: Selecting minimum & maximum values of each group top-n

Here is a data step solution - a proc and two data steps. No other Procs. I feel that the solution is useful for learning to code. The steps are:

[1] Sort the data set by Country. This step can be avoided if we are prepared to store the ACTUAL into a _temporary_ array for each

country and then use SORTN().

 

[2] Pre-process the sorted data set, to get the number of Countries and the number of observations per Country.

 

[3] In the last data step, find the starting number of _N_ for each group, save them in an array (LOW[*]) and the ending _N_ s

in another array (K[*]).

 

[4] Then get the 3 (dynamic number -- &topbot) values from the bottom side and top side.

 

[5] You get the output as you have wanted.

 

 

Better way is to say the Community what you found instead of keeping silent.

proc sort data= sashelp.prdsal2(keep = country actual) out=prdsal;
by country actual;
run;


data _null_;
   length group $100;
   retain group;
   do num = 1 by 1 until(last.country);
      set prdsal end = eof;
      by country;
   end;
   if last.country then  call catx(', ', group, num);
   if eof then do;
      call symputx('group', group);
      ngroup = countw(group, ',');
      call symputx('ngroup', ngroup);
   end;
run;

%put &group;

%put &ngroup;

%let topbot = 3;

data bottop;
   array k[&ngroup] _temporary_ (&group);
   array low[&ngroup] _temporary_ (&group);
   low[1] = 1;
   do i = 2 to &ngroup;
      low[i] = k[i - 1] + 1;
      k[i] = k[i - 1] + k[i];
   end;

   do g = 1 to &ngroup ;
      do i = 1 to ⌶
         p = low[g] + i - 1;
         set prdsal point = p;
         Bot = actual;
         p = k[g] - &topbot + i;
         set prdsal point = p;
         Top = actual;
         output ;
      end;
   end;
stop;
drop i g actual;
run;      
      
proc print data = Bottop;
run;
New Contributor
Posts: 4

Re: Selecting minimum & maximum values of each group top-n

thanks for the great code, really helpful explaination !!

Respected Advisor
Posts: 4,925

Re: Selecting minimum & maximum values of each group top-n

If you insist on doing your own programming, one simple way is like this:

 

%let n = 3;

proc sort data= sashelp.prdsal2 out=orig;
by country actual;
run;

data mins;
set orig; by country;
if first.country then order = 0;
order + 1;
if order <= &n;
actualMin = actual;
keep country actualMin;
run;

proc sort data=orig;
by country descending actual;
run;

data maxs;
set orig; by country;
if first.country then order = 0;
order + 1;
if order <= &n;
actualMax = actual;
keep country actualMax;
run;

data want;
merge mins maxs; by country;
run;

proc print noobs; run;

PG
Ask a Question
Discussion stats
  • 9 replies
  • 231 views
  • 4 likes
  • 5 in conversation