Hi, I am trying to extract top 3 minimum and maximum actual sales of each country (by group) from the dataset sashelp.prdsal2. My code is not dynamic, If any one could suggest how to incorporate changes within a single datastep, that would be nice. The program should be dynamic (if instead of top-3 we need top-n then it should work by rplacing 3 by n). the output should look like the attached screenshot. My code is -
proc sort data= sashelp.prdsal2 out=orig;
by country actual;
run;
Data stage1;
set orig;
by country;
retain min ;
if first.country=1 then min=actual ;
if last.country =1 then max=actual ;
if last.country=1;
keep country min max;
run;
Data orig_6;
set orig;
by country;
if first.country=1 then delete;
if last.country =1 then delete;
run;
data stage2;
set orig_6;
by country;
retain min ;
if first.country=1 then min=actual ;
if last.country =1 then max=actual ;
if last.country=1;
keep country min max;
run;
Data orig_12;
set orig_6;
by country;
if first.country=1 then delete;
if last.country =1 then delete;
run;
data stage3;
set orig_12;
by country;
retain min ;
if first.country=1 then min=actual ;
if last.country =1 then max=actual ;
if last.country=1;
keep country min max;
run;
Data finalstage;
set stage1 stage2 stage3;
proc sort data=finalstage ;
by country min max;
run;
Look at PROC UNIVARIATE. The second example on the documentation details how to extract these numbers and the NEXTRVAL & NEXTROBS control how many are displayed.
Heres a link to the documentation:
@avnsas wrote:
Hi, I am trying to extract top 3 minimum and maximum actual sales of each country (by group) from the dataset sashelp.prdsal2. My code is not dynamic, If any one could suggest how to incorporate changes within a single datastep, that would be nice. The program should be dynamic (if instead of top-3 we need top-n then it should work by rplacing 3 by n). the output should look like the attached screenshot. My code is -
proc sort data= sashelp.prdsal2 out=orig; by country actual; run; Data stage1; set orig; by country; retain min ; if first.country=1 then min=actual ; if last.country =1 then max=actual ; if last.country=1; keep country min max; run; Data orig_6; set orig; by country; if first.country=1 then delete; if last.country =1 then delete; run; data stage2; set orig_6; by country; retain min ; if first.country=1 then min=actual ; if last.country =1 then max=actual ; if last.country=1; keep country min max; run; Data orig_12; set orig_6; by country; if first.country=1 then delete; if last.country =1 then delete; run; data stage3; set orig_12; by country; retain min ; if first.country=1 then min=actual ; if last.country =1 then max=actual ; if last.country=1; keep country min max; run; Data finalstage; set stage1 stage2 stage3; proc sort data=finalstage ; by country min max; run;
Hi, thanks for the reply but I am looking for a solution via datastep only, no PROC FREQ / PROC SUMMARY / PROC MEANS etc. Can something be done using DO-Loop or point= ?
Why do you want to repeatedly use a hammer (datastep) when Proc Means will do this with one pass through the data (the tool that fits the job)?
@ballardw wrote:
Why do you want to repeatedly use a hammer (datastep) when Proc Means will do this with one pass through the data (the tool that fits the job)?
Because it's homework.
I agree it is much easier to do via any of the proc steps mentioned, but I was curious to know how it can be done via datastep only for the purpose of learning [and yes it is homework, you caught me].
Note: This task can be done in two steps: a sort and a data step. For extra bonus points.
Hint: you will need arrays.
Hint: consider using the remainder (MOD) operation.
Here is a data step solution - a proc and two data steps. No other Procs. I feel that the solution is useful for learning to code. The steps are:
[1] Sort the data set by Country. This step can be avoided if we are prepared to store the ACTUAL into a _temporary_ array for each
country and then use SORTN().
[2] Pre-process the sorted data set, to get the number of Countries and the number of observations per Country.
[3] In the last data step, find the starting number of _N_ for each group, save them in an array (LOW[*]) and the ending _N_ s
in another array (K[*]).
[4] Then get the 3 (dynamic number -- &topbot) values from the bottom side and top side.
[5] You get the output as you have wanted.
Better way is to say the Community what you found instead of keeping silent.
proc sort data= sashelp.prdsal2(keep = country actual) out=prdsal;
by country actual;
run;
data _null_;
length group $100;
retain group;
do num = 1 by 1 until(last.country);
set prdsal end = eof;
by country;
end;
if last.country then call catx(', ', group, num);
if eof then do;
call symputx('group', group);
ngroup = countw(group, ',');
call symputx('ngroup', ngroup);
end;
run;
%put &group;
%put &ngroup;
%let topbot = 3;
data bottop;
array k[&ngroup] _temporary_ (&group);
array low[&ngroup] _temporary_ (&group);
low[1] = 1;
do i = 2 to &ngroup;
low[i] = k[i - 1] + 1;
k[i] = k[i - 1] + k[i];
end;
do g = 1 to &ngroup ;
do i = 1 to ⌶
p = low[g] + i - 1;
set prdsal point = p;
Bot = actual;
p = k[g] - &topbot + i;
set prdsal point = p;
Top = actual;
output ;
end;
end;
stop;
drop i g actual;
run;
proc print data = Bottop;
run;
thanks for the great code, really helpful explaination !!
If you insist on doing your own programming, one simple way is like this:
%let n = 3;
proc sort data= sashelp.prdsal2 out=orig;
by country actual;
run;
data mins;
set orig; by country;
if first.country then order = 0;
order + 1;
if order <= &n;
actualMin = actual;
keep country actualMin;
run;
proc sort data=orig;
by country descending actual;
run;
data maxs;
set orig; by country;
if first.country then order = 0;
order + 1;
if order <= &n;
actualMax = actual;
keep country actualMax;
run;
data want;
merge mins maxs; by country;
run;
proc print noobs; run;
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.