Hi, I am trying to extract top 3 minimum and maximum actual sales of each country (by group) from the dataset sashelp.prdsal2. My code is not dynamic, If any one could suggest how to incorporate changes within a single datastep, that would be nice. The program should be dynamic (if instead of top-3 we need top-n then it should work by rplacing 3 by n). the output should look like the attached screenshot. My code is -
proc sort data= sashelp.prdsal2 out=orig;
by country actual;
run;
Data stage1;
set orig;
by country;
retain min ;
if first.country=1 then min=actual ;
if last.country =1 then max=actual ;
if last.country=1;
keep country min max;
run;
Data orig_6;
set orig;
by country;
if first.country=1 then delete;
if last.country =1 then delete;
run;
data stage2;
set orig_6;
by country;
retain min ;
if first.country=1 then min=actual ;
if last.country =1 then max=actual ;
if last.country=1;
keep country min max;
run;
Data orig_12;
set orig_6;
by country;
if first.country=1 then delete;
if last.country =1 then delete;
run;
data stage3;
set orig_12;
by country;
retain min ;
if first.country=1 then min=actual ;
if last.country =1 then max=actual ;
if last.country=1;
keep country min max;
run;
Data finalstage;
set stage1 stage2 stage3;
proc sort data=finalstage ;
by country min max;
run;
Look at PROC UNIVARIATE. The second example on the documentation details how to extract these numbers and the NEXTRVAL & NEXTROBS control how many are displayed.
Heres a link to the documentation:
@avnsas wrote:
Hi, I am trying to extract top 3 minimum and maximum actual sales of each country (by group) from the dataset sashelp.prdsal2. My code is not dynamic, If any one could suggest how to incorporate changes within a single datastep, that would be nice. The program should be dynamic (if instead of top-3 we need top-n then it should work by rplacing 3 by n). the output should look like the attached screenshot. My code is -
proc sort data= sashelp.prdsal2 out=orig; by country actual; run; Data stage1; set orig; by country; retain min ; if first.country=1 then min=actual ; if last.country =1 then max=actual ; if last.country=1; keep country min max; run; Data orig_6; set orig; by country; if first.country=1 then delete; if last.country =1 then delete; run; data stage2; set orig_6; by country; retain min ; if first.country=1 then min=actual ; if last.country =1 then max=actual ; if last.country=1; keep country min max; run; Data orig_12; set orig_6; by country; if first.country=1 then delete; if last.country =1 then delete; run; data stage3; set orig_12; by country; retain min ; if first.country=1 then min=actual ; if last.country =1 then max=actual ; if last.country=1; keep country min max; run; Data finalstage; set stage1 stage2 stage3; proc sort data=finalstage ; by country min max; run;
Hi, thanks for the reply but I am looking for a solution via datastep only, no PROC FREQ / PROC SUMMARY / PROC MEANS etc. Can something be done using DO-Loop or point= ?
Why do you want to repeatedly use a hammer (datastep) when Proc Means will do this with one pass through the data (the tool that fits the job)?
@ballardw wrote:
Why do you want to repeatedly use a hammer (datastep) when Proc Means will do this with one pass through the data (the tool that fits the job)?
Because it's homework.
I agree it is much easier to do via any of the proc steps mentioned, but I was curious to know how it can be done via datastep only for the purpose of learning [and yes it is homework, you caught me].
Note: This task can be done in two steps: a sort and a data step. For extra bonus points.
Hint: you will need arrays.
Hint: consider using the remainder (MOD) operation.
Here is a data step solution - a proc and two data steps. No other Procs. I feel that the solution is useful for learning to code. The steps are:
[1] Sort the data set by Country. This step can be avoided if we are prepared to store the ACTUAL into a _temporary_ array for each
country and then use SORTN().
[2] Pre-process the sorted data set, to get the number of Countries and the number of observations per Country.
[3] In the last data step, find the starting number of _N_ for each group, save them in an array (LOW[*]) and the ending _N_ s
in another array (K[*]).
[4] Then get the 3 (dynamic number -- &topbot) values from the bottom side and top side.
[5] You get the output as you have wanted.
Better way is to say the Community what you found instead of keeping silent.
proc sort data= sashelp.prdsal2(keep = country actual) out=prdsal;
by country actual;
run;
data _null_;
length group $100;
retain group;
do num = 1 by 1 until(last.country);
set prdsal end = eof;
by country;
end;
if last.country then call catx(', ', group, num);
if eof then do;
call symputx('group', group);
ngroup = countw(group, ',');
call symputx('ngroup', ngroup);
end;
run;
%put &group;
%put &ngroup;
%let topbot = 3;
data bottop;
array k[&ngroup] _temporary_ (&group);
array low[&ngroup] _temporary_ (&group);
low[1] = 1;
do i = 2 to &ngroup;
low[i] = k[i - 1] + 1;
k[i] = k[i - 1] + k[i];
end;
do g = 1 to &ngroup ;
do i = 1 to ⌶
p = low[g] + i - 1;
set prdsal point = p;
Bot = actual;
p = k[g] - &topbot + i;
set prdsal point = p;
Top = actual;
output ;
end;
end;
stop;
drop i g actual;
run;
proc print data = Bottop;
run;
thanks for the great code, really helpful explaination !!
If you insist on doing your own programming, one simple way is like this:
%let n = 3;
proc sort data= sashelp.prdsal2 out=orig;
by country actual;
run;
data mins;
set orig; by country;
if first.country then order = 0;
order + 1;
if order <= &n;
actualMin = actual;
keep country actualMin;
run;
proc sort data=orig;
by country descending actual;
run;
data maxs;
set orig; by country;
if first.country then order = 0;
order + 1;
if order <= &n;
actualMax = actual;
keep country actualMax;
run;
data want;
merge mins maxs; by country;
run;
proc print noobs; run;
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.