Hello all,
I am trying to get sets of dataset from a single data set . I am wondering if you can help on this.
data have;
input name category count;
cards;
a 1 10
b 2 20
c 3 30
d 1 10
e 4 10;
run;
i am trying to get following datasets;
data one
name category count
a 1 10
b 2 20
data two
name category count
a 1 10
c 3 30
I want to create table with all possible combinations.
Thank you
data have;
input name$ category count;
cards;
a 1 10
b 2 20
c 3 30
d 1 10
e 4 10
;
run;
data _null_;
call symputx('nobs',nobs);
set have nobs=nobs;
stop;
run;
data _null_;
if _n_=1 then do;
if 0 then set have;
declare hash h(multidata:'y');
h.definekey('name', 'category', 'count');
h.definedata('name' ,'category', 'count');
h.definedone();
end;
array x[&nobs];
do i=1 to &nobs;
x(i)=i;
end;
n=&nobs;
k=2;
ncomb=comb(n,k);
do j=1 to ncomb;
call allcomb(j, k, of x
do i=1 to k;
p=x(i);
set have point=p;
rc=h.add();
end;
rc=h.output(dataset:cats('want',j));
rc=h.clear();
end;
stop;
run;
all the combination just by n=2 or all the combination literally.
data have;
input name$ category count;
cards;
a 1 10
b 2 20
c 3 30
d 1 10
e 4 10
;
run;
data _null_;
call symputx('nobs',nobs);
set have nobs=nobs;
stop;
run;
data _null_;
if _n_=1 then do;
if 0 then set have;
declare hash h(multidata:'y');
h.definekey('name', 'category', 'count');
h.definedata('name' ,'category', 'count');
h.definedone();
end;
array x[&nobs];
do i=1 to &nobs;
x(i)=i;
end;
n=&nobs;
k=2;
ncomb=comb(n,k);
do j=1 to ncomb;
call allcomb(j, k, of x
do i=1 to k;
p=x(i);
set have point=p;
rc=h.add();
end;
rc=h.output(dataset:cats('want',j));
rc=h.clear();
end;
stop;
run;
Is this a homework purpose with a clear scope of only exactly 5 rows as defined above or is this meant to be an expandable solution that can create an insanely large amount of new datasets? There are many different approaches depending on the end purpose.
One such available approach is using hash tables' output method so that you can index your data.
data _null_;
if _N_=1 then do;
set have nobs=maxobs;
declare hash myhash(multidata: 'YES');
myhash.definekey('name');
myhash.definedata('name', 'category', 'count');
myhash.definedone();
end;
do i=1 to maxobs;
do j=i+1 to maxobs;
set have point=i;
myhash.add();
set have point=j;
myhash.add();
myhash.output(dataset: catx("_", "want", i, j));
myhash.clear();
end;
end;
stop;
run;
Vincent
edit: Snap, was beat to examplifying the use of hash object output method to alleviate the burden of using macros for output dataset naming! Nice one Hai.Kuo
You need STOP;
Had already corrected it up. I'm still somewhat puzzled as to how the data step establish what will constitute it's last loop iteration. I would've assumed set statements with POINT= options would not affect data step iterations and that this would've behaved like data want; if _n_=1 then set have; end; run;
Anyway if you have any further explanation or additionnal insights to provide on this specific topic, I'll gladly welcome them. Hopefully proc DS2 eventually makes it more clear.
CAUTION: Continuous loops can occur when you use the POINT= option. When you use the POINT= option, you must include a STOP statement to stop DATA step processing, programming logic that checks for an invalid value of the POINT= variable, or both. Because POINT= reads only those observations that are specified in the DO statement, SAS cannot read an end-of-file indicator as it would if the file were being read sequentially. Because reading an end-of-file indicator ends a DATA step automatically, failure to substitute another means of ending the DATA step when you use POINT= can cause the DATA step to go into a continuous loop. If SAS reads an invalid value of the POINT= variable, it sets the automatic variable _ERROR_ to 1. Use this information to check for conditions that cause continuous DO-loop processing, or include a STOP statement at the end of the DATA step, or both.
Hello and I want to get one step more complex. I created following analysis using .
%MACRO LOOPING( START=,END=);
proc freq data=have_&START._&END;
tables name*category/ chisq cl ALL;
output out=testy_&START._&END(KEEP=_PCHI_ P_PCHI L_RROR U_RROR) pchi ALL;
exact or;
weight count;
run;
DATA FINAL_&START._&END;
SET TESTY_&START._&END;
GROUP = "&START._&END";
WHERE _PCHI_ ^=.;
RUN;
%MEND;
%LOOPING(START=1,END=2);
%LOOPING(START=1,END=3);
%LOOPING(START=1,END=4);
%LOOPING(START=1,END=5);
%LOOPING(START=1,END=6);
%LOOPING(START=1,END=7);
%LOOPING(START=1,END=8);
%LOOPING(START=1,END=9);
%LOOPING(START=2,END=3);
%LOOPING(START=2,END=4);
%LOOPING(START=2,END=5);
%LOOPING(START=2,END=6);
%LOOPING(START=2,END=7);
%LOOPING(START=2,END=8);
%LOOPING(START=2,END=9);
%LOOPING(START=3,END=4);
%LOOPING(START=3,END=5);
%LOOPING(START=3,END=6);
%LOOPING(START=3,END=7);
%LOOPING(START=3,END=8);
%LOOPING(START=3,END=9);
%LOOPING(START=4,END=5);
%LOOPING(START=4,END=6);
%LOOPING(START=4,END=7);
%LOOPING(START=4,END=8);
%LOOPING(START=4,END=9);
%LOOPING(START=5,END=6);
%LOOPING(START=5,END=7);
%LOOPING(START=5,END=8);
%LOOPING(START=5,END=9);
DATA final_want;
SET FINAL_1_2
FINAL_1_3
FINAl_1_4
FINAL_1_5
FINAL_1_6
FINAL_1_7
FINAL_1_8
FINAL_1_9
FINAL_2_3
FINAL_2_4
FINAL_2_5
FINAL_2_6
FINAL_2_7
FINAL_2_8
FINAL_2_9
FINAL_3_4
FINAL_3_5
FINAL_3_6
FINAL_3_7
FINAL_3_8
FINAL_3_9
FINAL_4_5
FINAL_4_6
FINAL_4_7
FINAL_4_8
FINAL_4_9
FINAL_5_6
FINAL_5_7
FINAL_5_8
FINAL_5_9
;
run;
It exactly gives what I want but I am wondering if I can shortened the code since it has a lot repetitive component. I am doing chi square test comparing each group and providing necessary values. I am thinking if it is possible to insert Proc freq statement within hash table creation statement so that it gives more cleaner code.
Thank you
By creating all those separate data sets in the first part of you question you have made the problem much harder. What you really needed was a BY variable that defines the "observations pairs" for exampl 1_2, 1_3, 1_4, 2_3, 2_4, 3_4 Then you just use the BY variable in the PROC FREQ.
I have to agree with data _null_ that your task would've been simplified with appropriate BY grouping.
However, if you are happy with the output and are only looking for a cosmetic fix, you can always hide the spam in another macro loop. Since your loop control seems more of a manual thing than a produce all possible subsets, you can't quite use the original intent of %loop() macro to its full extent but here's a cosmetic fix that can replace everything below %mend in your post
%macro cosmetic();
%do i=1 %to 5;
%do j=%eval(&i+1) %to 9;
%looping(start=&i, end=&j);
%end;
%end;
data final_want;
set
%do i=1 %to 5;
%do j=%eval(&i+1) %to 9;
final_&i_&j
%end;
%end;
;
run;
%mend;
%cosmetic();
macro processor is merely parsing your 50 lines of codes.
A simpler and likely better fix would've been to do the full lower or upper triangle of observation pairs, use BY processing as data _null_ pointed out and a WHERE condition to remove cases where the first of the two index is GT 5
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.