BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
learner_sas
Quartz | Level 8

Hello all,

I am trying to get sets of dataset from a single data set . I am wondering if you can help on this.

data have;

input name category count;

cards;

a   1  10

b   2   20

c   3   30

d   1   10

e   4    10;

run;

i am trying to get following datasets;

data one

name category count

a 1 10

b 2 20

data two

name category count

a 1 10

c  3 30

I want to create table with all possible combinations.

Thank you

1 ACCEPTED SOLUTION

Accepted Solutions
Haikuo
Onyx | Level 15

data have;

input name$ category count;

cards;

a   1  10

b   2   20

c   3   30

d   1   10

e   4    10

;

run;

data _null_;

call symputx('nobs',nobs);

set have nobs=nobs;

stop;

run;

data _null_;

if _n_=1 then do;

   if 0 then set have;

   declare hash h(multidata:'y');

   h.definekey('name', 'category', 'count');

   h.definedata('name' ,'category', 'count');

   h.definedone();

end;

   array x[&nobs];

   do i=1 to &nobs;

     x(i)=i;

   end;

   n=&nobs;

   k=2;

   ncomb=comb(n,k);

   do j=1 to ncomb;

      call allcomb(j, k, of x

  • );
  •       do i=1 to k;

              p=x(i);

            set have point=p;

            rc=h.add();

          end;

          rc=h.output(dataset:cats('want',j));

          rc=h.clear();

       end;

       stop;

    run;

    View solution in original post

    11 REPLIES 11
    Haikuo
    Onyx | Level 15

    all the combination just by n=2 or all the combination literally.

    learner_sas
    Quartz | Level 8

    Hello it is n=2 combinations only.

    Thank you

    Haikuo
    Onyx | Level 15

    data have;

    input name$ category count;

    cards;

    a   1  10

    b   2   20

    c   3   30

    d   1   10

    e   4    10

    ;

    run;

    data _null_;

    call symputx('nobs',nobs);

    set have nobs=nobs;

    stop;

    run;

    data _null_;

    if _n_=1 then do;

       if 0 then set have;

       declare hash h(multidata:'y');

       h.definekey('name', 'category', 'count');

       h.definedata('name' ,'category', 'count');

       h.definedone();

    end;

       array x[&nobs];

       do i=1 to &nobs;

         x(i)=i;

       end;

       n=&nobs;

       k=2;

       ncomb=comb(n,k);

       do j=1 to ncomb;

          call allcomb(j, k, of x

  • );
  •       do i=1 to k;

              p=x(i);

            set have point=p;

            rc=h.add();

          end;

          rc=h.output(dataset:cats('want',j));

          rc=h.clear();

       end;

       stop;

    run;

    Vince28_Statcan
    Quartz | Level 8

    Is this a homework purpose with a clear scope of only exactly 5 rows as defined above or is this meant to be an expandable solution that can create an insanely large amount of new datasets? There are many different approaches depending on the end purpose.

    One such available approach is using hash tables' output method so that you can index your data.

    data _null_;

         if _N_=1 then do;

              set have nobs=maxobs;

              declare hash myhash(multidata: 'YES');

              myhash.definekey('name');

              myhash.definedata('name', 'category', 'count');

              myhash.definedone();

         end;

         do i=1 to maxobs;

              do j=i+1 to maxobs;

                    set have point=i;

                   myhash.add();

                   set have point=j;

                   myhash.add();

                   myhash.output(dataset: catx("_", "want", i, j));

                   myhash.clear();

              end;

         end;

         stop;

    run;

    Vincent

    edit: Snap, was beat to examplifying the use of hash object output method to alleviate the burden of using macros for output dataset naming! Nice one Hai.Kuo

    Vince28_Statcan
    Quartz | Level 8

    Had already corrected it up. I'm still somewhat puzzled as to how the data step establish what will constitute it's last loop iteration. I would've assumed set statements with POINT= options would not affect data step iterations and that this would've behaved like data want; if _n_=1 then set have; end; run;

    Anyway if you have any further explanation or additionnal insights to provide on this specific topic, I'll gladly welcome them. Hopefully proc DS2 eventually makes it more clear.

    data_null__
    Jade | Level 19

    CAUTION: Continuous loops can occur when you use the POINT= option. When you use the POINT= option, you must include a STOP statement to stop DATA step processing, programming logic that checks for an invalid value of the POINT= variable, or both. Because POINT= reads only those observations that are specified in the DO statement, SAS cannot read an end-of-file indicator as it would if the file were being read sequentially. Because reading an end-of-file indicator ends a DATA step automatically, failure to substitute another means of ending the DATA step when you use POINT= can cause the DATA step to go into a continuous loop. If SAS reads an invalid value of the POINT= variable, it sets the automatic variable _ERROR_ to 1. Use this information to check for conditions that cause continuous DO-loop processing, or include a STOP statement at the end of the DATA step, or both.

    learner_sas
    Quartz | Level 8

    Thank you so much and everybody. I am a learning SAS everyday and this is the most amazing application I have ever seen.

    learner_sas
    Quartz | Level 8

    Hello and   I want to get one step more complex. I created following analysis using .

    %MACRO LOOPING( START=,END=);

    proc freq data=have_&START._&END;

      tables name*category/ chisq cl ALL;

      output out=testy_&START._&END(KEEP=_PCHI_ P_PCHI L_RROR U_RROR) pchi ALL;

      exact or;

      weight count;

    run;

    DATA FINAL_&START._&END;

      SET TESTY_&START._&END;

    GROUP = "&START._&END";

    WHERE _PCHI_ ^=.;

    RUN;

    %MEND;

    %LOOPING(START=1,END=2);

    %LOOPING(START=1,END=3);

    %LOOPING(START=1,END=4);

    %LOOPING(START=1,END=5);

    %LOOPING(START=1,END=6);

    %LOOPING(START=1,END=7);

    %LOOPING(START=1,END=8);

    %LOOPING(START=1,END=9);

    %LOOPING(START=2,END=3);

    %LOOPING(START=2,END=4);

    %LOOPING(START=2,END=5);

    %LOOPING(START=2,END=6);

    %LOOPING(START=2,END=7);

    %LOOPING(START=2,END=8);

    %LOOPING(START=2,END=9);

    %LOOPING(START=3,END=4);

    %LOOPING(START=3,END=5);

    %LOOPING(START=3,END=6);

    %LOOPING(START=3,END=7);

    %LOOPING(START=3,END=8);

    %LOOPING(START=3,END=9);

    %LOOPING(START=4,END=5);

    %LOOPING(START=4,END=6);

    %LOOPING(START=4,END=7);

    %LOOPING(START=4,END=8);

    %LOOPING(START=4,END=9);

    %LOOPING(START=5,END=6);

    %LOOPING(START=5,END=7);

    %LOOPING(START=5,END=8);

    %LOOPING(START=5,END=9);

    DATA final_want;

      SET FINAL_1_2

    FINAL_1_3

    FINAl_1_4

    FINAL_1_5

    FINAL_1_6

    FINAL_1_7

    FINAL_1_8

    FINAL_1_9

    FINAL_2_3

    FINAL_2_4

    FINAL_2_5

    FINAL_2_6

    FINAL_2_7

    FINAL_2_8

    FINAL_2_9

    FINAL_3_4

    FINAL_3_5

    FINAL_3_6

    FINAL_3_7

    FINAL_3_8

    FINAL_3_9

    FINAL_4_5

    FINAL_4_6

    FINAL_4_7

    FINAL_4_8

    FINAL_4_9

    FINAL_5_6

    FINAL_5_7

    FINAL_5_8

    FINAL_5_9

    ;

    run;

    It exactly gives what I want but I am wondering if I can shortened the code since it has a lot repetitive component. I am doing chi square test comparing each group and providing necessary values.  I am thinking if it is possible to insert Proc freq statement within hash table creation statement so that it gives more cleaner code.

    Thank you

    data_null__
    Jade | Level 19

    By creating all those separate data sets in the first part of you question you have made the problem much harder.  What you really needed was a BY variable that defines the "observations pairs"  for exampl 1_2, 1_3, 1_4, 2_3, 2_4, 3_4  Then you just use the BY variable in the PROC FREQ.

    Vince28_Statcan
    Quartz | Level 8

    I have to agree with data _null_ that your task would've been simplified with appropriate BY grouping.

    However, if you are happy with the output and are only looking for a cosmetic fix, you can always hide the spam in another macro loop. Since your loop control seems more of a manual thing than a produce all possible subsets, you can't quite use the original intent of %loop() macro to its full extent but here's a cosmetic fix  that can replace everything below %mend in your post

    %macro cosmetic();

    %do i=1 %to 5;

         %do j=%eval(&i+1) %to 9;

              %looping(start=&i, end=&j);

         %end;

    %end;

    data final_want;

         set

    %do i=1 %to 5;

         %do j=%eval(&i+1) %to 9;

              final_&i_&j

         %end;

    %end;

         ;

    run;

    %mend;

    %cosmetic();

    macro processor is merely parsing your 50 lines of codes.

    A simpler and likely better fix would've been to do the full lower or upper triangle of observation pairs, use BY processing as data _null_ pointed out and a WHERE condition to remove cases where the first of the two index is GT 5

    SAS Innovate 2025: Register Today!

     

    Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


    Register now!

    SAS Enterprise Guide vs. SAS Studio

    What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

    Find more tutorials on the SAS Users YouTube channel.

    SAS Training: Just a Click Away

     Ready to level-up your skills? Choose your own adventure.

    Browse our catalog!

    Discussion stats
    • 11 replies
    • 2567 views
    • 7 likes
    • 4 in conversation