Desktop productivity for business analysts and programmers

Output sas datasets from combination of rows from a dataset

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 89
Accepted Solution

Output sas datasets from combination of rows from a dataset

Hello all,

I am trying to get sets of dataset from a single data set . I am wondering if you can help on this.

data have;

input name category count;

cards;

a   1  10

b   2   20

c   3   30

d   1   10

e   4    10;

run;

i am trying to get following datasets;

data one

name category count

a 1 10

b 2 20

data two

name category count

a 1 10

c  3 30

I want to create table with all possible combinations.

Thank you


Accepted Solutions
Solution
‎11-08-2013 09:37 AM
Respected Advisor
Posts: 3,124

Re: Output sas datasets from combination of rows from a dataset

data have;

input name$ category count;

cards;

a   1  10

b   2   20

c   3   30

d   1   10

e   4    10

;

run;

data _null_;

call symputx('nobs',nobs);

set have nobs=nobs;

stop;

run;

data _null_;

if _n_=1 then do;

   if 0 then set have;

   declare hash h(multidata:'y');

   h.definekey('name', 'category', 'count');

   h.definedata('name' ,'category', 'count');

   h.definedone();

end;

   array x[&nobs];

   do i=1 to &nobs;

     x(i)=i;

   end;

   n=&nobs;

   k=2;

   ncomb=comb(n,k);

   do j=1 to ncomb;

      call allcomb(j, k, of x

  • );
  •       do i=1 to k;

              p=x(i);

            set have point=p;

            rc=h.add();

          end;

          rc=h.output(dataset:cats('want',j));

          rc=h.clear();

       end;

       stop;

    run;

    View solution in original post


    All Replies
    Respected Advisor
    Posts: 3,124

    Re: Output sas datasets from combination of rows from a dataset

    all the combination just by n=2 or all the combination literally.

    Frequent Contributor
    Posts: 89

    Re: Output sas datasets from combination of rows from a dataset

    Hello it is n=2 combinations only.

    Thank you

    Solution
    ‎11-08-2013 09:37 AM
    Respected Advisor
    Posts: 3,124

    Re: Output sas datasets from combination of rows from a dataset

    data have;

    input name$ category count;

    cards;

    a   1  10

    b   2   20

    c   3   30

    d   1   10

    e   4    10

    ;

    run;

    data _null_;

    call symputx('nobs',nobs);

    set have nobs=nobs;

    stop;

    run;

    data _null_;

    if _n_=1 then do;

       if 0 then set have;

       declare hash h(multidata:'y');

       h.definekey('name', 'category', 'count');

       h.definedata('name' ,'category', 'count');

       h.definedone();

    end;

       array x[&nobs];

       do i=1 to &nobs;

         x(i)=i;

       end;

       n=&nobs;

       k=2;

       ncomb=comb(n,k);

       do j=1 to ncomb;

          call allcomb(j, k, of x

  • );
  •       do i=1 to k;

              p=x(i);

            set have point=p;

            rc=h.add();

          end;

          rc=h.output(dataset:cats('want',j));

          rc=h.clear();

       end;

       stop;

    run;

    Super Contributor
    Posts: 339

    Re: Output sas datasets from combination of rows from a dataset

    Is this a homework purpose with a clear scope of only exactly 5 rows as defined above or is this meant to be an expandable solution that can create an insanely large amount of new datasets? There are many different approaches depending on the end purpose.

    One such available approach is using hash tables' output method so that you can index your data.

    data _null_;

         if _N_=1 then do;

              set have nobs=maxobs;

              declare hash myhash(multidata: 'YES');

              myhash.definekey('name');

              myhash.definedata('name', 'category', 'count');

              myhash.definedone();

         end;

         do i=1 to maxobs;

              do j=i+1 to maxobs;

                    set have point=i;

                   myhash.add();

                   set have point=j;

                   myhash.add();

                   myhash.output(dataset: catx("_", "want", i, j));

                   myhash.clear();

              end;

         end;

         stop;

    run;

    Vincent

    edit: Snap, was beat to examplifying the use of hash object output method to alleviate the burden of using macros for output dataset naming! Nice one Hai.Kuo

    Respected Advisor
    Posts: 3,775

    Re: Output sas datasets from combination of rows from a dataset

    You need STOP;

    Super Contributor
    Posts: 339

    Re: Output sas datasets from combination of rows from a dataset

    Had already corrected it up. I'm still somewhat puzzled as to how the data step establish what will constitute it's last loop iteration. I would've assumed set statements with POINT= options would not affect data step iterations and that this would've behaved like data want; if _n_=1 then set have; end; run;

    Anyway if you have any further explanation or additionnal insights to provide on this specific topic, I'll gladly welcome them. Hopefully proc DS2 eventually makes it more clear.

    Respected Advisor
    Posts: 3,775

    Re: Output sas datasets from combination of rows from a dataset

    CAUTION: Continuous loops can occur when you use the POINT= option. When you use the POINT= option, you must include a STOP statement to stop DATA step processing, programming logic that checks for an invalid value of the POINT= variable, or both. Because POINT= reads only those observations that are specified in the DO statement, SAS cannot read an end-of-file indicator as it would if the file were being read sequentially. Because reading an end-of-file indicator ends a DATA step automatically, failure to substitute another means of ending the DATA step when you use POINT= can cause the DATA step to go into a continuous loop. If SAS reads an invalid value of the POINT= variable, it sets the automatic variable _ERROR_ to 1. Use this information to check for conditions that cause continuous DO-loop processing, or include a STOP statement at the end of the DATA step, or both.

    Frequent Contributor
    Posts: 89

    Re: Output sas datasets from combination of rows from a dataset

    Thank you so much and everybody. I am a learning SAS everyday and this is the most amazing application I have ever seen.

    Frequent Contributor
    Posts: 89

    Re: Output sas datasets from combination of rows from a dataset

    Hello and   I want to get one step more complex. I created following analysis using .

    %MACRO LOOPING( START=,END=);

    proc freq data=have_&START._&END;

      tables name*category/ chisq cl ALL;

      output out=testy_&START._&END(KEEP=_PCHI_ P_PCHI L_RROR U_RROR) pchi ALL;

      exact or;

      weight count;

    run;

    DATA FINAL_&START._&END;

      SET TESTY_&START._&END;

    GROUP = "&START._&END";

    WHERE _PCHI_ ^=.;

    RUN;

    %MEND;

    %LOOPING(START=1,END=2);

    %LOOPING(START=1,END=3);

    %LOOPING(START=1,END=4);

    %LOOPING(START=1,END=5);

    %LOOPING(START=1,END=6);

    %LOOPING(START=1,END=7);

    %LOOPING(START=1,END=8);

    %LOOPING(START=1,END=9);

    %LOOPING(START=2,END=3);

    %LOOPING(START=2,END=4);

    %LOOPING(START=2,END=5);

    %LOOPING(START=2,END=6);

    %LOOPING(START=2,END=7);

    %LOOPING(START=2,END=8);

    %LOOPING(START=2,END=9);

    %LOOPING(START=3,END=4);

    %LOOPING(START=3,END=5);

    %LOOPING(START=3,END=6);

    %LOOPING(START=3,END=7);

    %LOOPING(START=3,END=8);

    %LOOPING(START=3,END=9);

    %LOOPING(START=4,END=5);

    %LOOPING(START=4,END=6);

    %LOOPING(START=4,END=7);

    %LOOPING(START=4,END=8);

    %LOOPING(START=4,END=9);

    %LOOPING(START=5,END=6);

    %LOOPING(START=5,END=7);

    %LOOPING(START=5,END=8);

    %LOOPING(START=5,END=9);

    DATA final_want;

      SET FINAL_1_2

    FINAL_1_3

    FINAl_1_4

    FINAL_1_5

    FINAL_1_6

    FINAL_1_7

    FINAL_1_8

    FINAL_1_9

    FINAL_2_3

    FINAL_2_4

    FINAL_2_5

    FINAL_2_6

    FINAL_2_7

    FINAL_2_8

    FINAL_2_9

    FINAL_3_4

    FINAL_3_5

    FINAL_3_6

    FINAL_3_7

    FINAL_3_8

    FINAL_3_9

    FINAL_4_5

    FINAL_4_6

    FINAL_4_7

    FINAL_4_8

    FINAL_4_9

    FINAL_5_6

    FINAL_5_7

    FINAL_5_8

    FINAL_5_9

    ;

    run;

    It exactly gives what I want but I am wondering if I can shortened the code since it has a lot repetitive component. I am doing chi square test comparing each group and providing necessary values.  I am thinking if it is possible to insert Proc freq statement within hash table creation statement so that it gives more cleaner code.

    Thank you

    Respected Advisor
    Posts: 3,775

    Re: Output sas datasets from combination of rows from a dataset

    By creating all those separate data sets in the first part of you question you have made the problem much harder.  What you really needed was a BY variable that defines the "observations pairs"  for exampl 1_2, 1_3, 1_4, 2_3, 2_4, 3_4  Then you just use the BY variable in the PROC FREQ.

    Super Contributor
    Posts: 339

    Re: Output sas datasets from combination of rows from a dataset

    I have to agree with data _null_ that your task would've been simplified with appropriate BY grouping.

    However, if you are happy with the output and are only looking for a cosmetic fix, you can always hide the spam in another macro loop. Since your loop control seems more of a manual thing than a produce all possible subsets, you can't quite use the original intent of %loop() macro to its full extent but here's a cosmetic fix  that can replace everything below %mend in your post

    %macro cosmetic();

    %do i=1 %to 5;

         %do j=%eval(&i+1) %to 9;

              %looping(start=&i, end=&j);

         %end;

    %end;

    data final_want;

         set

    %do i=1 %to 5;

         %do j=%eval(&i+1) %to 9;

              final_&i_&j

         %end;

    %end;

         ;

    run;

    %mend;

    %cosmetic();

    macro processor is merely parsing your 50 lines of codes.

    A simpler and likely better fix would've been to do the full lower or upper triangle of observation pairs, use BY processing as data _null_ pointed out and a WHERE condition to remove cases where the first of the two index is GT 5

    ☑ This topic is SOLVED.

    Need further help from the community? Please ask a new question.

    Discussion stats
    • 11 replies
    • 582 views
    • 7 likes
    • 4 in conversation