Solved: Re: Output sas datasets from combination of rows from a dataset

learner_sas · Posted 11-07-2013 05:43 PM

Hello all,

I am trying to get sets of dataset from a single data set . I am wondering if you can help on this.

data have;

input name category count;

cards;

a 1 10

b 2 20

c 3 30

d 1 10

e 4 10;

run;

i am trying to get following datasets;

data one

name category count

a 1 10

b 2 20

data two

name category count

a 1 10

c 3 30

I want to create table with all possible combinations.

Thank you

Haikuo · Posted 11-08-2013 09:37 AM

data have;

input name$ category count;

cards;

a 1 10

b 2 20

c 3 30

d 1 10

e 4 10

;

run;

data _null_;

call symputx('nobs',nobs);

set have nobs=nobs;

stop;

run;

data _null_;

if _n_=1 then do;

if 0 then set have;

declare hash h(multidata:'y');

h.definekey('name', 'category', 'count');

h.definedata('name' ,'category', 'count');

h.definedone();

end;

array x[&nobs];

do i=1 to &nobs;

x(i)=i;

end;

n=&nobs;

k=2;

ncomb=comb(n,k);

do j=1 to ncomb;

call allcomb(j, k, of x

);

do i=1 to k;

p=x(i);

set have point=p;

rc=h.add();

end;

rc=h.output(dataset:cats('want',j));

rc=h.clear();

end;

stop;

run;

View solution in original post

Haikuo · Posted 11-07-2013 11:55 PM

all the combination just by n=2 or all the combination literally.

learner_sas · Posted 11-08-2013 08:37 AM

Hello it is n=2 combinations only.

Thank you

Haikuo · Posted 11-08-2013 09:37 AM

data have;

input name$ category count;

cards;

a 1 10

b 2 20

c 3 30

d 1 10

e 4 10

;

run;

data _null_;

call symputx('nobs',nobs);

set have nobs=nobs;

stop;

run;

data _null_;

if _n_=1 then do;

if 0 then set have;

declare hash h(multidata:'y');

h.definekey('name', 'category', 'count');

h.definedata('name' ,'category', 'count');

h.definedone();

end;

array x[&nobs];

do i=1 to &nobs;

x(i)=i;

end;

n=&nobs;

k=2;

ncomb=comb(n,k);

do j=1 to ncomb;

call allcomb(j, k, of x

);

do i=1 to k;

p=x(i);

set have point=p;

rc=h.add();

end;

rc=h.output(dataset:cats('want',j));

rc=h.clear();

end;

stop;

run;

Vince28_Statcan · Posted 11-08-2013 09:50 AM

Is this a homework purpose with a clear scope of only exactly 5 rows as defined above or is this meant to be an expandable solution that can create an insanely large amount of new datasets? There are many different approaches depending on the end purpose.

One such available approach is using hash tables' output method so that you can index your data.

data _null_;

if _N_=1 then do;

set have nobs=maxobs;

declare hash myhash(multidata: 'YES');

myhash.definekey('name');

myhash.definedata('name', 'category', 'count');

myhash.definedone();

end;

do i=1 to maxobs;

do j=i+1 to maxobs;

set have point=i;

myhash.add();

set have point=j;

myhash.add();

myhash.output(dataset: catx("_", "want", i, j));

myhash.clear();

end;

stop;

run;

Vincent

edit: Snap, was beat to examplifying the use of hash object output method to alleviate the burden of using macros for output dataset naming! Nice one Hai.Kuo

data_null__ · Posted 11-08-2013 10:10 AM

You need STOP;

Vince28_Statcan · Posted 11-08-2013 10:42 AM

Had already corrected it up. I'm still somewhat puzzled as to how the data step establish what will constitute it's last loop iteration. I would've assumed set statements with POINT= options would not affect data step iterations and that this would've behaved like data want; if _n_=1 then set have; end; run;

Anyway if you have any further explanation or additionnal insights to provide on this specific topic, I'll gladly welcome them. Hopefully proc DS2 eventually makes it more clear.

data_null__ · Posted 11-08-2013 12:44 PM

CAUTION: Continuous loops can occur when you use the POINT= option. When you use the POINT= option, you must include a STOP statement to stop DATA step processing, programming logic that checks for an invalid value of the POINT= variable, or both. Because POINT= reads only those observations that are specified in the DO statement, SAS cannot read an end-of-file indicator as it would if the file were being read sequentially. Because reading an end-of-file indicator ends a DATA step automatically, failure to substitute another means of ending the DATA step when you use POINT= can cause the DATA step to go into a continuous loop. If SAS reads an invalid value of the POINT= variable, it sets the automatic variable _ERROR_ to 1. Use this information to check for conditions that cause continuous DO-loop processing, or include a STOP statement at the end of the DATA step, or both.

learner_sas · Posted 11-08-2013 02:48 PM

Thank you so much and everybody. I am a learning SAS everyday and this is the most amazing application I have ever seen.

learner_sas · Posted 11-10-2013 10:34 AM

Hello and I want to get one step more complex. I created following analysis using .

%MACRO LOOPING( START=,END=);

proc freq data=have_&START._&END;

tables name*category/ chisq cl ALL;

output out=testy_&START._&END(KEEP=_PCHI_ P_PCHI L_RROR U_RROR) pchi ALL;

exact or;

weight count;

run;

DATA FINAL_&START._&END;

SET TESTY_&START._&END;

GROUP = "&START._&END";

WHERE _PCHI_ ^=.;

RUN;

%MEND;

%LOOPING(START=1,END=2);

%LOOPING(START=1,END=3);

%LOOPING(START=1,END=4);

%LOOPING(START=1,END=5);

%LOOPING(START=1,END=6);

%LOOPING(START=1,END=7);

%LOOPING(START=1,END=8);

%LOOPING(START=1,END=9);

%LOOPING(START=2,END=3);

%LOOPING(START=2,END=4);

%LOOPING(START=2,END=5);

%LOOPING(START=2,END=6);

%LOOPING(START=2,END=7);

%LOOPING(START=2,END=8);

%LOOPING(START=2,END=9);

%LOOPING(START=3,END=4);

%LOOPING(START=3,END=5);

%LOOPING(START=3,END=6);

%LOOPING(START=3,END=7);

%LOOPING(START=3,END=8);

%LOOPING(START=3,END=9);

%LOOPING(START=4,END=5);

%LOOPING(START=4,END=6);

%LOOPING(START=4,END=7);

%LOOPING(START=4,END=8);

%LOOPING(START=4,END=9);

%LOOPING(START=5,END=6);

%LOOPING(START=5,END=7);

%LOOPING(START=5,END=8);

%LOOPING(START=5,END=9);

DATA final_want;

SET FINAL_1_2

FINAL_1_3

FINAl_1_4

FINAL_1_5

FINAL_1_6

FINAL_1_7

FINAL_1_8

FINAL_1_9

FINAL_2_3

FINAL_2_4

FINAL_2_5

FINAL_2_6

FINAL_2_7

FINAL_2_8

FINAL_2_9

FINAL_3_4

FINAL_3_5

FINAL_3_6

FINAL_3_7

FINAL_3_8

FINAL_3_9

FINAL_4_5

FINAL_4_6

FINAL_4_7

FINAL_4_8

FINAL_4_9

FINAL_5_6

FINAL_5_7

FINAL_5_8

FINAL_5_9

;

run;

It exactly gives what I want but I am wondering if I can shortened the code since it has a lot repetitive component. I am doing chi square test comparing each group and providing necessary values. I am thinking if it is possible to insert Proc freq statement within hash table creation statement so that it gives more cleaner code.

Thank you

data_null__ · Posted 11-11-2013 09:46 AM

By creating all those separate data sets in the first part of you question you have made the problem much harder. What you really needed was a BY variable that defines the "observations pairs" for exampl 1_2, 1_3, 1_4, 2_3, 2_4, 3_4 Then you just use the BY variable in the PROC FREQ.

Vince28_Statcan · Posted 11-12-2013 12:49 PM

I have to agree with data _null_ that your task would've been simplified with appropriate BY grouping.

However, if you are happy with the output and are only looking for a cosmetic fix, you can always hide the spam in another macro loop. Since your loop control seems more of a manual thing than a produce all possible subsets, you can't quite use the original intent of %loop() macro to its full extent but here's a cosmetic fix that can replace everything below %mend in your post

%macro cosmetic();

%do i=1 %to 5;

%do j=%eval(&i+1) %to 9;

%looping(start=&i, end=&j);

%end;

data final_want;

set

%do i=1 %to 5;

%do j=%eval(&i+1) %to 9;

final_&i_&j

%end;

;

run;

%mend;

%cosmetic();

macro processor is merely parsing your 50 lines of codes.

A simpler and likely better fix would've been to do the full lower or upper triangle of observation pairs, use BY processing as data _null_ pointed out and a WHERE condition to remove cases where the first of the two index is GT 5

Registration is open

SAS Training: Just a Click Away