Solved: Re: How to create several datasets sampled from one column

mamzolo · Posted 09-25-2014 06:20 AM

I have a dataset as follows (example)

y	Group	k	code
10	1	1	1
11	1	2	2
12	1	3	3
13	2	1	4
14	2	2	5
15	2	3	6

I want to create several datasets of size 2 where I pick one observation from group 1 and 2 to form one dataset e.g.;

data 1

Y group k code

11 1 1 1

13 2 1 4

data 2,

y group k code

11 1 1 1

14 2 2 5

and so forth.

Currently I have been doing this using proc sql (e.q., create table data1 as select *,1 as id from sample where code eq 1 or code eq 4) and the challenge is that when the levels of groups and k increase it is nearly impossible to do all these combinations by hand. Is there anyone who can help me in writing a simple code that will pick the required values without having to specify the code?

RW9 · Posted 09-25-2014 08:32 AM

Update to:

data have;

infile datalines;

input y Group k code;

datalines;

10 1 1 1

11 1 2 2

12 1 3 3

13 2 1 4

14 2 2 5

15 2 3 6

;

run;

data _null_;

ind=1;

do i=1 to 3;

do j=4 to 6;

call execute('proc sql;

create table DATA'||strip(put(ind,best.))||' as

select *,

'||strip(put(ind,best.))||' as ID

from WORK.HAVE

where CODE='||strip(put(i,best.))||'

or CODE='||strip(put(j,best.))||';

quit;');

ind=ind+1;

end;

run;

View solution in original post

RW9 · Posted 09-25-2014 06:33 AM

Hi,

Well not entirely sure on your logic, but the below code will loop over and generate a proc sql based on the loop. You can modify this to your particular logic:

data have;
infile datalines;
input y Group k code;
datalines;
10 1 1 1
11 1 2 2
12 1 3 3
13 2 1 4
14 2 2 5
15 2 3 6
;
run;

proc sort data=have out=loop (obs=1);
by descending code;
run;

data _null_;
set loop;
do i=2 to code;
    call execute('proc sql;
                    create table DATA'||strip(put(i-1,best.))||' as
                    select *,
                            '||strip(put(i-1,best.))||' as ID
                    from    WORK.HAVE
                    where   CODE=1
                      or    CODE='||strip(put(i,best.))||';
                  quit;');
end;
run;

Community_Help · Posted 09-25-2014 07:22 AM

Hi there, this question was asked in the About SAS Communities space where we post help info about the community. No problem, though. I'm moving it to SAS Procedures where I think it goes. Thanks for joining the community and asking a question! Thanks RW9 for your response, too...

mamzolo · Posted 09-25-2014 07:34 AM

Thank you RW9.

It's almost there but not quite there yet

Overall, I will have 9 datasets with the following levels of the 'code' variable:

data1: code 1 & 4

data2: code 1 & 5

data3: code 1 & 6

data4: code 2 & 4

data5: code 2 & 5

data6: code 2 & 6

data7: code 3 & 4

data8: code 3 & 5

data9: code 3 & 6

Thus, each set contains data points from each group.

Sorry about that Communities Admin

RW9 · Posted 09-25-2014 08:32 AM

Update to:

data have;

infile datalines;

input y Group k code;

datalines;

10 1 1 1

11 1 2 2

12 1 3 3

13 2 1 4

14 2 2 5

15 2 3 6

;

run;

data _null_;

ind=1;

do i=1 to 3;

do j=4 to 6;

call execute('proc sql;

create table DATA'||strip(put(ind,best.))||' as

select *,

'||strip(put(ind,best.))||' as ID

from WORK.HAVE

where CODE='||strip(put(i,best.))||'

or CODE='||strip(put(j,best.))||';

quit;');

ind=ind+1;

end;

run;

Ksharp · Posted 09-25-2014 08:42 AM

Are there just two groups ?

data have;
  infile datalines;
  input y Group k code;
datalines; 
10 1 1 1 
11 1 2 2  
12 1 3 3 
13 2 1 4 
14 2 2 5 
15 2 3 6 
;
run;

data x(drop=n );
 set have end=last;
 length c $ 20;
 retain n;
 if group ne lag(group) then n=_n_;
if last then do;
 do i=1 to n-1;
  do j=n to _n_;
   key+1;
   set have point=i;c=catx('_',c,code);output;     
   set have point=j;c=catx('_',c,code);output; call missing(c);
end;
end;
stop;
end;
run;


data _null_;
 if _n_ eq 1 then do;
 if 0 then set x;
  declare hash ha(multidata:'y');
   ha.definekey('key');
   ha.definedata('y', 'Group', 'k', 'code');
   ha.definedone();
 end;
set x;
by key;
ha.add();
if last.key then do;ha.output(dataset: catx('_','code',c));ha.clear();end;
run;

Xia Keshan

mamzolo · Posted 09-25-2014 09:07 AM

Hi Xia Keshan,

Yes, in the current example it is only two, but this was just for simplification. In my current data the number of groups and repetitions (k) vary but I was able to extend the code of RW9 (by including additional do loops) to accommodate my settings. I believe the same can be done with your code.

Thanks to the both of you (Keshan and RW9), you've been very helpful!

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away