Hello SAS community,
Can somebody suggest how to create a set of all possible pairs of observations (Cartesian product) without using proc sql?
Say I have a data set test1:
data test1;
do x=1 to 3;
y=x+10; output;
end;
run;
I would like to create a data set test2 that contains all possible combinations of x and y:
data test2;
input x y;
datalines;
1 11
1 12
1 13
2 11
2 12
2 13
3 11
3 12
3 13
;
run;
Not sure I understand because your examples don't match the simple definition of all possible pairs.
Instead, suppose you have a dataset X containing all possible values of variable X, and similarly a dataset Y for variable Y.
Then, for each observation in X, read in all observations in Y.
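data want;
  set x;                        /* outer pass: one iteration per observation in X */
  do _n_=1 to nobs;
    set y nobs=nobs point=_n_;  /* direct access to observation _n_ of Y */
    output;                     /* write one (x, y) pair */
  end;
run;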
Here is one approach:
data test1;
do x=1 to 3;
y=x+10; output;
end;
run;
data test2;
  set test1;                              /* current x (its own y is overwritten below) */
  do i=1 to nobs;
    set test1 (keep=y) point=i nobs=nobs; /* re-read every y by observation number */
    output;
  end;
run;
proc print;run;
Regards,
Haikuo
data test1;
do x=1 to 3;
y=x+10; output;
end;
run;
data test2;
set test1;
drop y;
run;
proc print noobs; run;
data test3;
set test1;
drop x;
run;
proc print noobs; run;
proc sql;
select x,y from test2, test3;
quit;
Output:
x y
------------------
1 11
1 12
1 13
2 11
2 12
2 13
3 11
3 12
3 13
Thank you Hima. Your approach requires proc sql though...
Just for fun, here is a hash approach,
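data test2 (drop=_:);
  if _n_=1 then do;
    set test1(obs=1);                           /* defines y in the PDV before the hash is declared */
    dcl hash h(dataset: 'test1', ordered: 'a'); /* load test1 into memory, ascending key order */
    h.definekey('y');
    h.definedata('y');
    h.definedone();
    dcl hiter hi('h');                          /* iterator used to walk all y values */
  end;
  set test1;                                    /* current x */
  do _rc=hi.first() by 0 while (_rc=0);         /* loop until the iterator runs past the last y */
    output;
    _rc=hi.next();
  end;
run;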
Thank you Haikuo. Would you care to elaborate on the advantages of the hash approach you demonstrated over the "stacking y's for all x's" approach that you and Tom posted?
In general, a hash is more efficient because it saves I/O time by processing records entirely in memory. I can't speak for this case; you would have to run a benchmark.
Haikuo
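A benchmark along those lines could be sketched roughly as follows. This is only an illustration: the data set names big, pairs_point and pairs_hash and the row count of 5000 are assumptions made for the example, not something from the posts above. It runs the POINT= step and the hash step against the same larger input with the FULLSTIMER system option turned on, so the log reports real time and CPU time for each step.

```sas
/* Assumption: a larger input so that timing differences show up in the log. */
/* 5000 rows gives 5000 x 5000 = 25 million output pairs per step.          */
data big;
  do x=1 to 5000;
    y=x+10; output;
  end;
run;

options fullstimer;   /* write detailed per-step timing to the log */

/* Approach 1: re-read every y by observation number with POINT= */
data pairs_point;
  set big;
  do i=1 to nobs;
    set big (keep=y) point=i nobs=nobs;
    output;
  end;
run;

/* Approach 2: load y into a hash once, then iterate over it in memory */
data pairs_hash (drop=_:);
  if _n_=1 then do;
    set big (obs=1);
    dcl hash h(dataset: 'big', ordered: 'a');
    h.definekey('y');
    h.definedata('y');
    h.definedone();
    dcl hiter hi('h');
  end;
  set big;
  do _rc=hi.first() by 0 while (_rc=0);
    output;
    _rc=hi.next();
  end;
run;

options nofullstimer; /* restore the default timing output */
```

Compare the "real time" and "cpu time" notes for the two steps in the log. Results will depend on the system and the size of the input, so it is worth repeating the runs a few times before drawing a conclusion.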
Thank you everyone for posting!