Hello,
I'm looking to create a cartesian product with 3 tables through datastep where 3 tables are following:
data min_scoperto_1;
input min_Scoperto $;
datalines;
0.01
0.02
0.03
;
run;
data scoperto_1;
input Scoperto $;
datalines;
0.1
0.2
0.3
0.5
1.0
2.0
;
run;
data limite_1;
input limite $;
datalines;
40
50
60
70
80
90
;
run;
For two tables - through datastep - the right code is the following
data c;
set LIMITE_1;
do i=1 to n;
set SCOPERTO_1 point=i nobs=n;
output;
end;
run;
Which is the right sas code through datastep for the 3 tables?
Thanks in advance!
max
SQL will even outperform the data step loops once the datasets grow:
72
73 data t1;
74 do var1 = 1 to 300;
75 output;
76 end;
77 run;
NOTE: The data set WORK.T1 has 300 observations and 1 variables.
NOTE: Verwendet wurde: DATA statement - (Gesamtverarbeitungszeit):
real time 0.00 seconds
cpu time 0.00 seconds
78
79 data t2;
80 do var2 = 1 to 300;
81 output;
82 end;
83 run;
NOTE: The data set WORK.T2 has 300 observations and 1 variables.
NOTE: Verwendet wurde: DATA statement - (Gesamtverarbeitungszeit):
real time 0.00 seconds
cpu time 0.00 seconds
84
85 data t3;
86 do var3 = 1 to 300;
87 output;
88 end;
89 run;
NOTE: The data set WORK.T3 has 300 observations and 1 variables.
NOTE: Verwendet wurde: DATA statement - (Gesamtverarbeitungszeit):
real time 0.00 seconds
cpu time 0.00 seconds
90
91 proc sql;
92 create table t4_1 as
93 select *
94 from t1, t2, t3
95 ;
NOTE: The execution of this query involves performing one or more Cartesian product joins that can not be optimized.
NOTE: Table WORK.T4_1 created, with 27000000 rows and 3 columns.
96 quit;
NOTE: Verwendet wurde: PROZEDUR SQL - (Gesamtverarbeitungszeit):
real time 5.47 seconds
cpu time 2.45 seconds
97
98 data t4_2;
99 set t1;
100 do i = 1 to n1;
101 set t2 point=i nobs=n1;
102 do j = 1 to n2;
103 set t3 point=j nobs=n2;
104 output;
105 end;
106 end;
107 run;
NOTE: There were 300 observations read from the data set WORK.T1.
NOTE: The data set WORK.T4_2 has 27000000 observations and 3 variables.
NOTE: Verwendet wurde: DATA statement - (Gesamtverarbeitungszeit):
real time 7.87 seconds
cpu time 4.71 seconds
Maxim 14: Use the Right Tool.
Which means that this is the right SAS code:
proc sql;
create table want as
select *
from min_scoperto_1, scoperto_1, limite_1
;
quit;
But you already have the data step method for 2 datasets; just nest a second DO loop for the third dataset inside the DO loop you already have for the second dataset.
tks a lot!
SQL will even outperform the data step loops once the datasets grow:
72
73 data t1;
74 do var1 = 1 to 300;
75 output;
76 end;
77 run;
NOTE: The data set WORK.T1 has 300 observations and 1 variables.
NOTE: Verwendet wurde: DATA statement - (Gesamtverarbeitungszeit):
real time 0.00 seconds
cpu time 0.00 seconds
78
79 data t2;
80 do var2 = 1 to 300;
81 output;
82 end;
83 run;
NOTE: The data set WORK.T2 has 300 observations and 1 variables.
NOTE: Verwendet wurde: DATA statement - (Gesamtverarbeitungszeit):
real time 0.00 seconds
cpu time 0.00 seconds
84
85 data t3;
86 do var3 = 1 to 300;
87 output;
88 end;
89 run;
NOTE: The data set WORK.T3 has 300 observations and 1 variables.
NOTE: Verwendet wurde: DATA statement - (Gesamtverarbeitungszeit):
real time 0.00 seconds
cpu time 0.00 seconds
90
91 proc sql;
92 create table t4_1 as
93 select *
94 from t1, t2, t3
95 ;
NOTE: The execution of this query involves performing one or more Cartesian product joins that can not be optimized.
NOTE: Table WORK.T4_1 created, with 27000000 rows and 3 columns.
96 quit;
NOTE: Verwendet wurde: PROZEDUR SQL - (Gesamtverarbeitungszeit):
real time 5.47 seconds
cpu time 2.45 seconds
97
98 data t4_2;
99 set t1;
100 do i = 1 to n1;
101 set t2 point=i nobs=n1;
102 do j = 1 to n2;
103 set t3 point=j nobs=n2;
104 output;
105 end;
106 end;
107 run;
NOTE: There were 300 observations read from the data set WORK.T1.
NOTE: The data set WORK.T4_2 has 27000000 observations and 3 variables.
NOTE: Verwendet wurde: DATA statement - (Gesamtverarbeitungszeit):
real time 7.87 seconds
cpu time 4.71 seconds
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.