Hello,
I am using SAS EG 7.1.
I have two tables called customer and product. I want to cross join these tables and filtering in same step. View of input tables and table that i want to create is as below:
In real; customer table have approx 10 millions rows, product table have approx 300 rows and i prefer to use datastep language because of it faster than proc sql language and i want to filter in same step for more efficiency.
I tried below code but it doesn't work as i want.
data want;
set customer;
do i = 1 to n;
set work.product point = i nobs = n;
if cust_var2 ne prod_id and cust_var1 le prod_var1;
output;
end;
run;
Actually proc sql language that below works but it is very slow because of table size.
proc sql; create table want3 as select * from customer t1, product t2 where t1.cust_var2 ne t2.prod_id and t1.cust_var1 le t2.prod_var1; quit;
Thank you
data product; infile datalines delimiter=','; input prod_id product_name $ prod_var1; datalines; 1, aaa, 100 2, bbb, 250 3, ccc, 200 4, ddd, 100 5, ddd, 250 ; run; data customer; infile datalines delimiter=','; input cust_id cust_var1 cust_var2; datalines; 1, 100, 2 2, 200, 5 3, 300, 5 ; run;
See here an application of temporary arrays to your data:
data product;
infile datalines delimiter=',';
input prod_id product_name $ prod_var1;
datalines;
1, aaa, 100
2, bbb, 250
3, ccc, 200
4, ddd, 100
5, ddd, 250
;
data customer;
infile datalines delimiter=',';
input cust_id cust_var1 cust_var2;
datalines;
1, 100, 2
2, 200, 5
3, 300, 5
;
proc sort data=product;
by descending prod_var1;
run;
proc sql noprint;
select nobs into :nobs
from dictionary.tables
where libname = "WORK" and memname = "PRODUCT";
quit;
data want;
set customer;
if _n_ = 1
then do;
array p_id {&nobs.} _temporary_;
array p_name {&nobs.} $8 _temporary_;
array p_var1 {&nobs.} _temporary_;
do i = 1 to &nobs.;
set product;
p_id{i} = prod_id;
p_name{i} = product_name;
p_var1{i} = prod_var1;
end;
end;
do i = 1 to dim(p_id);
if p_id{i} ne cust_var2 and cust_var1 le p_var1{i}
then do;
prod_id = p_id{i};
product_name = p_name{i};
prod_var1 = p_var1{i};
output;
end;
if cust_var1 gt p_var1{i} then leave;
end;
drop i;
run;
proc sort data=want;
by cust_id prod_id;
run;
proc print data=want noobs;
run;
Result:
cust_id cust_var1 cust_var2 prod_id product_name prod_var1 1 100 2 1 aaa 100 1 100 2 3 ccc 200 1 100 2 4 ddd 100 1 100 2 5 ddd 250 2 200 5 2 bbb 250 2 200 5 3 ccc 200
If you only have 300 products then use a data step hash object to hold that table.
Your join criteria appears very wonky. Please describe the business rule you are trying to code. Also provide desired output from the given input data.
data product;
infile datalines delimiter=',';
input prod_id product_name $ prod_var1;
datalines;
1, aaa, 100
2, bbb, 250
3, ccc, 200
4, ddd, 100
5, ddd, 250
;
run;
data customer;
infile datalines delimiter=',';
input cust_id cust_var1 cust_var2;
datalines;
1, 100, 2
2, 200, 5
3, 300, 5
;
run;
data want;
set customer;
do j=1 to nobs2;
set work.product point=j nobs=nobs2;
if cust_var2 ne prod_id and cust_var1 le prod_var1 then
output;
end;
run;
Do not use a subsetting IF. It will immediately exit the DO loop and the data step iteration.
Consider loading your product table into temporary arrays (one per column). That will speed up the DO loop significantly.
Run a test with a small subset of your customer table, to get a feel for the output. It may very well be that you end up with (close to) 3 billion observations.
See here an application of temporary arrays to your data:
data product;
infile datalines delimiter=',';
input prod_id product_name $ prod_var1;
datalines;
1, aaa, 100
2, bbb, 250
3, ccc, 200
4, ddd, 100
5, ddd, 250
;
data customer;
infile datalines delimiter=',';
input cust_id cust_var1 cust_var2;
datalines;
1, 100, 2
2, 200, 5
3, 300, 5
;
proc sort data=product;
by descending prod_var1;
run;
proc sql noprint;
select nobs into :nobs
from dictionary.tables
where libname = "WORK" and memname = "PRODUCT";
quit;
data want;
set customer;
if _n_ = 1
then do;
array p_id {&nobs.} _temporary_;
array p_name {&nobs.} $8 _temporary_;
array p_var1 {&nobs.} _temporary_;
do i = 1 to &nobs.;
set product;
p_id{i} = prod_id;
p_name{i} = product_name;
p_var1{i} = prod_var1;
end;
end;
do i = 1 to dim(p_id);
if p_id{i} ne cust_var2 and cust_var1 le p_var1{i}
then do;
prod_id = p_id{i};
product_name = p_name{i};
prod_var1 = p_var1{i};
output;
end;
if cust_var1 gt p_var1{i} then leave;
end;
drop i;
run;
proc sort data=want;
by cust_id prod_id;
run;
proc print data=want noobs;
run;
Result:
cust_id cust_var1 cust_var2 prod_id product_name prod_var1 1 100 2 1 aaa 100 1 100 2 3 ccc 200 1 100 2 4 ddd 100 1 100 2 5 ddd 250 2 200 5 2 bbb 250 2 200 5 3 ccc 200
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.