BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
maozsoy
Fluorite | Level 6

Hello,

I am using SAS EG 7.1.

I have two tables called customer and product. I want to cross join these tables and filtering in same step. View of input tables and table that i want to create is as below:

Untitled.png

 

In real; customer table have approx 10 millions rows, product table have approx 300 rows and i prefer to use datastep language because of it faster than proc sql language and i want to filter in same step for more efficiency.

 

 

I tried below code but it doesn't work as i want.

data want;
set customer;
do i = 1 to n;
	set work.product point = i nobs = n;
	if cust_var2 ne prod_id and cust_var1 le prod_var1;
	output;
end;
run;

Actually proc sql language that below works but it is very slow because of table size.

proc sql;
create table want3 as select
*
from customer t1, product t2
where 
t1.cust_var2 ne t2.prod_id and t1.cust_var1 le t2.prod_var1;
quit;

Thank you

 

data product;
infile datalines delimiter=',';
input prod_id product_name $ prod_var1;
datalines;
1, aaa, 100
2, bbb, 250
3, ccc, 200
4, ddd, 100
5, ddd, 250
;
run;

data customer;
infile datalines delimiter=',';
input cust_id cust_var1 cust_var2;
datalines;
1, 100, 2
2, 200, 5
3, 300, 5
;
run;
1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

See here an application of temporary arrays to your data:

 

data product;
infile datalines delimiter=',';
input prod_id product_name $ prod_var1;
datalines;
1, aaa, 100
2, bbb, 250
3, ccc, 200
4, ddd, 100
5, ddd, 250
;

data customer;
infile datalines delimiter=',';
input cust_id cust_var1 cust_var2;
datalines;
1, 100, 2
2, 200, 5
3, 300, 5
;

proc sort data=product;
by descending prod_var1;
run;

proc sql noprint;
select nobs into :nobs
from dictionary.tables
where libname = "WORK" and memname = "PRODUCT";
quit;

data want;
set customer;
if _n_ = 1
then do;
  array p_id {&nobs.} _temporary_;
  array p_name {&nobs.} $8 _temporary_;
  array p_var1 {&nobs.} _temporary_;
  do i = 1 to &nobs.;
    set product;
    p_id{i} = prod_id;
    p_name{i} = product_name;
    p_var1{i} = prod_var1;
  end;
end;
do i = 1 to dim(p_id);
  if p_id{i} ne cust_var2 and cust_var1 le p_var1{i}
  then do;
    prod_id = p_id{i};
    product_name = p_name{i};
    prod_var1 = p_var1{i};
    output;
  end;
  if cust_var1 gt p_var1{i} then leave;
end;
drop i;
run;

proc sort data=want;
by cust_id prod_id;
run;

proc print data=want noobs;
run;

Result:

 

 

cust_id	cust_var1	cust_var2	prod_id	product_name	prod_var1
1	100	2	1	aaa	100
1	100	2	3	ccc	200
1	100	2	4	ddd	100
1	100	2	5	ddd	250
2	200	5	2	bbb	250
2	200	5	3	ccc	200

 

 

View solution in original post

4 REPLIES 4
Tom
Super User Tom
Super User

If you only have 300 products then use a data step hash object to hold that table.

 

Your join criteria appears very wonky.  Please describe the business rule you are trying to code. Also provide desired output from the given input data.

hhinohar
Quartz | Level 8
data product;
	infile datalines delimiter=',';
	input prod_id product_name $ prod_var1;
	datalines;
1, aaa, 100
2, bbb, 250
3, ccc, 200
4, ddd, 100
5, ddd, 250
;
run;

data customer;
	infile datalines delimiter=',';
	input cust_id cust_var1 cust_var2;
	datalines;
1, 100, 2
2, 200, 5
3, 300, 5
;
run;

data want;
	set customer;

	do j=1 to nobs2;
		set work.product point=j nobs=nobs2;
		if cust_var2 ne prod_id and cust_var1 le prod_var1 then
			output;
	end;
run;
Kurt_Bremser
Super User

Do not use a subsetting IF. It will immediately exit the DO loop and the data step iteration.

Consider loading your product table into temporary arrays (one per column). That will speed up the DO loop significantly.

Run a test with a small subset of your customer table, to get a feel for the output. It may very well be that you end up with (close to) 3 billion observations.

Kurt_Bremser
Super User

See here an application of temporary arrays to your data:

 

data product;
infile datalines delimiter=',';
input prod_id product_name $ prod_var1;
datalines;
1, aaa, 100
2, bbb, 250
3, ccc, 200
4, ddd, 100
5, ddd, 250
;

data customer;
infile datalines delimiter=',';
input cust_id cust_var1 cust_var2;
datalines;
1, 100, 2
2, 200, 5
3, 300, 5
;

proc sort data=product;
by descending prod_var1;
run;

proc sql noprint;
select nobs into :nobs
from dictionary.tables
where libname = "WORK" and memname = "PRODUCT";
quit;

data want;
set customer;
if _n_ = 1
then do;
  array p_id {&nobs.} _temporary_;
  array p_name {&nobs.} $8 _temporary_;
  array p_var1 {&nobs.} _temporary_;
  do i = 1 to &nobs.;
    set product;
    p_id{i} = prod_id;
    p_name{i} = product_name;
    p_var1{i} = prod_var1;
  end;
end;
do i = 1 to dim(p_id);
  if p_id{i} ne cust_var2 and cust_var1 le p_var1{i}
  then do;
    prod_id = p_id{i};
    product_name = p_name{i};
    prod_var1 = p_var1{i};
    output;
  end;
  if cust_var1 gt p_var1{i} then leave;
end;
drop i;
run;

proc sort data=want;
by cust_id prod_id;
run;

proc print data=want noobs;
run;

Result:

 

 

cust_id	cust_var1	cust_var2	prod_id	product_name	prod_var1
1	100	2	1	aaa	100
1	100	2	3	ccc	200
1	100	2	4	ddd	100
1	100	2	5	ddd	250
2	200	5	2	bbb	250
2	200	5	3	ccc	200

 

 

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 1065 views
  • 0 likes
  • 4 in conversation