BookmarkSubscribeRSS Feed
wenzli25
Calcite | Level 5

I have a SAS code question, and I'd like to ask if there is a more efficient way to solve my problem.

Target:
Through the customer's product holding and the product combination datasets, generated the recommended product based on the product holding of each customer. It's like association model.

Method :
I have two datasets:
1st, product combination records

 

data AAA;

infile datalines missover; 
input SET_SIZE COUNT ITEM1 $ ITEM2 $ ITEM3 $ ITEM4 $ ITEM5 $ ;
datalines;
2 50 DP1 DP2 
2 40 DP2 DP3
2 39 AC1 AC2
2 30 AB1 AB2
3 30 DP1 DP2 DP3
3 20 AB1 AB2 DP1
3 20 AC1 AC2 AB1
4 10 DP1 DP2 DP3 AC1
5 10 AB1 AB2 AC1 DP1
;


2nd, the products held by the customer

 

data BBB;

infile datalines missover; 
input ID $ count_of_product HOLD1 $ HOLD2 $ HOLD3 $ HOLD4 $ HOLD5 $ HOLD6 $ HOLD7 $ HOLD8 $ HOLD9 $ 

datalines;
ID01 3 DP1 DP2 AB1
ID02 4 DP1 DP2 DP3 DP4 AC1
ID03 4 DP1 AB1 AB2 AC1 AC2
ID04 3 AB1 AB2 AC1
ID05 2 AB1 DP1
ID06 8 AB1 AB2 DP1 DP2 DP3 AC1 AC2 GB1
;


I want to generate a dataset which record
1st, For each product combination, products held by customers and products not held by customers.
2nd, Calculate the number of products held by the customer in the product combination and the number of differentiation.

After the calculation, I sort by the number of differences between the customer and the product combination, and the count of the combination.Then, I delete duplicate records by ID and the recommended product.

Process of Implementation:

1. cartesian product (cross join)

 

PROC SQL ;		
   CREATE TABLE CCC AS 
SELECT t1.*, t2.*	
      FROM AAA t1 , BBB t2;
QUIT;			


2. Comparison and calculation

 

DATA DDD;			
	SET CCC;		
ARRAY HOLD[9] $ HOLD1-HOLD9;			
ARRAY ITEM[5] $ ITEM1-ITEM5;			
ARRAY FLAG[5] FLAG1-FLAG5(0,0,0,0,0);			
ARRAY PRODUCT[5] $34 PRODUCT1-PRODUCT5;			
ARRAY INCLUD[5] $34 INCLUD1-INCLUD5;			
INCLUDED = 0;			
A=1;			
B=1;			
DO I=1 TO DIM(ITEM);			
	DO J=1 TO DIM(HOLD);		
		IF (ITEM[I]^="" AND ITEM[I]=HOLD[J]) THEN DO;	
			INCLUDED=INCLUDED+1;
			FLAG[I]=1;
			LEAVE;LEAVE;
			END;
		ELSE FLAG[I]=0;	
	END;		
END;			
			
DO K=1 TO DIM(FLAG);			
IF FLAG[K]=0 THEN DO;			
	PRODUCT[A]=ITEM[K];		
	A=A+1;		
END;			
ELSE IF FLAG[K]=1 THEN DO;			
	INCLUD[B]=ITEM[K];		
	B=B+1;		
END;			
END;			
			
OVERLAP = COUNT_OF_PRODUCT - INCLUDED;			
GAP = SET_SIZE - INCLUDED;			
			
RUN;			


3. Sort and delete duplicates

PROC SQL ;	

   CREATE TABLE EEE AS 	
   SELECT DISTINCT t1.ID, 	
          t1.COUNT, 	
          t1.PRODUCT1, 	
          t1.INCLUD1, 	
          t1.INCLUD2, 	
          t1.INCLUD3, 	
          t1.INCLUD4, 	
          t1.INCLUD5	
      FROM DDD t1	
      WHERE t1.GAP = 1	
      ORDER BY t1.ID,	
               t1.OVERLAP,	
               t1.COUNT DESC;	
QUIT;	

proc sort data=EEE out=FFF nodupkey; by ID product1; run;


Question:

Cuz I need to do "cross join" and then "combination match". It takes lots of time. I'd like to ask if there is anyway (like hash table?but I don't know how to do) to shorten my process time. If you have any idea, please advise me. Many thanks. ^_^

 

3 REPLIES 3
Astounding
PROC Star

Certainly you could create your own Cartesian product without SQL  Given that you need a DATA step for the Cartesian product (the logic is probably too complex to implement with CASE statements instead), that ought to be faster.  That let's you combine the first two steps.  The idea:

 

data DDD;

set BBB;

do k=1 to n_combinations;

   set AAA point=k nobs=n_combinations;

   *** lots of processing, carefully done to mimic current DATA step, while overcoming that variables are retained;

   output;

end;

run;

data_null__
Jade | Level 19

@Astounding wrote:

Certainly you could create your own Cartesian product without SQL  Given that you need a DATA step for the Cartesian product (the logic is probably too complex to implement with CASE statements instead), that ought to be faster.  That let's you combine the first two steps.  The idea:

 

data DDD;

set BBB;

do k=1 to n_combinations;

   set AAA point=k nobs=n_combinations;

   *** lots of processing, carefully done to mimic current DATA step, while overcoming that variables are retained;

   output;

end;

run;


Using SASFILE to load AAA would likely speed up this step.

 

sasfile aaa open;
wenzli25
Calcite | Level 5
Sorry for the late reply. I tried @Astounding's method, but it failed. I'm not sure why.... still working on it. Could you tell me where should I modify "sasfile aaa open;" ?

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 1472 views
  • 0 likes
  • 3 in conversation