I have a SAS code question, and I'd like to ask if there is a more efficient way to solve my problem.
Target:
Through the customer's product holding and the product combination datasets, generated the recommended product based on the product holding of each customer. It's like association model.
Method :
I have two datasets:
1st, product combination records
data AAA;
infile datalines missover;
input SET_SIZE COUNT ITEM1 $ ITEM2 $ ITEM3 $ ITEM4 $ ITEM5 $ ;
datalines;
2 50 DP1 DP2
2 40 DP2 DP3
2 39 AC1 AC2
2 30 AB1 AB2
3 30 DP1 DP2 DP3
3 20 AB1 AB2 DP1
3 20 AC1 AC2 AB1
4 10 DP1 DP2 DP3 AC1
5 10 AB1 AB2 AC1 DP1
;
2nd, the products held by the customer
data BBB;
infile datalines missover;
input ID $ count_of_product HOLD1 $ HOLD2 $ HOLD3 $ HOLD4 $ HOLD5 $ HOLD6 $ HOLD7 $ HOLD8 $ HOLD9 $
datalines;
ID01 3 DP1 DP2 AB1
ID02 4 DP1 DP2 DP3 DP4 AC1
ID03 4 DP1 AB1 AB2 AC1 AC2
ID04 3 AB1 AB2 AC1
ID05 2 AB1 DP1
ID06 8 AB1 AB2 DP1 DP2 DP3 AC1 AC2 GB1
;
I want to generate a dataset which record
1st, For each product combination, products held by customers and products not held by customers.
2nd, Calculate the number of products held by the customer in the product combination and the number of differentiation.
After the calculation, I sort by the number of differences between the customer and the product combination, and the count of the combination.Then, I delete duplicate records by ID and the recommended product.
Process of Implementation:
1. cartesian product (cross join)
PROC SQL ;
CREATE TABLE CCC AS
SELECT t1.*, t2.*
FROM AAA t1 , BBB t2;
QUIT;
2. Comparison and calculation
DATA DDD;
SET CCC;
ARRAY HOLD[9] $ HOLD1-HOLD9;
ARRAY ITEM[5] $ ITEM1-ITEM5;
ARRAY FLAG[5] FLAG1-FLAG5(0,0,0,0,0);
ARRAY PRODUCT[5] $34 PRODUCT1-PRODUCT5;
ARRAY INCLUD[5] $34 INCLUD1-INCLUD5;
INCLUDED = 0;
A=1;
B=1;
DO I=1 TO DIM(ITEM);
DO J=1 TO DIM(HOLD);
IF (ITEM[I]^="" AND ITEM[I]=HOLD[J]) THEN DO;
INCLUDED=INCLUDED+1;
FLAG[I]=1;
LEAVE;LEAVE;
END;
ELSE FLAG[I]=0;
END;
END;
DO K=1 TO DIM(FLAG);
IF FLAG[K]=0 THEN DO;
PRODUCT[A]=ITEM[K];
A=A+1;
END;
ELSE IF FLAG[K]=1 THEN DO;
INCLUD[B]=ITEM[K];
B=B+1;
END;
END;
OVERLAP = COUNT_OF_PRODUCT - INCLUDED;
GAP = SET_SIZE - INCLUDED;
RUN;
3. Sort and delete duplicates
PROC SQL ;
CREATE TABLE EEE AS
SELECT DISTINCT t1.ID,
t1.COUNT,
t1.PRODUCT1,
t1.INCLUD1,
t1.INCLUD2,
t1.INCLUD3,
t1.INCLUD4,
t1.INCLUD5
FROM DDD t1
WHERE t1.GAP = 1
ORDER BY t1.ID,
t1.OVERLAP,
t1.COUNT DESC;
QUIT;
proc sort data=EEE out=FFF nodupkey; by ID product1; run;
Question:
Cuz I need to do "cross join" and then "combination match". It takes lots of time. I'd like to ask if there is anyway (like hash table?but I don't know how to do) to shorten my process time. If you have any idea, please advise me. Many thanks. ^_^
Certainly you could create your own Cartesian product without SQL Given that you need a DATA step for the Cartesian product (the logic is probably too complex to implement with CASE statements instead), that ought to be faster. That let's you combine the first two steps. The idea:
data DDD;
set BBB;
do k=1 to n_combinations;
set AAA point=k nobs=n_combinations;
*** lots of processing, carefully done to mimic current DATA step, while overcoming that variables are retained;
output;
end;
run;
@Astounding wrote:
Certainly you could create your own Cartesian product without SQL Given that you need a DATA step for the Cartesian product (the logic is probably too complex to implement with CASE statements instead), that ought to be faster. That let's you combine the first two steps. The idea:
data DDD;
set BBB;
do k=1 to n_combinations;
set AAA point=k nobs=n_combinations;
*** lots of processing, carefully done to mimic current DATA step, while overcoming that variables are retained;
output;
end;
run;
Using SASFILE to load AAA would likely speed up this step.
sasfile aaa open;
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.