I have a SAS code question, and I'd like to ask if there is a more efficient way to solve my problem. Target: Through the customer's product holding and the product combination datasets, generated the recommended product based on the product holding of each customer. It's like association model. Method : I have two datasets: 1st, product combination records data AAA;
infile datalines missover;
input SET_SIZE COUNT ITEM1 $ ITEM2 $ ITEM3 $ ITEM4 $ ITEM5 $ ;
datalines;
2 50 DP1 DP2
2 40 DP2 DP3
2 39 AC1 AC2
2 30 AB1 AB2
3 30 DP1 DP2 DP3
3 20 AB1 AB2 DP1
3 20 AC1 AC2 AB1
4 10 DP1 DP2 DP3 AC1
5 10 AB1 AB2 AC1 DP1
; 2nd, the products held by the customer data BBB;
infile datalines missover;
input ID $ count_of_product HOLD1 $ HOLD2 $ HOLD3 $ HOLD4 $ HOLD5 $ HOLD6 $ HOLD7 $ HOLD8 $ HOLD9 $
datalines;
ID01 3 DP1 DP2 AB1
ID02 4 DP1 DP2 DP3 DP4 AC1
ID03 4 DP1 AB1 AB2 AC1 AC2
ID04 3 AB1 AB2 AC1
ID05 2 AB1 DP1
ID06 8 AB1 AB2 DP1 DP2 DP3 AC1 AC2 GB1
; I want to generate a dataset which record 1st, For each product combination, products held by customers and products not held by customers. 2nd, Calculate the number of products held by the customer in the product combination and the number of differentiation. After the calculation, I sort by the number of differences between the customer and the product combination, and the count of the combination.Then, I delete duplicate records by ID and the recommended product. Process of Implementation: 1. cartesian product (cross join) PROC SQL ;
CREATE TABLE CCC AS
SELECT t1.*, t2.*
FROM AAA t1 , BBB t2;
QUIT; 2. Comparison and calculation DATA DDD;
SET CCC;
ARRAY HOLD[9] $ HOLD1-HOLD9;
ARRAY ITEM[5] $ ITEM1-ITEM5;
ARRAY FLAG[5] FLAG1-FLAG5(0,0,0,0,0);
ARRAY PRODUCT[5] $34 PRODUCT1-PRODUCT5;
ARRAY INCLUD[5] $34 INCLUD1-INCLUD5;
INCLUDED = 0;
A=1;
B=1;
DO I=1 TO DIM(ITEM);
DO J=1 TO DIM(HOLD);
IF (ITEM[I]^="" AND ITEM[I]=HOLD[J]) THEN DO;
INCLUDED=INCLUDED+1;
FLAG[I]=1;
LEAVE;LEAVE;
END;
ELSE FLAG[I]=0;
END;
END;
DO K=1 TO DIM(FLAG);
IF FLAG[K]=0 THEN DO;
PRODUCT[A]=ITEM[K];
A=A+1;
END;
ELSE IF FLAG[K]=1 THEN DO;
INCLUD[B]=ITEM[K];
B=B+1;
END;
END;
OVERLAP = COUNT_OF_PRODUCT - INCLUDED;
GAP = SET_SIZE - INCLUDED;
RUN; 3. Sort and delete duplicates PROC SQL ;
CREATE TABLE EEE AS
SELECT DISTINCT t1.ID,
t1.COUNT,
t1.PRODUCT1,
t1.INCLUD1,
t1.INCLUD2,
t1.INCLUD3,
t1.INCLUD4,
t1.INCLUD5
FROM DDD t1
WHERE t1.GAP = 1
ORDER BY t1.ID,
t1.OVERLAP,
t1.COUNT DESC;
QUIT;
proc sort data=EEE out=FFF nodupkey; by ID product1; run; Question: Cuz I need to do "cross join" and then "combination match". It takes lots of time. I'd like to ask if there is anyway (like hash table?but I don't know how to do) to shorten my process time. If you have any idea, please advise me. Many thanks. ^_^
... View more