<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Another efficient method instead of cartesian product (cross join) in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Another-efficient-method-instead-of-cartesian-product-cross-join/m-p/448087#M112675</link>
    <description>&lt;P&gt;I have a SAS code question, and I'd like to ask if there is a more efficient way to solve my problem.&lt;BR /&gt;&lt;BR /&gt;&lt;STRONG&gt;Target:&lt;/STRONG&gt;&lt;BR /&gt;Through the customer's product holding and the product combination datasets, generated the recommended product based on the product holding of each customer. It's like association model.&lt;BR /&gt;&lt;BR /&gt;&lt;STRONG&gt;Method :&lt;/STRONG&gt;&lt;BR /&gt;I have two datasets:&lt;BR /&gt;1st, product combination records&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class="language-sas"&gt;data AAA;

infile datalines missover; 
input SET_SIZE COUNT ITEM1 $ ITEM2 $ ITEM3 $ ITEM4 $ ITEM5 $ ;
datalines;
2 50 DP1 DP2 
2 40 DP2 DP3
2 39 AC1 AC2
2 30 AB1 AB2
3 30 DP1 DP2 DP3
3 20 AB1 AB2 DP1
3 20 AC1 AC2 AB1
4 10 DP1 DP2 DP3 AC1
5 10 AB1 AB2 AC1 DP1
;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;BR /&gt;2nd, the products held by the customer&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class="language-sas"&gt;data BBB;

infile datalines missover; 
input ID $ count_of_product HOLD1 $ HOLD2 $ HOLD3 $ HOLD4 $ HOLD5 $ HOLD6 $ HOLD7 $ HOLD8 $ HOLD9 $ 

datalines;
ID01 3 DP1 DP2 AB1
ID02 4 DP1 DP2 DP3 DP4 AC1
ID03 4 DP1 AB1 AB2 AC1 AC2
ID04 3 AB1 AB2 AC1
ID05 2 AB1 DP1
ID06 8 AB1 AB2 DP1 DP2 DP3 AC1 AC2 GB1
;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;BR /&gt;I want to generate a dataset which record&lt;BR /&gt;1st, For each product combination, products held by customers and products not held by customers.&lt;BR /&gt;2nd, Calculate the number of products held by the customer in the product combination and the number of differentiation.&lt;BR /&gt;&lt;BR /&gt;After the calculation, I sort by the number of differences between the customer and the product combination, and the count of the combination.Then, I delete duplicate records by ID and the recommended product.&lt;BR /&gt;&lt;BR /&gt;&lt;STRONG&gt;Process of Implementation:&lt;/STRONG&gt;&lt;BR /&gt;&lt;BR /&gt;1. cartesian product (cross join)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class="language-sas"&gt;PROC SQL ;		
   CREATE TABLE CCC AS 
SELECT t1.*, t2.*	
      FROM AAA t1 , BBB t2;
QUIT;			&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;BR /&gt;2. Comparison and calculation&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class="language-sas"&gt;DATA DDD;			
	SET CCC;		
ARRAY HOLD[9] $ HOLD1-HOLD9;			
ARRAY ITEM[5] $ ITEM1-ITEM5;			
ARRAY FLAG[5] FLAG1-FLAG5(0,0,0,0,0);			
ARRAY PRODUCT[5] $34 PRODUCT1-PRODUCT5;			
ARRAY INCLUD[5] $34 INCLUD1-INCLUD5;			
INCLUDED = 0;			
A=1;			
B=1;			
DO I=1 TO DIM(ITEM);			
	DO J=1 TO DIM(HOLD);		
		IF (ITEM[I]^="" AND ITEM[I]=HOLD[J]) THEN DO;	
			INCLUDED=INCLUDED+1;
			FLAG[I]=1;
			LEAVE;LEAVE;
			END;
		ELSE FLAG[I]=0;	
	END;		
END;			
			
DO K=1 TO DIM(FLAG);			
IF FLAG[K]=0 THEN DO;			
	PRODUCT[A]=ITEM[K];		
	A=A+1;		
END;			
ELSE IF FLAG[K]=1 THEN DO;			
	INCLUD[B]=ITEM[K];		
	B=B+1;		
END;			
END;			
			
OVERLAP = COUNT_OF_PRODUCT - INCLUDED;			
GAP = SET_SIZE - INCLUDED;			
			
RUN;			&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;BR /&gt;3. Sort and delete duplicates&lt;/P&gt;&lt;PRE&gt;&lt;CODE class="language-sas"&gt;PROC SQL ;	

   CREATE TABLE EEE AS 	
   SELECT DISTINCT t1.ID, 	
          t1.COUNT, 	
          t1.PRODUCT1, 	
          t1.INCLUD1, 	
          t1.INCLUD2, 	
          t1.INCLUD3, 	
          t1.INCLUD4, 	
          t1.INCLUD5	
      FROM DDD t1	
      WHERE t1.GAP = 1	
      ORDER BY t1.ID,	
               t1.OVERLAP,	
               t1.COUNT DESC;	
QUIT;	

&lt;/CODE&gt;&lt;/PRE&gt;&lt;PRE&gt;&lt;CODE class="language-sas"&gt;proc sort data=EEE out=FFF nodupkey; by ID product1; run;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;BR /&gt;Question:&lt;BR /&gt;&lt;BR /&gt;Cuz I need to do "cross join" and then "combination match". It takes lots of time. I'd like to ask if there is anyway (like hash table?but I don't know how to do) to shorten my process time. If you have any idea, please advise me. Many thanks. ^_^&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Fri, 23 Mar 2018 09:19:24 GMT</pubDate>
    <dc:creator>wenzli25</dc:creator>
    <dc:date>2018-03-23T09:19:24Z</dc:date>
    <item>
      <title>Another efficient method instead of cartesian product (cross join)</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Another-efficient-method-instead-of-cartesian-product-cross-join/m-p/448087#M112675</link>
      <description>&lt;P&gt;I have a SAS code question, and I'd like to ask if there is a more efficient way to solve my problem.&lt;BR /&gt;&lt;BR /&gt;&lt;STRONG&gt;Target:&lt;/STRONG&gt;&lt;BR /&gt;Through the customer's product holding and the product combination datasets, generated the recommended product based on the product holding of each customer. It's like association model.&lt;BR /&gt;&lt;BR /&gt;&lt;STRONG&gt;Method :&lt;/STRONG&gt;&lt;BR /&gt;I have two datasets:&lt;BR /&gt;1st, product combination records&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class="language-sas"&gt;data AAA;

infile datalines missover; 
input SET_SIZE COUNT ITEM1 $ ITEM2 $ ITEM3 $ ITEM4 $ ITEM5 $ ;
datalines;
2 50 DP1 DP2 
2 40 DP2 DP3
2 39 AC1 AC2
2 30 AB1 AB2
3 30 DP1 DP2 DP3
3 20 AB1 AB2 DP1
3 20 AC1 AC2 AB1
4 10 DP1 DP2 DP3 AC1
5 10 AB1 AB2 AC1 DP1
;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;BR /&gt;2nd, the products held by the customer&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class="language-sas"&gt;data BBB;

infile datalines missover; 
input ID $ count_of_product HOLD1 $ HOLD2 $ HOLD3 $ HOLD4 $ HOLD5 $ HOLD6 $ HOLD7 $ HOLD8 $ HOLD9 $ 

datalines;
ID01 3 DP1 DP2 AB1
ID02 4 DP1 DP2 DP3 DP4 AC1
ID03 4 DP1 AB1 AB2 AC1 AC2
ID04 3 AB1 AB2 AC1
ID05 2 AB1 DP1
ID06 8 AB1 AB2 DP1 DP2 DP3 AC1 AC2 GB1
;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;BR /&gt;I want to generate a dataset which record&lt;BR /&gt;1st, For each product combination, products held by customers and products not held by customers.&lt;BR /&gt;2nd, Calculate the number of products held by the customer in the product combination and the number of differentiation.&lt;BR /&gt;&lt;BR /&gt;After the calculation, I sort by the number of differences between the customer and the product combination, and the count of the combination.Then, I delete duplicate records by ID and the recommended product.&lt;BR /&gt;&lt;BR /&gt;&lt;STRONG&gt;Process of Implementation:&lt;/STRONG&gt;&lt;BR /&gt;&lt;BR /&gt;1. cartesian product (cross join)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class="language-sas"&gt;PROC SQL ;		
   CREATE TABLE CCC AS 
SELECT t1.*, t2.*	
      FROM AAA t1 , BBB t2;
QUIT;			&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;BR /&gt;2. Comparison and calculation&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class="language-sas"&gt;DATA DDD;			
	SET CCC;		
ARRAY HOLD[9] $ HOLD1-HOLD9;			
ARRAY ITEM[5] $ ITEM1-ITEM5;			
ARRAY FLAG[5] FLAG1-FLAG5(0,0,0,0,0);			
ARRAY PRODUCT[5] $34 PRODUCT1-PRODUCT5;			
ARRAY INCLUD[5] $34 INCLUD1-INCLUD5;			
INCLUDED = 0;			
A=1;			
B=1;			
DO I=1 TO DIM(ITEM);			
	DO J=1 TO DIM(HOLD);		
		IF (ITEM[I]^="" AND ITEM[I]=HOLD[J]) THEN DO;	
			INCLUDED=INCLUDED+1;
			FLAG[I]=1;
			LEAVE;LEAVE;
			END;
		ELSE FLAG[I]=0;	
	END;		
END;			
			
DO K=1 TO DIM(FLAG);			
IF FLAG[K]=0 THEN DO;			
	PRODUCT[A]=ITEM[K];		
	A=A+1;		
END;			
ELSE IF FLAG[K]=1 THEN DO;			
	INCLUD[B]=ITEM[K];		
	B=B+1;		
END;			
END;			
			
OVERLAP = COUNT_OF_PRODUCT - INCLUDED;			
GAP = SET_SIZE - INCLUDED;			
			
RUN;			&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;BR /&gt;3. Sort and delete duplicates&lt;/P&gt;&lt;PRE&gt;&lt;CODE class="language-sas"&gt;PROC SQL ;	

   CREATE TABLE EEE AS 	
   SELECT DISTINCT t1.ID, 	
          t1.COUNT, 	
          t1.PRODUCT1, 	
          t1.INCLUD1, 	
          t1.INCLUD2, 	
          t1.INCLUD3, 	
          t1.INCLUD4, 	
          t1.INCLUD5	
      FROM DDD t1	
      WHERE t1.GAP = 1	
      ORDER BY t1.ID,	
               t1.OVERLAP,	
               t1.COUNT DESC;	
QUIT;	

&lt;/CODE&gt;&lt;/PRE&gt;&lt;PRE&gt;&lt;CODE class="language-sas"&gt;proc sort data=EEE out=FFF nodupkey; by ID product1; run;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;BR /&gt;Question:&lt;BR /&gt;&lt;BR /&gt;Cuz I need to do "cross join" and then "combination match". It takes lots of time. I'd like to ask if there is anyway (like hash table?but I don't know how to do) to shorten my process time. If you have any idea, please advise me. Many thanks. ^_^&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 23 Mar 2018 09:19:24 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Another-efficient-method-instead-of-cartesian-product-cross-join/m-p/448087#M112675</guid>
      <dc:creator>wenzli25</dc:creator>
      <dc:date>2018-03-23T09:19:24Z</dc:date>
    </item>
    <item>
      <title>Re: Another efficient method instead of cartesian product (cross join)</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Another-efficient-method-instead-of-cartesian-product-cross-join/m-p/448107#M112685</link>
      <description>&lt;P&gt;Certainly you could create your own Cartesian product without SQL&amp;nbsp; Given that you need a DATA step for the Cartesian product (the logic is probably too complex to implement with CASE statements instead), that ought to be faster.&amp;nbsp; That let's you combine the first two steps.&amp;nbsp; The idea:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;data DDD;&lt;/P&gt;
&lt;P&gt;set BBB;&lt;/P&gt;
&lt;P&gt;do k=1 to n_combinations;&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;set AAA point=k nobs=n_combinations;&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;*** lots of processing, carefully done to mimic current DATA step, while overcoming that variables are retained;&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;output;&lt;/P&gt;
&lt;P&gt;end;&lt;/P&gt;
&lt;P&gt;run;&lt;/P&gt;</description>
      <pubDate>Fri, 23 Mar 2018 12:30:15 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Another-efficient-method-instead-of-cartesian-product-cross-join/m-p/448107#M112685</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2018-03-23T12:30:15Z</dc:date>
    </item>
    <item>
      <title>Re: Another efficient method instead of cartesian product (cross join)</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Another-efficient-method-instead-of-cartesian-product-cross-join/m-p/448157#M112706</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/4954"&gt;@Astounding&lt;/a&gt; wrote:&lt;BR /&gt;
&lt;P&gt;Certainly you could create your own Cartesian product without SQL&amp;nbsp; Given that you need a DATA step for the Cartesian product (the logic is probably too complex to implement with CASE statements instead), that ought to be faster.&amp;nbsp; That let's you combine the first two steps.&amp;nbsp; The idea:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;data DDD;&lt;/P&gt;
&lt;P&gt;set BBB;&lt;/P&gt;
&lt;P&gt;do k=1 to n_combinations;&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;set AAA point=k nobs=n_combinations;&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;*** lots of processing, carefully done to mimic current DATA step, while overcoming that variables are retained;&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;output;&lt;/P&gt;
&lt;P&gt;end;&lt;/P&gt;
&lt;P&gt;run;&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Using SASFILE to load AAA would likely speed up this step.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;sasfile aaa open;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Fri, 23 Mar 2018 13:52:05 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Another-efficient-method-instead-of-cartesian-product-cross-join/m-p/448157#M112706</guid>
      <dc:creator>data_null__</dc:creator>
      <dc:date>2018-03-23T13:52:05Z</dc:date>
    </item>
    <item>
      <title>Re: Another efficient method instead of cartesian product (cross join)</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Another-efficient-method-instead-of-cartesian-product-cross-join/m-p/464355#M118380</link>
      <description>Sorry for the late reply. I tried &lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/4954"&gt;@Astounding&lt;/a&gt;'s method, but it failed. I'm not sure why.... still working on it. Could you tell me where should I modify "sasfile aaa open;" ?</description>
      <pubDate>Wed, 23 May 2018 12:59:58 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Another-efficient-method-instead-of-cartesian-product-cross-join/m-p/464355#M118380</guid>
      <dc:creator>wenzli25</dc:creator>
      <dc:date>2018-05-23T12:59:58Z</dc:date>
    </item>
  </channel>
</rss>

