<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Cartesian Product using the HASH in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Cartesian-Product-using-the-HASH/m-p/778859#M247981</link>
    <description>&lt;P&gt;If you need a Cartesian product in a DATA step, you can do it without using a hash:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data x;
input x;
cards;
1
2
3
;
run;

data y;
input y;
cards;
10
20
30
;
run;

data x_times_y;
  set x;
  do point= 1 to nobs;
    set y nobs=nobs point=point;
    output;
  end;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Bart&lt;/P&gt;</description>
    <pubDate>Fri, 05 Nov 2021 18:12:27 GMT</pubDate>
    <dc:creator>yabwon</dc:creator>
    <dc:date>2021-11-05T18:12:27Z</dc:date>
    <item>
      <title>Cartesian Product using the HASH</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Cartesian-Product-using-the-HASH/m-p/778826#M247965</link>
      <description>&lt;P&gt;Hi All,&lt;/P&gt;&lt;P&gt;Can someone provide code for a Cartesian product using a hash object?&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Venkat.&lt;/P&gt;</description>
      <pubDate>Fri, 05 Nov 2021 16:18:20 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Cartesian-Product-using-the-HASH/m-p/778826#M247965</guid>
      <dc:creator>venkibhu14</dc:creator>
      <dc:date>2021-11-05T16:18:20Z</dc:date>
    </item>
    <item>
      <title>Re: Cartesian Product using the HASH</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Cartesian-Product-using-the-HASH/m-p/778829#M247967</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;See my solution to this question:&lt;/P&gt;
&lt;P&gt;Macro is taking a long time to run - Risk set sampling&lt;BR /&gt;Posted 03-24-2021 11:44 AM &lt;BR /&gt;&lt;A href="https://communities.sas.com/t5/SAS-Programming/Macro-is-taking-a-long-time-to-run-Risk-set-sampling/m-p/728788#M226755" target="_blank"&gt;https://communities.sas.com/t5/SAS-Programming/Macro-is-taking-a-long-time-to-run-Risk-set-sampling/m-p/728788#M226755&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I'm doing a Cartesian product over there using a hash table (instead of the 'traditional' SQL-way to do the same).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
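&lt;P&gt;For reference, a minimal sketch of that 'traditional' SQL way, assuming two hypothetical tables x and y in WORK (the output name x_times_y is also just an illustration):&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql;
  /* listing two tables with no join condition pairs every row of x with every row of y */
  create table x_times_y as
  select x.*, y.*
  from x, y;
quit;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;SAS writes a NOTE about a Cartesian product join to the log for a query like this.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;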
&lt;P&gt;Cheers,&lt;BR /&gt;Koen&lt;/P&gt;</description>
      <pubDate>Fri, 05 Nov 2021 16:23:13 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Cartesian-Product-using-the-HASH/m-p/778829#M247967</guid>
      <dc:creator>sbxkoenk</dc:creator>
      <dc:date>2021-11-05T16:23:13Z</dc:date>
    </item>
    <item>
      <title>Re: Cartesian Product using the HASH</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Cartesian-Product-using-the-HASH/m-p/778847#M247973</link>
      <description>&lt;P&gt;Hi &lt;SPAN&gt;Koen,&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;Thanks a lot for the quick reply. It's working...:)&lt;/P&gt;</description>
      <pubDate>Fri, 05 Nov 2021 17:12:40 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Cartesian-Product-using-the-HASH/m-p/778847#M247973</guid>
      <dc:creator>venkibhu14</dc:creator>
      <dc:date>2021-11-05T17:12:40Z</dc:date>
    </item>
    <item>
      <title>Re: Cartesian Product using the HASH</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Cartesian-Product-using-the-HASH/m-p/778859#M247981</link>
      <description>&lt;P&gt;If you need a Cartesian product in a DATA step, you can do it without using a hash:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data x;
input x;
cards;
1
2
3
;
run;

data y;
input y;
cards;
10
20
30
;
run;

data x_times_y;
  set x;
  do point= 1 to nobs;
    set y nobs=nobs point=point;
    output;
  end;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Bart&lt;/P&gt;</description>
      <pubDate>Fri, 05 Nov 2021 18:12:27 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Cartesian-Product-using-the-HASH/m-p/778859#M247981</guid>
      <dc:creator>yabwon</dc:creator>
      <dc:date>2021-11-05T18:12:27Z</dc:date>
    </item>
    <item>
      <title>Re: Cartesian Product using the HASH</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Cartesian-Product-using-the-HASH/m-p/778861#M247983</link>
      <description>&lt;P&gt;Another Hash Approach&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data x;
input x;
cards;
1
2
3
;
run;

data y;
input y;
cards;
10
20
30
;
run;

data want;
   if _N_ = 1 then do;
      dcl hash h(dataset : "y");
      h.definekey("y");
      h.definedata(all : "Y");
      h.definedone();
      dcl hiter i("h");
   end;
   
   set x;
   y = .;
   
   do while (i.next() = 0);
      output;
   end;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Fri, 05 Nov 2021 18:43:24 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Cartesian-Product-using-the-HASH/m-p/778861#M247983</guid>
      <dc:creator>PeterClemmensen</dc:creator>
      <dc:date>2021-11-05T18:43:24Z</dc:date>
    </item>
    <item>
      <title>Re: Cartesian Product using the HASH</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Cartesian-Product-using-the-HASH/m-p/778870#M247987</link>
      <description>&lt;P&gt;why not just:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;   if _N_ = 1 then do;
      if 0 then set y;
      dcl hash h(dataset : "y");
      h.defineKey(all : "Y");
      h.defineDone();
      dcl hiter i("h");
   end;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Bart&lt;/P&gt;</description>
      <pubDate>Fri, 05 Nov 2021 19:21:53 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Cartesian-Product-using-the-HASH/m-p/778870#M247987</guid>
      <dc:creator>yabwon</dc:creator>
      <dc:date>2021-11-05T19:21:53Z</dc:date>
    </item>
    <item>
      <title>Re: Cartesian Product using the HASH</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Cartesian-Product-using-the-HASH/m-p/778873#M247990</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/35763"&gt;@yabwon&lt;/a&gt;&amp;nbsp;I just had in mind the case where y has more variables. Obviously you'd have to prepare the PDV &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Hope everything is well.&lt;/P&gt;</description>
      <pubDate>Fri, 05 Nov 2021 19:30:51 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Cartesian-Product-using-the-HASH/m-p/778873#M247990</guid>
      <dc:creator>PeterClemmensen</dc:creator>
      <dc:date>2021-11-05T19:30:51Z</dc:date>
    </item>
    <item>
      <title>Re: Cartesian Product using the HASH</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Cartesian-Product-using-the-HASH/m-p/778884#M247994</link>
      <description>&lt;P&gt;One is the PDV, but the second is that you don't need defineData() and save some memory.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;B-)&lt;/P&gt;</description>
      <pubDate>Fri, 05 Nov 2021 21:04:19 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Cartesian-Product-using-the-HASH/m-p/778884#M247994</guid>
      <dc:creator>yabwon</dc:creator>
      <dc:date>2021-11-05T21:04:19Z</dc:date>
    </item>
    <item>
      <title>Re: Cartesian Product using the HASH</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Cartesian-Product-using-the-HASH/m-p/778886#M247995</link>
      <description>&lt;P&gt;You don't save memory. If definedata() is not used, the key list is used to fill the data portion. One of the many ways SAS hash tables waste memory.&lt;/P&gt;</description>
      <pubDate>Fri, 05 Nov 2021 21:15:24 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Cartesian-Product-using-the-HASH/m-p/778886#M247995</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2021-11-05T21:15:24Z</dc:date>
    </item>
    <item>
      <title>Re: Cartesian Product using the HASH</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Cartesian-Product-using-the-HASH/m-p/778905#M248004</link>
      <description>&lt;P&gt;Hi Chris (&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/16961"&gt;@ChrisNZ&lt;/a&gt;),&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Yes, you are absolutely right. Sorry for my mistake.&lt;/P&gt;
&lt;P&gt;I read about this memory "issue" in Paul Dorfman's (&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/21262"&gt;@hashman&lt;/a&gt;) and Don Henderson's (&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13569"&gt;@DonH&lt;/a&gt;) book, but yesterday, when I wrote that post, I "mixed directions" in my head and didn't run a test to confirm (I ran the test below so others can see it too).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Bottom lines:&lt;/P&gt;
&lt;P&gt;1) Peter's solution is more optimal (&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/31304"&gt;@PeterClemmensen&lt;/a&gt;),&lt;/P&gt;
&lt;P&gt;2) if you need a hash table only for .check()-ing and have a long key, then put one single variable in the data portion, e.g. _N_, to save some memory (a variable shorter than 8 bytes won't help any further [see the last two examples below]).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
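&lt;P&gt;A minimal sketch of point 2, assuming hypothetical tables big and probe that share a composite key (k1, k2, k3). Loading the table manually with add() rather than through the dataset: argument is a choice made for this sketch, so that _N_ can sit in the data portion:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data big;    /* hypothetical table with a wide composite key */
  do k1 = 1 to 1000;
    k2 = k1 * 2;
    k3 = put(k1, z8.);
    output;
  end;
run;

data probe;  /* hypothetical keys to look up */
  do k1 = 998 to 1002;
    k2 = k1 * 2;
    k3 = put(k1, z8.);
    output;
  end;
run;

data found;
  if _n_ = 1 then do;
    dcl hash h();
    h.defineKey("k1", "k2", "k3");
    h.defineData("_n_");    /* one 8-byte numeric instead of a copy of the whole key */
    h.defineDone();
    do until (last);        /* load the keys of BIG into the hash table */
      set big end = last;
      rc = h.add();
    end;
  end;
  set probe;
  if h.check() = 0;         /* keep only probe rows whose key exists in BIG */
  drop rc;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;With this made-up data, found keeps the three probe rows whose k1 is 998, 999 or 1000.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;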
&lt;P&gt;Bart&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;[edit] P.S. It seems to be a good one for a SAS Ballot Idea: "optimise hash table memory usage".&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;1    options msglevel=I fullstimer;
2    data test;
3      do x = 1 to 1e7;
4        y = x;
5        z = x;
6        t = "a";
7        output;
8      end;
9    run;

NOTE: The data set WORK.TEST has 10000000 observations and 4 variables.
NOTE: DATA statement used (Total process time):
      real time           0.38 seconds
      user cpu time       0.23 seconds
      system cpu time     0.15 seconds
      memory              394.50k
      OS Memory           12532.00k

10
11
12   data _null_;
13     if 0 then set test;
14     dcl hash h(dataset : "test");
15     h.defineKey(all : "Y"); /* no data portion */
16     h.defineDone();
17     dcl hiter i("h");
18
19     stop;
20   run;

NOTE: There were 10000000 observations read from the data set WORK.TEST.
NOTE: DATA statement used (Total process time):
      real time           4.23 seconds
      user cpu time       3.70 seconds
      system cpu time     0.51 seconds
      memory              1180490.53k
      OS Memory           1191940.00k

21
22   data _null_;
23     if 0 then set test;
24     dcl hash h(dataset : "test");
25     h.defineKey("x");
26     h.defineData(all : "Y");
27     h.defineDone();
28     dcl hiter i("h");
29
30     stop;
31   run;

NOTE: There were 10000000 observations read from the data set WORK.TEST.
NOTE: DATA statement used (Total process time):
      real time           3.86 seconds
      user cpu time       3.43 seconds
      system cpu time     0.43 seconds
      memory              983883.81k
      OS Memory           995396.00k

32
33
34   data _null_;
35     if 0 then set test;
36     dcl hash h(dataset : "test");
37     h.defineKey(all : "Y");
38     h.defineData("z"); /* z is 8 bytes */
39     h.defineDone();
40
41     stop;
42   run;

NOTE: There were 10000000 observations read from the data set WORK.TEST.
NOTE: DATA statement used (Total process time):
      real time           3.90 seconds
      user cpu time       3.51 seconds
      system cpu time     0.39 seconds
      memory              983857.75k
      OS Memory           995396.00k

43
44   data _null_;
45     if 0 then set test;
46     dcl hash h(dataset : "test");
47     h.defineKey(all : "Y");
48     h.defineData("t"); /* t is 1 byte */
49     h.defineDone();
50
51     stop;
52   run;

NOTE: There were 10000000 observations read from the data set WORK.TEST.
NOTE: DATA statement used (Total process time):
      real time           3.79 seconds
      user cpu time       3.45 seconds
      system cpu time     0.34 seconds
      memory              983857.75k
      OS Memory           995396.00k
&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 06 Nov 2021 07:30:01 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Cartesian-Product-using-the-HASH/m-p/778905#M248004</guid>
      <dc:creator>yabwon</dc:creator>
      <dc:date>2021-11-06T07:30:01Z</dc:date>
    </item>
    <item>
      <title>Re: Cartesian Product using the HASH</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Cartesian-Product-using-the-HASH/m-p/778906#M248005</link>
      <description>&lt;P&gt;Nice!&amp;nbsp; Have you compared to proc sql?&lt;/P&gt;</description>
      <pubDate>Sat, 06 Nov 2021 07:40:25 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Cartesian-Product-using-the-HASH/m-p/778906#M248005</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2021-11-06T07:40:25Z</dc:date>
    </item>
    <item>
      <title>Re: Cartesian Product using the HASH</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Cartesian-Product-using-the-HASH/m-p/778908#M248007</link>
      <description>&lt;P&gt;Do you mean:&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;select x.*,y.* from x,y;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;vs. data step with hash, vs. data step with point?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Bart&lt;/P&gt;</description>
      <pubDate>Sat, 06 Nov 2021 07:55:20 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Cartesian-Product-using-the-HASH/m-p/778908#M248007</guid>
      <dc:creator>yabwon</dc:creator>
      <dc:date>2021-11-06T07:55:20Z</dc:date>
    </item>
    <item>
      <title>Re: Cartesian Product using the HASH</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Cartesian-Product-using-the-HASH/m-p/778918#M248012</link>
      <description>&lt;P&gt;Tests for small tables (10K obs each) are below.&amp;nbsp;I don't have space for bigger sets on my laptop.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;In terms of "real time" it is: 1) SQL (even done by SQXJSL = step loop join) , 2) hash, 3) point (even when whole table is in ram).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;In terms of "memory" it is: 1) point, 2) hash, 3) SQL.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;In terms of "OS memory" everything was less than 30MB so...&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Bart&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;1    options msglevel=I fullstimer;
2    data x;
3      do x = 1 to 1e4;
4        x1 = x;
5        x2 = x;
6        x3 = "a";
7        output;
8      end;
9    run;

NOTE: The data set WORK.X has 10000 observations and 4 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      user cpu time       0.00 seconds
      system cpu time     0.00 seconds
      memory              395.03k
      OS Memory           24060.00k

10
11   data y;
12     do y = 1 to 1e4;
13       y1 = y;
14       y2 = y;
15       y3 = "b";
16       output;
17     end;
18   run;

NOTE: The data set WORK.Y has 10000 observations and 4 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      user cpu time       0.01 seconds
      system cpu time     0.00 seconds
      memory              395.03k
      OS Memory           24060.00k

19
20   proc sql;
21   create table test1 as
22   select x.*, y.* from x,y
23   ;
NOTE: The execution of this query involves performing one or more Cartesian product joins that can not be optimized.
NOTE: Table WORK.TEST1 created, with 100000000 rows and 8 columns.

24   quit;
NOTE: PROCEDURE SQL used (Total process time):
      real time           8.60 seconds
      user cpu time       6.51 seconds
      system cpu time     1.82 seconds
      memory              5633.68k
      OS Memory           29184.00k

25   data test2;
26     set x;
27     do point= 1 to nobs;
28       set y nobs=nobs point=point;
29       output;
30     end;
31   run;

NOTE: There were 10000 observations read from the data set WORK.X.
NOTE: The data set WORK.TEST2 has 100000000 observations and 8 variables.
NOTE: DATA statement used (Total process time):
      real time           22.25 seconds
      user cpu time       15.43 seconds
      system cpu time     6.62 seconds
      memory              930.18k
      OS Memory           24572.00k

32   data test3;
33      if _N_ = 1 then do;
34         if 0 then set y;
35         dcl hash h(dataset : "y");
36         h.definekey("y");
37         h.definedata(all : "Y");
38         h.definedone();
39         dcl hiter i("h");
40      end;
41
42      set x;
43
44      do while (i.next() = 0);
45         output;
46      end;
47   run;

NOTE: There were 10000 observations read from the data set WORK.Y.
NOTE: There were 10000 observations read from the data set WORK.X.
NOTE: The data set WORK.TEST3 has 100000000 observations and 8 variables.
NOTE: DATA statement used (Total process time):
      real time           13.32 seconds
      user cpu time       11.21 seconds
      system cpu time     2.03 seconds
      memory              2446.78k
      OS Memory           26068.00k

48   sasfile y load;
NOTE: The file WORK.Y.DATA has been loaded into memory by the SASFILE statement.
49   data test4;
50     set x;
51     do point= 1 to nobs;
52       set y nobs=nobs point=point;
53       output;
54     end;
55   run;

NOTE: There were 10000 observations read from the data set WORK.X.
NOTE: The data set WORK.TEST4 has 100000000 observations and 8 variables.
NOTE: DATA statement used (Total process time):
      real time           22.38 seconds
      user cpu time       16.56 seconds
      system cpu time     5.81 seconds
      memory              713.75k
      OS Memory           25408.00k

56   sasfile y close;
NOTE: The file WORK.Y.DATA has been closed by the SASFILE statement.
&lt;/PRE&gt;
&lt;P&gt;SQL "under the hood":&lt;/P&gt;
&lt;PRE&gt;1    proc sql feedback _tree _method;
2    create table test1 as
3    select x.*, y.* from x,y
4    ;
NOTE: Statement transforms to:

        select X.x, X.x1, X.x2, X.x3, Y.y, Y.y1, Y.y2, Y.y3
          from WORK.X, WORK.Y;

NOTE: The execution of this query involves performing one or more Cartesian product joins that can not be optimized.

NOTE: SQL execution methods chosen are:

      sqxcrta
          sqxjsl
              sqxsrc( WORK.X )
              sqxsrc( WORK.Y )

Tree as planned.
                               /-SYM-V-(x.x:1 flag=00000001)
                     /-OBJ----|
                    |         |--SYM-V-(x.x1:2 flag=00000001)
                    |         |--SYM-V-(x.x2:3 flag=00000001)
                    |         |--SYM-V-(x.x3:4 flag=00000001)
                    |         |--SYM-V-(y.y:1 flag=00000001)
                    |         |--SYM-V-(y.y1:2 flag=00000001)
                    |         |--SYM-V-(y.y2:3 flag=00000001)
                    |          \-SYM-V-(y.y3:4 flag=00000001)
           /-JOIN---|
          |         |                              /-SYM-V-(x.x:1 flag=00000001)
          |         |                    /-OBJ----|
          |         |                   |         |--SYM-V-(x.x1:2 flag=00000001)
          |         |                   |         |--SYM-V-(x.x2:3 flag=00000001)
          |         |                   |          \-SYM-V-(x.x3:4 flag=00000001)
          |         |          /-SRC----|
          |         |         |          \-TABL[WORK].x opt=''
          |          \-FROM---|
          |                   |                    /-SYM-V-(y.y:1 flag=00000001)
          |                   |          /-OBJ----|
          |                   |         |         |--SYM-V-(y.y1:2 flag=00000001)
          |                   |         |         |--SYM-V-(y.y2:3 flag=00000001)
          |                   |         |          \-SYM-V-(y.y3:4 flag=00000001)
          |                    \-SRC----|
          |                              \-TABL[WORK].y opt=''
 --SSEL---|


NOTE: Table WORK.TEST1 created, with 100000000 rows and 8 columns.

5    quit;
NOTE: PROCEDURE SQL used (Total process time):
      real time           8.52 seconds
      user cpu time       6.43 seconds
      system cpu time     1.87 seconds
      memory              5647.87k
      OS Memory           30464.00k
&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 06 Nov 2021 09:43:09 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Cartesian-Product-using-the-HASH/m-p/778918#M248012</guid>
      <dc:creator>yabwon</dc:creator>
      <dc:date>2021-11-06T09:43:09Z</dc:date>
    </item>
    <item>
      <title>Re: Cartesian Product using the HASH</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Cartesian-Product-using-the-HASH/m-p/778922#M248013</link>
      <description>Peter.C&lt;BR /&gt;You need to consider duplicate keys as well!&lt;BR /&gt;&lt;BR /&gt;   dcl hash h(dataset : "y", multidata : "y");</description>
      <pubDate>Sat, 06 Nov 2021 10:39:29 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Cartesian-Product-using-the-HASH/m-p/778922#M248013</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2021-11-06T10:39:29Z</dc:date>
    </item>
    <item>
      <title>Re: Cartesian Product using the HASH</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Cartesian-Product-using-the-HASH/m-p/778935#M248021</link>
      <description>&lt;P&gt;Correct. And there is also a minimum size for the combination of the key and data portions of the table. IIRC that size also depends on the OS. I don’t remember the details, but I’m sure&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/21262"&gt;@hashman&lt;/a&gt;&amp;nbsp;does &lt;span class="lia-unicode-emoji" title=":grinning_face:"&gt;😀&lt;/span&gt;.&lt;/P&gt;</description>
      <pubDate>Sat, 06 Nov 2021 15:03:57 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Cartesian-Product-using-the-HASH/m-p/778935#M248021</guid>
      <dc:creator>DonH</dc:creator>
      <dc:date>2021-11-06T15:03:57Z</dc:date>
    </item>
    <item>
      <title>Re: Cartesian Product using the HASH</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Cartesian-Product-using-the-HASH/m-p/778952#M248028</link>
      <description>&lt;P&gt;The size depends on the architecture (32 or 64 bits).&lt;/P&gt;
&lt;P&gt;On 64-bit platforms, 48 bytes per item minimum (including 8 bytes each for key and data), and item increments of 16 bytes, with 8 byte increment for each of key and data.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sun, 07 Nov 2021 00:58:19 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Cartesian-Product-using-the-HASH/m-p/778952#M248028</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2021-11-07T00:58:19Z</dc:date>
    </item>
    <item>
      <title>Re: Cartesian Product using the HASH</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Cartesian-Product-using-the-HASH/m-p/778953#M248029</link>
      <description>&lt;P&gt;Very comprehensive benchmark, thank you. I'd have been disappointed if SQL was slower.&lt;/P&gt;</description>
      <pubDate>Sun, 07 Nov 2021 00:59:34 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Cartesian-Product-using-the-HASH/m-p/778953#M248029</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2021-11-07T00:59:34Z</dc:date>
    </item>
    <item>
      <title>Re: Cartesian Product using the HASH</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Cartesian-Product-using-the-HASH/m-p/778959#M248030</link>
      <description>&lt;P&gt;Hey&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/35763"&gt;@yabwon&lt;/a&gt;&amp;nbsp;/&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/16961"&gt;@ChrisNZ&lt;/a&gt;&amp;nbsp;/&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13569"&gt;@DonH&lt;/a&gt;:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Forming a Cartesian product (CP) is a matter of reading X one record at a time and pairing it with each record from Y sequentially. Sequential reading is not POINT='s forte, nor is it a hash object's forte, as the iterator has significant underlying software overhead. Both instead excel at searching for a given obs number or key value. The most natural way of speeding up repeated sequential scans is a temp array, because it reads from memory and does it fast. Hence, for example, using Bart's test data (and not worrying about hardcoding):&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data test_arr ;
  array ya  [10000]    _temporary_ ;
  array y1a [10000]    _temporary_ ;
  array y2a [10000]    _temporary_ ;
  array y3a [10000] $1 _temporary_ ;
  if _n_ = 1 then do until (z) ;
    set y end = z;
    ya [_n_] = y  ;
    y1a[_n_] = y1 ;
    y2a[_n_] = y2 ;
    y3a[_n_] = y3 ;
  end ;
  set x ;
  do _n_ = 1 to 10000 ;
    y  = ya [_n_] ;
    y1 = y1a[_n_] ;
    y2 = y2a[_n_] ;
    y3 = y3a[_n_] ;
    output ;
  end ;
run ;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;actually outperforms SQL by about 20 percent in my tests on the same laptop I'm typing this on, which makes me suspect that SQL does much the same sort of thing behind the scenes.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Kind regards&lt;/P&gt;
&lt;P&gt;Paul D.&lt;/P&gt;</description>
      <pubDate>Sun, 07 Nov 2021 03:21:39 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Cartesian-Product-using-the-HASH/m-p/778959#M248030</guid>
      <dc:creator>hashman</dc:creator>
      <dc:date>2021-11-07T03:21:39Z</dc:date>
    </item>
    <item>
      <title>Re: Cartesian Product using the HASH</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Cartesian-Product-using-the-HASH/m-p/779040#M248073</link>
      <description>&lt;P&gt;Yep, arrays are fastest.&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data TEST_FETCH ;
  set X end=LASTOBS;
  if 0 then set Y;
  retain DSID;
  if _N_=1 then do;
    DSID=open('Y');
    call set (DSID);
  end;
  else RC=rewind(DSID);
  do _N_ = 1 to 10e3;
    RC=fetch(DSID);
    output ;
  end ;
  if LASTOBS then RC=close(DSID);
run ;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;This runs in 1m20s,&amp;nbsp;compared to 45s for the array logic on my old server.&lt;/P&gt;
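&lt;P&gt;The loop bound of 10e3 is hardcoded to the size of Y. A minimal sketch, assuming the same X and Y, that instead loops until FETCH returns a nonzero code at end of file (the name TEST_FETCH2 and the DROP statement are additions for illustration, and this variant is not timed here):&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data TEST_FETCH2 ;
  set X end=LASTOBS;
  if 0 then set Y;               /* put the variables of Y into the PDV */
  retain DSID;
  if _N_=1 then do;
    DSID=open('Y');
    call set (DSID);             /* link the columns of Y to same-named DATA step variables */
  end;
  else RC=rewind(DSID);          /* go back to the first row of Y for each new row of X */
  do while (fetch(DSID) = 0);    /* FETCH returns 0 for as long as a row was read */
    output ;
  end ;
  if LASTOBS then RC=close(DSID);
  drop DSID RC;                  /* keep the helper variables out of the result */
run ;&lt;/CODE&gt;&lt;/PRE&gt;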
&lt;P&gt;FETCH is still several times faster than POINT=.&lt;/P&gt;</description>
      <pubDate>Mon, 08 Nov 2021 01:13:55 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Cartesian-Product-using-the-HASH/m-p/779040#M248073</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2021-11-08T01:13:55Z</dc:date>
    </item>
    <item>
      <title>Re: Cartesian Product using the HASH</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Cartesian-Product-using-the-HASH/m-p/779041#M248074</link>
      <description>&lt;P&gt;Lots of great techniques have been discussed here.&lt;BR /&gt;When Paul and I worked together, we would regularly have what can best be described as &lt;EM&gt;yeah, but&lt;/EM&gt; conversations. So here is my &lt;EM&gt;yeah, but&lt;/EM&gt; point.&lt;BR /&gt;The issue from my perspective is pretty straightforward: performance depends on a lot of factors, and assuming that the results observed for a given set of data (or combination of data tables) apply across the board is questionable at best.&lt;BR /&gt;When creating a Cartesian product, the size of the data sets probably matters, both in terms of the number of rows and the number of columns (as well as the total length of the columns).&lt;BR /&gt;When I did performance evaluations, I tried my best to use data sets that looked like the data for the application at hand. All of the approaches presented here are worthy of evaluation, but they need to be evaluated in the context of the particular data sets at issue.&lt;/P&gt;</description>
      <pubDate>Mon, 08 Nov 2021 02:00:15 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Cartesian-Product-using-the-HASH/m-p/779041#M248074</guid>
      <dc:creator>DonH</dc:creator>
      <dc:date>2021-11-08T02:00:15Z</dc:date>
    </item>
  </channel>
</rss>

