<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Big Data - Cartesian in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Big-Data-Cartesian/m-p/288375#M59455</link>
    <description>&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;1051  data PHRASES;
1052    length PHRASE $2000;
1053    PHRASE=repeat('x',1000);
1054    do I=1 to 1e5; output; end ;
1055   run;

NOTE: The data set WORK.PHRASES has 100000 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.30 seconds
      user cpu time       0.06 seconds
      system cpu time     0.26 seconds
      memory              186.01k
      OS Memory           11888.00k
      Timestamp           31/07/2016 05:34:07 PM


1056
1057  data BRANDS;  length BRAND $40;
1058    BRAND=repeat('y',25);
1059    do J=1 to 199; output; end ;
1060    BRAND=repeat('x',25); output;
1061   run;

NOTE: The data set WORK.BRANDS has 200 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      user cpu time       0.00 seconds
      system cpu time     0.00 seconds
      memory              181.54k
      OS Memory           11888.00k
      Timestamp           31/07/2016 05:34:07 PM


1062
1063  proc sql;
1064    create table OUT_SQL_CONTAINS as
1065    select * from PHRASES, BRANDS
1066    where trim(PHRASES.PHRASE) contains trim(BRANDS.BRAND);
NOTE: The execution of this query involves performing one or more Cartesian product joins that
      can not be optimized.
NOTE: Table WORK.OUT_SQL_CONTAINS created, with 100000 rows and 4 columns.

1067  quit;
NOTE: PROCEDURE SQL used (Total process time):
      real time           28.02 seconds
      user cpu time       27.44 seconds
      system cpu time     0.48 seconds
      memory              539.80k
      OS Memory           11888.00k
      Timestamp           31/07/2016 05:34:35 PM


1068
1069  proc sql;
1070    create table OUT_SQL_INDEX as
1071    select * from PHRASES, BRANDS
1072    where index(trim(PHRASES.PHRASE),trim(BRANDS.BRAND));
NOTE: The execution of this query involves performing one or more Cartesian product joins that
      can not be optimized.
NOTE: Table WORK.OUT_SQL_INDEX created, with 100000 rows and 4 columns.

1073  quit;
NOTE: PROCEDURE SQL used (Total process time):
      real time           27.99 seconds
      user cpu time       27.33 seconds
      system cpu time     0.53 seconds
      memory              552.67k
      OS Memory           11888.00k
      Timestamp           31/07/2016 05:35:03 PM


1074
1075  proc sql;
1076    create table OUT_SQL_FIND1 as
1077    select * from PHRASES, BRANDS
1078    where find(trim(PHRASES.PHRASE),trim(BRANDS.BRAND));
NOTE: The execution of this query involves performing one or more Cartesian product joins that
      can not be optimized.
NOTE: Table WORK.OUT_SQL_FIND1 created, with 100000 rows and 4 columns.

1079  quit;
NOTE: PROCEDURE SQL used (Total process time):
      real time           37.12 seconds
      user cpu time       36.59 seconds
      system cpu time     0.42 seconds
      memory              552.67k
      OS Memory           11888.00k
      Timestamp           31/07/2016 05:35:40 PM


1080
1081  proc sql;
1082    create table OUT_SQL_FIND2 as
1083    select * from PHRASES, BRANDS
1084    where find(PHRASES.PHRASE,BRANDS.BRAND,'t');
NOTE: The execution of this query involves performing one or more Cartesian product joins that
      can not be optimized.
NOTE: Table WORK.OUT_SQL_FIND2 created, with 100000 rows and 4 columns.

1085  quit;
NOTE: PROCEDURE SQL used (Total process time):
      real time           34.66 seconds
      user cpu time       34.14 seconds
      system cpu time     0.60 seconds
      memory              528.54k
      OS Memory           11888.00k
      Timestamp           31/07/2016 05:36:15 PM


1086
1087  data OUT_DS_ARRAY;
1088    array BRANDS [200] $40 _temporary_;
1089    if _N_=1 then do K=1 to 200;
1090      set BRANDS;
1091      BRANDS[K]=BRAND;
1092    end;
1093    set PHRASES;
1094    do K=1 to 200;
1095      if index(trim(PHRASE), trim(BRANDS[K])) then do;
1096        BRAND=BRANDS[K];
1097        output;
1098      end;
1099    end;
1100    drop K;
1101  run;

NOTE: There were 200 observations read from the data set WORK.BRANDS.
NOTE: There were 100000 observations read from the data set WORK.PHRASES.
NOTE: The data set WORK.OUT_DS_ARRAY has 100000 observations and 4 variables.
NOTE: DATA statement used (Total process time):
      real time           24.82 seconds
      user cpu time       24.11 seconds
      system cpu time     0.70 seconds
      memory              369.45k
      OS Memory           11888.00k
      Timestamp           31/07/2016 05:36:40 PM


1102
1103  data OUT_DS_POINT;
1104    set PHRASES;
1105    do K=1 to 200;
1106      set BRANDS point=K;
1107      if index(trim(PHRASE), trim(BRAND)) then output;
1108    end;
1109    drop K;
1110  run;

NOTE: There were 100000 observations read from the data set WORK.PHRASES.
NOTE: The data set WORK.OUT_DS_POINT has 100000 observations and 4 variables.
NOTE: DATA statement used (Total process time):
      real time           31.10 seconds
      user cpu time       28.56 seconds
      system cpu time     2.46 seconds
      memory              326.67k
      OS Memory           11888.00k
      Timestamp           31/07/2016 05:37:11 PM


1111
1112  data OUT_DS_HASH;
1113    set PHRASES;
1114    if _N_=1 then do;
1115      dcl hash BRANDS(dataset:'BRANDS');
1116      BRANDS.definekey('BRAND');
1117      BRANDS.definedata('BRAND','J');
1118      BRANDS.definedone();
1119      declare hiter IT('BRANDS');
1120      if 0 then set BRANDS;
1121    end;
1122    RC=IT.first();
1123    do while (RC = 0);
1124      if index(trim(PHRASE), trim(BRAND)) then output;
1125      RC=IT.next();
1126    end;
1127    drop RC;
1128  run;

NOTE: There were 200 observations read from the data set WORK.BRANDS.
NOTE: There were 100000 observations read from the data set WORK.PHRASES.
NOTE: The data set WORK.OUT_DS_HASH has 100000 observations and 4 variables.
NOTE: DATA statement used (Total process time):
      real time           0.98 seconds
      user cpu time       0.56 seconds
      system cpu time     0.42 seconds
      memory              701.21k
      OS Memory           11888.00k
      Timestamp           31/07/2016 05:37:12 PM


&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Sun, 31 Jul 2016 05:55:30 GMT</pubDate>
    <dc:creator>ChrisNZ</dc:creator>
    <dc:date>2016-07-31T05:55:30Z</dc:date>
    <item>
      <title>Big Data - Cartesian</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Big-Data-Cartesian/m-p/288303#M59418</link>
      <description>&lt;P&gt;Hi, I have a very simple SQL procedure (Cartesian) from two tables:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;Base – 400K Rows, 2 columns&lt;/LI&gt;&lt;LI&gt;Dict – 25 mio Rows, &lt;SPAN&gt;2 columns&lt;/SPAN&gt;:&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;&lt;STRONG&gt;proc&lt;/STRONG&gt; &lt;STRONG&gt;sql&lt;/STRONG&gt;;&lt;/P&gt;&lt;P&gt;create table Final as&lt;/P&gt;&lt;P&gt;select a.*, b.* from Base a&lt;/P&gt;&lt;P&gt;join Dict b&lt;/P&gt;&lt;P&gt;on trim(a.Name_2) &lt;STRONG&gt;&lt;U&gt;contains&lt;/U&gt;&lt;/STRONG&gt; trim(b.Name_1)&lt;/P&gt;&lt;P&gt;;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;quit&lt;/STRONG&gt;;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;it works, but very slowly. My server needs &lt;U&gt;&lt;STRONG&gt;one month&lt;/STRONG&gt;&lt;/U&gt; to process this. Do you have an idea how to speed it?&lt;/P&gt;</description>
      <pubDate>Sat, 30 Jul 2016 09:01:31 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Big-Data-Cartesian/m-p/288303#M59418</guid>
      <dc:creator>pietro342</dc:creator>
      <dc:date>2016-07-30T09:01:31Z</dc:date>
    </item>
    <item>
      <title>Re: Big Data - Cartesian</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Big-Data-Cartesian/m-p/288304#M59419</link>
      <description>Lots of RAM, and fast disks for saswork location. &lt;BR /&gt;If you care to rewrite it I would assume that a data step with a hash table technique is much faster. &lt;BR /&gt;A general thought  (that doesn't necessarily help this specific query) is to always try to avoid function call in the join criteria. Make sure that your source data is already trimmed.</description>
      <pubDate>Sat, 30 Jul 2016 09:11:15 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Big-Data-Cartesian/m-p/288304#M59419</guid>
      <dc:creator>LinusH</dc:creator>
      <dc:date>2016-07-30T09:11:15Z</dc:date>
    </item>
    <item>
      <title>Re: Big Data - Cartesian</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Big-Data-Cartesian/m-p/288313#M59423</link>
      <description>You can also define your library as a binary compressed Spde library. This will reduce IOs. And may add threads. &lt;BR /&gt;The contains operator is certainly slowing things down. Would the find function and the t parameter be faster? &lt;BR /&gt;What about loading your tables in RAM using sasfile? &lt;BR /&gt;</description>
      <pubDate>Sat, 30 Jul 2016 10:23:16 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Big-Data-Cartesian/m-p/288313#M59423</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2016-07-30T10:23:16Z</dc:date>
    </item>
    <item>
      <title>Re: Big Data - Cartesian</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Big-Data-Cartesian/m-p/288314#M59424</link>
      <description>Do you have the fullstimer log output?</description>
      <pubDate>Sat, 30 Jul 2016 10:24:26 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Big-Data-Cartesian/m-p/288314#M59424</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2016-07-30T10:24:26Z</dc:date>
    </item>
    <item>
      <title>Re: Big Data - Cartesian</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Big-Data-Cartesian/m-p/288315#M59425</link>
      <description>You can influence proc sql to use a hash method by adding option magic=103. You can also use proc sql option buffersize= to read data in bulk.</description>
      <pubDate>Sat, 30 Jul 2016 10:31:56 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Big-Data-Cartesian/m-p/288315#M59425</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2016-07-30T10:31:56Z</dc:date>
    </item>
    <item>
      <title>Re: Big Data - Cartesian</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Big-Data-Cartesian/m-p/288316#M59426</link>
      <description>So many possibilities...</description>
      <pubDate>Sat, 30 Jul 2016 10:32:23 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Big-Data-Cartesian/m-p/288316#M59426</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2016-07-30T10:32:23Z</dc:date>
    </item>
    <item>
      <title>Re: Big Data - Cartesian</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Big-Data-Cartesian/m-p/288322#M59429</link>
      <description>&lt;P&gt;&amp;nbsp;LinusH,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;In my opinion, HASH TABLES will not work with contains. Columns name_1 and name_2 are string and it searches the contents.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;ChrisNZ,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I checked FIND and INDEX and they were slower. What is magic=103 and &amp;nbsp;option buffersize=? Have you got sample query?&lt;/P&gt;&lt;P&gt;I think that this is a text mining problem and I thought about using SAS Sentiment analytyst, what do you think?&lt;/P&gt;</description>
      <pubDate>Sat, 30 Jul 2016 11:47:55 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Big-Data-Cartesian/m-p/288322#M59429</guid>
      <dc:creator>pietro342</dc:creator>
      <dc:date>2016-07-30T11:47:55Z</dc:date>
    </item>
    <item>
      <title>Re: Big Data - Cartesian</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Big-Data-Cartesian/m-p/288329#M59434</link>
      <description>&lt;P&gt;As defined currently there's not much of any type of analysis.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Your doing a lookup, but it depends on the output you want.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;ASSUMING your dictionary is single words and your other dataset are phrases, maybe a custom lookup and splitting the data into words rather than leaving it as phrases would be faster. You do have to be careful though, is Apples the same as Apple, and what about Applebee, the restaurant. It's not an exact science.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 30 Jul 2016 13:50:38 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Big-Data-Cartesian/m-p/288329#M59434</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2016-07-30T13:50:38Z</dc:date>
    </item>
    <item>
      <title>Re: Big Data - Cartesian</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Big-Data-Cartesian/m-p/288331#M59435</link>
      <description>Maybe, I wasn't clear enogh &lt;span class="lia-unicode-emoji" title=":disappointed_face:"&gt;😞&lt;/span&gt;&lt;BR /&gt;&lt;BR /&gt;I' m looking for company names in texts, so dictionary is not single words. For example:&lt;BR /&gt;&lt;BR /&gt;"A global leader in the beverage industry, the &amp;lt;Coca-Cola company&amp;gt; offers hundreds of brands, including soft drinks, fruit ..."</description>
      <pubDate>Sat, 30 Jul 2016 14:50:19 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Big-Data-Cartesian/m-p/288331#M59435</guid>
      <dc:creator>pietro342</dc:creator>
      <dc:date>2016-07-30T14:50:19Z</dc:date>
    </item>
    <item>
      <title>Re: Big Data - Cartesian</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Big-Data-Cartesian/m-p/288334#M59436</link>
      <description>&lt;P&gt;Ok, but why? Sentiment analysis looks for positive/negative connotations to text AFAIK. Is that related to your problem?&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;SAS Text Analytics tool, an add on for SAS EM may be beneficial but I doubt it.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Can each statement meet multiple company names or do you expect a single one per observation?&lt;/P&gt;</description>
      <pubDate>Sat, 30 Jul 2016 15:00:33 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Big-Data-Cartesian/m-p/288334#M59436</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2016-07-30T15:00:33Z</dc:date>
    </item>
    <item>
      <title>Re: Big Data - Cartesian</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Big-Data-Cartesian/m-p/288339#M59440</link>
      <description>&lt;P&gt;For every word in every record in a, SQL will search through b. Ungood.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I'd try this:&lt;/P&gt;
&lt;P&gt;Create a unique ID in dataset base (use _N_ in a data step, for example)&lt;/P&gt;
&lt;P&gt;Create a new dataset from all the words in base.name_2 and the ID (will be a multiple of the 400k)&lt;/P&gt;
&lt;P&gt;Sort intermediate dataset and dict by the trimmed words&lt;/P&gt;
&lt;P&gt;Join and only keep matches (in a data step)&lt;/P&gt;
&lt;P&gt;Sort result by ID with nodupkey&lt;/P&gt;
&lt;P&gt;Join with base and only keep matches&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Will be more code to write, but perform faster than your SQL by orders of magnitude.&lt;/P&gt;</description>
      <pubDate>Sat, 30 Jul 2016 18:05:52 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Big-Data-Cartesian/m-p/288339#M59440</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2016-07-30T18:05:52Z</dc:date>
    </item>
    <item>
      <title>Re: Big Data - Cartesian</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Big-Data-Cartesian/m-p/288374#M59454</link>
      <description>&lt;P&gt;Alright I ran a few tests on my PC.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Firstly, the process is CPU-bound for me, so no point trying to optimise I/Os through better use of RAM or buffers.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Secondly, the &lt;EM&gt;contains&lt;/EM&gt; operator is as fast as function index(trim()), while funtion find() is slower. So SAS added a new function that is slower than the old one? Why?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thirdly I tried a few data step matching logics to see how they compared to the SQL join: Scanning an array, scanning the table directly and scanning a hash table. The array scan is 20% faster than SQL, the &lt;EM&gt;point=&lt;/EM&gt; direct access 20% slower (but on par with SQL when data is loaded in memory with sasfile), and &lt;STRONG&gt;the hash match is 95% faster than SQL.&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;We have a clear winner here.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;More benchmarks and performance tips in&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;A href="https://www.amazon.com/High-Performance-SAS-Coding-Christian-Graffeuille/dp/1512397490" target="_blank"&gt;https://www.amazon.com/High-Performance-SAS-Coding-Christian-Graffeuille/dp/1512397490&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I may reuse this example in a future edition!&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Log below.&lt;/P&gt;</description>
      <pubDate>Sun, 31 Jul 2016 06:07:07 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Big-Data-Cartesian/m-p/288374#M59454</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2016-07-31T06:07:07Z</dc:date>
    </item>
    <item>
      <title>Re: Big Data - Cartesian</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Big-Data-Cartesian/m-p/288375#M59455</link>
      <description>&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;1051  data PHRASES;
1052    length PHRASE $2000;
1053    PHRASE=repeat('x',1000);
1054    do I=1 to 1e5; output; end ;
1055   run;

NOTE: The data set WORK.PHRASES has 100000 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.30 seconds
      user cpu time       0.06 seconds
      system cpu time     0.26 seconds
      memory              186.01k
      OS Memory           11888.00k
      Timestamp           31/07/2016 05:34:07 PM


1056
1057  data BRANDS;  length BRAND $40;
1058    BRAND=repeat('y',25);
1059    do J=1 to 199; output; end ;
1060    BRAND=repeat('x',25); output;
1061   run;

NOTE: The data set WORK.BRANDS has 200 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      user cpu time       0.00 seconds
      system cpu time     0.00 seconds
      memory              181.54k
      OS Memory           11888.00k
      Timestamp           31/07/2016 05:34:07 PM


1062
1063  proc sql;
1064    create table OUT_SQL_CONTAINS as
1065    select * from PHRASES, BRANDS
1066    where trim(PHRASES.PHRASE) contains trim(BRANDS.BRAND);
NOTE: The execution of this query involves performing one or more Cartesian product joins that
      can not be optimized.
NOTE: Table WORK.OUT_SQL_CONTAINS created, with 100000 rows and 4 columns.

1067  quit;
NOTE: PROCEDURE SQL used (Total process time):
      real time           28.02 seconds
      user cpu time       27.44 seconds
      system cpu time     0.48 seconds
      memory              539.80k
      OS Memory           11888.00k
      Timestamp           31/07/2016 05:34:35 PM


1068
1069  proc sql;
1070    create table OUT_SQL_INDEX as
1071    select * from PHRASES, BRANDS
1072    where index(trim(PHRASES.PHRASE),trim(BRANDS.BRAND));
NOTE: The execution of this query involves performing one or more Cartesian product joins that
      can not be optimized.
NOTE: Table WORK.OUT_SQL_INDEX created, with 100000 rows and 4 columns.

1073  quit;
NOTE: PROCEDURE SQL used (Total process time):
      real time           27.99 seconds
      user cpu time       27.33 seconds
      system cpu time     0.53 seconds
      memory              552.67k
      OS Memory           11888.00k
      Timestamp           31/07/2016 05:35:03 PM


1074
1075  proc sql;
1076    create table OUT_SQL_FIND1 as
1077    select * from PHRASES, BRANDS
1078    where find(trim(PHRASES.PHRASE),trim(BRANDS.BRAND));
NOTE: The execution of this query involves performing one or more Cartesian product joins that
      can not be optimized.
NOTE: Table WORK.OUT_SQL_FIND1 created, with 100000 rows and 4 columns.

1079  quit;
NOTE: PROCEDURE SQL used (Total process time):
      real time           37.12 seconds
      user cpu time       36.59 seconds
      system cpu time     0.42 seconds
      memory              552.67k
      OS Memory           11888.00k
      Timestamp           31/07/2016 05:35:40 PM


1080
1081  proc sql;
1082    create table OUT_SQL_FIND2 as
1083    select * from PHRASES, BRANDS
1084    where find(PHRASES.PHRASE,BRANDS.BRAND,'t');
NOTE: The execution of this query involves performing one or more Cartesian product joins that
      can not be optimized.
NOTE: Table WORK.OUT_SQL_FIND2 created, with 100000 rows and 4 columns.

1085  quit;
NOTE: PROCEDURE SQL used (Total process time):
      real time           34.66 seconds
      user cpu time       34.14 seconds
      system cpu time     0.60 seconds
      memory              528.54k
      OS Memory           11888.00k
      Timestamp           31/07/2016 05:36:15 PM


1086
1087  data OUT_DS_ARRAY;
1088    array BRANDS [200] $40 _temporary_;
1089    if _N_=1 then do K=1 to 200;
1090      set BRANDS;
1091      BRANDS[K]=BRAND;
1092    end;
1093    set PHRASES;
1094    do K=1 to 200;
1095      if index(trim(PHRASE), trim(BRANDS[K])) then do;
1096        BRAND=BRANDS[K];
1097        output;
1098      end;
1099    end;
1100    drop K;
1101  run;

NOTE: There were 200 observations read from the data set WORK.BRANDS.
NOTE: There were 100000 observations read from the data set WORK.PHRASES.
NOTE: The data set WORK.OUT_DS_ARRAY has 100000 observations and 4 variables.
NOTE: DATA statement used (Total process time):
      real time           24.82 seconds
      user cpu time       24.11 seconds
      system cpu time     0.70 seconds
      memory              369.45k
      OS Memory           11888.00k
      Timestamp           31/07/2016 05:36:40 PM


1102
1103  data OUT_DS_POINT;
1104    set PHRASES;
1105    do K=1 to 200;
1106      set BRANDS point=K;
1107      if index(trim(PHRASE), trim(BRAND)) then output;
1108    end;
1109    drop K;
1110  run;

NOTE: There were 100000 observations read from the data set WORK.PHRASES.
NOTE: The data set WORK.OUT_DS_POINT has 100000 observations and 4 variables.
NOTE: DATA statement used (Total process time):
      real time           31.10 seconds
      user cpu time       28.56 seconds
      system cpu time     2.46 seconds
      memory              326.67k
      OS Memory           11888.00k
      Timestamp           31/07/2016 05:37:11 PM


1111
1112  data OUT_DS_HASH;
1113    set PHRASES;
1114    if _N_=1 then do;
1115      dcl hash BRANDS(dataset:'BRANDS');
1116      BRANDS.definekey('BRAND');
1117      BRANDS.definedata('BRAND','J');
1118      BRANDS.definedone();
1119      declare hiter IT('BRANDS');
1120      if 0 then set BRANDS;
1121    end;
1122    RC=IT.first();
1123    do while (RC = 0);
1124      if index(trim(PHRASE), trim(BRAND)) then output;
1125      RC=IT.next();
1126    end;
1127    drop RC;
1128  run;

NOTE: There were 200 observations read from the data set WORK.BRANDS.
NOTE: There were 100000 observations read from the data set WORK.PHRASES.
NOTE: The data set WORK.OUT_DS_HASH has 100000 observations and 4 variables.
NOTE: DATA statement used (Total process time):
      real time           0.98 seconds
      user cpu time       0.56 seconds
      system cpu time     0.42 seconds
      memory              701.21k
      OS Memory           11888.00k
      Timestamp           31/07/2016 05:37:12 PM


&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sun, 31 Jul 2016 05:55:30 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Big-Data-Cartesian/m-p/288375#M59455</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2016-07-31T05:55:30Z</dc:date>
    </item>
    <item>
      <title>Re: Big Data - Cartesian</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Big-Data-Cartesian/m-p/288380#M59460</link>
      <description>So my gut feeling wasn't out of bounds...&lt;span class="lia-unicode-emoji" title=":smiling_face_with_sunglasses:"&gt;😎&lt;/span&gt;</description>
      <pubDate>Sun, 31 Jul 2016 08:32:51 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Big-Data-Cartesian/m-p/288380#M59460</guid>
      <dc:creator>LinusH</dc:creator>
      <dc:date>2016-07-31T08:32:51Z</dc:date>
    </item>
    <item>
      <title>Re: Big Data - Cartesian</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Big-Data-Cartesian/m-p/288486#M59503</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;Chris, I appreciate your help. You are amazing!&lt;/P&gt;&lt;P&gt;I tested your query. In my test Contain is still faster than INDEX &amp;amp; HASH TABLES. Maybe I’m doing something wrong?&lt;/P&gt;&lt;P&gt;On the another hand, I had problem, because I had to divide table into 2 smaller. &amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Contain: 14 min&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;OUT_DS_HASH: 21min + 22 min = 43 min&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Log below:&lt;/P&gt;&lt;P&gt;499&amp;nbsp; /****************************************/&lt;/P&gt;&lt;P&gt;500&amp;nbsp; /* Standard Query */&lt;/P&gt;&lt;P&gt;501&amp;nbsp; /****************************************/&lt;/P&gt;&lt;P&gt;502&lt;/P&gt;&lt;P&gt;503&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; proc sql;&lt;/P&gt;&lt;P&gt;504&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; create table F_Standard as&lt;/P&gt;&lt;P&gt;505&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; select a.*, b.* from Phrases a&lt;/P&gt;&lt;P&gt;506&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; join Brands b&lt;/P&gt;&lt;P&gt;507&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; on trim(a.PHRASE) contains trim(b.BRAND)&lt;/P&gt;&lt;P&gt;508&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ;&lt;/P&gt;&lt;P&gt;NOTE: The execution of this query involves performing one or more Cartesian product joins that&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; can not be optimized.&lt;/P&gt;&lt;P&gt;NOTE: Compressing data set WORK.F_STANDARD decreased size by 62.50 percent.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Compressed is 6 pages; un-compressed would require 16 pages.&lt;/P&gt;&lt;P&gt;NOTE: Table WORK.F_STANDARD created, with 775 rows and 3 columns.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;509&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; quit;&lt;/P&gt;&lt;P&gt;NOTE: PROCEDURE SQL used (Total process time):&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&lt;FONT color="#FF0000"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; real time&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 13:58.42&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&lt;FONT color="#FF0000"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; cpu time&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 13:58.16&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;510&lt;/P&gt;&lt;P&gt;511&amp;nbsp; /****************************************/&lt;/P&gt;&lt;P&gt;512&amp;nbsp; /* New Query */&lt;/P&gt;&lt;P&gt;513&amp;nbsp; /****************************************/&lt;/P&gt;&lt;P&gt;514&lt;/P&gt;&lt;P&gt;593&amp;nbsp;&amp;nbsp;&amp;nbsp; data OUT_DS_HASH;&lt;/P&gt;&lt;P&gt;594&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; set PHRASES;&lt;/P&gt;&lt;P&gt;595&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if _N_=1 then do;&lt;/P&gt;&lt;P&gt;596&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; dcl hash BRANDS(dataset:'BRANDS');&lt;/P&gt;&lt;P&gt;597&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; BRANDS.definekey('BRAND');&lt;/P&gt;&lt;P&gt;598&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; BRANDS.definedata('BRAND','J');&lt;/P&gt;&lt;P&gt;599&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; BRANDS.definedone();&lt;/P&gt;&lt;P&gt;600&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; declare hiter IT('BRANDS');&lt;/P&gt;&lt;P&gt;601&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if 0 then set BRANDS;&lt;/P&gt;&lt;P&gt;602&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; end;&lt;/P&gt;&lt;P&gt;603&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; RC=IT.first();&lt;/P&gt;&lt;P&gt;604&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; do while (RC = 0);&lt;/P&gt;&lt;P&gt;605&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if index(trim(PHRASE), trim(BRAND)) then output;&lt;/P&gt;&lt;P&gt;606&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; RC=IT.next();&lt;/P&gt;&lt;P&gt;607&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; end;&lt;/P&gt;&lt;P&gt;608&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; drop RC;&lt;/P&gt;&lt;P&gt;609&amp;nbsp;&amp;nbsp;&amp;nbsp; run;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#FF0000"&gt;ERROR: Hash object added 17301488 items when memory failure occurred.&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#FF0000"&gt;FATAL: Insufficient memory to execute DATA step program. Aborted during the EXECUTION phase.&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#FF0000"&gt;ERROR: The SAS System stopped processing this step because of insufficient memory.&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#FF0000"&gt;NOTE: There were 1 observations read from the data set WORK.PHRASES.&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#FF0000"&gt;WARNING: The data set WORK.OUT_DS_HASH may be incomplete.&amp;nbsp; When this step was stopped there&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#FF0000"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; were 0 observations and 3 variables.&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#FF0000"&gt;WARNING: Data set WORK.OUT_DS_HASH was not replaced because this step was stopped.&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#FF0000"&gt;NOTE: DATA statement used (Total process time):&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#FF0000"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; real time&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 16.72 seconds&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#FF0000"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; cpu time&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 16.41 seconds&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;532&lt;/P&gt;&lt;P&gt;533&amp;nbsp; /****************************************/&lt;/P&gt;&lt;P&gt;534&amp;nbsp; /* New Query part 2 */&lt;/P&gt;&lt;P&gt;535&amp;nbsp; /****************************************/&lt;/P&gt;&lt;P&gt;536&lt;/P&gt;&lt;P&gt;537&amp;nbsp; data Brands_00;&lt;/P&gt;&lt;P&gt;538&amp;nbsp; set Brands;&lt;/P&gt;&lt;P&gt;539&amp;nbsp; RANDOM_INT = INT(RANUNI(0)*1000000);&lt;/P&gt;&lt;P&gt;540&amp;nbsp; run;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;NOTE: There were 21597444 observations read from the data set WORK.BRANDS.&lt;/P&gt;&lt;P&gt;NOTE: The data set WORK.BRANDS_00 has 21597444 observations and 3 variables.&lt;/P&gt;&lt;P&gt;NOTE: Compressing data set WORK.BRANDS_00 decreased size by 31.68 percent.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Compressed is 175669 pages; un-compressed would require 257113 pages.&lt;/P&gt;&lt;P&gt;NOTE: DATA statement used (Total process time):&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; real time&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 16.05 seconds&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;cpu time&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 16.02 seconds&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;541&lt;/P&gt;&lt;P&gt;542&amp;nbsp; data Brands_01;&lt;/P&gt;&lt;P&gt;543&amp;nbsp; set Brands_00;&lt;/P&gt;&lt;P&gt;544&amp;nbsp; where RANDOM_INT &amp;gt; 500000;&lt;/P&gt;&lt;P&gt;545&amp;nbsp; drop RANDOM_INT;&lt;/P&gt;&lt;P&gt;546&amp;nbsp; run;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;NOTE: There were 10800201 observations read from the data set WORK.BRANDS_00.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; WHERE RANDOM_INT&amp;gt;500000;&lt;/P&gt;&lt;P&gt;NOTE: The data set WORK.BRANDS_01 has 10800201 observations and 2 variables.&lt;/P&gt;&lt;P&gt;NOTE: Compressing data set WORK.BRANDS_01 decreased size by 30.13 percent.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Compressed is 81138 pages; un-compressed would require 116132 pages.&lt;/P&gt;&lt;P&gt;NOTE: DATA statement used (Total process time):&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; real time&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 8.95 seconds&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; cpu time&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 8.79 seconds&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;547&lt;/P&gt;&lt;P&gt;548&amp;nbsp; data Brands_02;&lt;/P&gt;&lt;P&gt;549&amp;nbsp; set Brands_00;&lt;/P&gt;&lt;P&gt;550&amp;nbsp; where RANDOM_INT &amp;lt;= 500000;&lt;/P&gt;&lt;P&gt;551&amp;nbsp; drop RANDOM_INT;&lt;/P&gt;&lt;P&gt;552&amp;nbsp; run;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;NOTE: There were 10797243 observations read from the data set WORK.BRANDS_00.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; WHERE RANDOM_INT&amp;lt;=500000;&lt;/P&gt;&lt;P&gt;NOTE: The data set WORK.BRANDS_02 has 10797243 observations and 2 variables.&lt;/P&gt;&lt;P&gt;NOTE: Compressing data set WORK.BRANDS_02 decreased size by 30.13 percent.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Compressed is 81117 pages; un-compressed would require 116100 pages.&lt;/P&gt;&lt;P&gt;NOTE: DATA statement used (Total process time):&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; real time&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 8.57 seconds&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; cpu time&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 8.54 seconds&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;553&lt;/P&gt;&lt;P&gt;554&amp;nbsp;&amp;nbsp;&amp;nbsp; data OUT_DS_HASH_01;&lt;/P&gt;&lt;P&gt;555&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; set PHRASES;&lt;/P&gt;&lt;P&gt;556&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if _N_=1 then do;&lt;/P&gt;&lt;P&gt;557&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; dcl hash BRANDS_01(dataset:'BRANDS_01');&lt;/P&gt;&lt;P&gt;558&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; BRANDS_01.definekey('BRAND');&lt;/P&gt;&lt;P&gt;559&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; BRANDS_01.definedata('BRAND','J');&lt;/P&gt;&lt;P&gt;560&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; BRANDS_01.definedone();&lt;/P&gt;&lt;P&gt;561&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; declare hiter IT('BRANDS_01');&lt;/P&gt;&lt;P&gt;562&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if 0 then set BRANDS_01;&lt;/P&gt;&lt;P&gt;563 &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;end;&lt;/P&gt;&lt;P&gt;564&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; RC=IT.first();&lt;/P&gt;&lt;P&gt;565&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; do while (RC = 0);&lt;/P&gt;&lt;P&gt;566&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if index(trim(PHRASE), trim(BRAND)) then output;&lt;/P&gt;&lt;P&gt;567&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; RC=IT.next();&lt;/P&gt;&lt;P&gt;568&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; end;&lt;/P&gt;&lt;P&gt;569&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; drop RC;&lt;/P&gt;&lt;P&gt;570&amp;nbsp;&amp;nbsp;&amp;nbsp; run;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;NOTE: There were 10800201 observations read from the data set WORK.BRANDS_01.&lt;/P&gt;&lt;P&gt;NOTE: There were 330 observations read from the data set WORK.PHRASES.&lt;/P&gt;&lt;P&gt;NOTE: The data set WORK.OUT_DS_HASH_01 has 399 observations and 3 variables.&lt;/P&gt;&lt;P&gt;NOTE: Compressing data set WORK.OUT_DS_HASH_01 decreased size by 50.00 percent.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Compressed is 4 pages; un-compressed would require 8 pages.&lt;/P&gt;&lt;P&gt;NOTE: DATA statement used (Total process time):&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&lt;FONT color="#FF0000"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; real time&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 21:21.56&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&lt;FONT color="#FF0000"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; cpu time&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 21:19.62&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;571&lt;/P&gt;&lt;P&gt;572&amp;nbsp;&amp;nbsp;&amp;nbsp; data OUT_DS_HASH_02;&lt;/P&gt;&lt;P&gt;573&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; set PHRASES;&lt;/P&gt;&lt;P&gt;574&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if _N_=1 then do;&lt;/P&gt;&lt;P&gt;575&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; dcl hash BRANDS_02(dataset:'BRANDS_02');&lt;/P&gt;&lt;P&gt;576&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; BRANDS_02.definekey('BRAND');&lt;/P&gt;&lt;P&gt;577&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; BRANDS_02.definedata('BRAND','J');&lt;/P&gt;&lt;P&gt;578&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; BRANDS_02.definedone();&lt;/P&gt;&lt;P&gt;579&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; declare hiter IT('BRANDS_02');&lt;/P&gt;&lt;P&gt;580&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if 0 then set BRANDS_02;&lt;/P&gt;&lt;P&gt;581&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; end;&lt;/P&gt;&lt;P&gt;582&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; RC=IT.first();&lt;/P&gt;&lt;P&gt;583&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; do while (RC = 0);&lt;/P&gt;&lt;P&gt;584&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if index(trim(PHRASE), trim(BRAND)) then output;&lt;/P&gt;&lt;P&gt;585&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; RC=IT.next();&lt;/P&gt;&lt;P&gt;586&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; end;&lt;/P&gt;&lt;P&gt;587&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; drop RC;&lt;/P&gt;&lt;P&gt;588&amp;nbsp;&amp;nbsp;&amp;nbsp; run;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;NOTE: There were 10797243 observations read from the data set WORK.BRANDS_02.&lt;/P&gt;&lt;P&gt;NOTE: There were 330 observations read from the data set WORK.PHRASES.&lt;/P&gt;&lt;P&gt;NOTE: The data set WORK.OUT_DS_HASH_02 has 376 observations and 3 variables.&lt;/P&gt;&lt;P&gt;NOTE: Compressing data set WORK.OUT_DS_HASH_02 decreased size by 50.00 percent.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Compressed is 4 pages; un-compressed would require 8 pages.&lt;/P&gt;&lt;P&gt;NOTE: DATA statement used (Total process time):&lt;/P&gt;&lt;P&gt;&lt;FONT color="#FF0000"&gt;&lt;STRONG&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; real time&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 22:08.78&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#FF0000"&gt;&lt;STRONG&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; cpu time&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 22:02.82&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;589&lt;/P&gt;&lt;P&gt;590&amp;nbsp; Data OUT_DS_HASH;&lt;/P&gt;&lt;P&gt;591&amp;nbsp; set OUT_DS_HASH_02 OUT_DS_HASH_01;&lt;/P&gt;&lt;P&gt;592&amp;nbsp; run;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;NOTE: There were 376 observations read from the data set WORK.OUT_DS_HASH_02.&lt;/P&gt;&lt;P&gt;NOTE: There were 399 observations read from the data set WORK.OUT_DS_HASH_01.&lt;/P&gt;&lt;P&gt;NOTE: The data set WORK.OUT_DS_HASH has 775 observations and 3 variables.&lt;/P&gt;&lt;P&gt;NOTE: Compressing data set WORK.OUT_DS_HASH decreased size by 62.50 percent.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Compressed is 6 pages; un-compressed would require 16 pages.&lt;/P&gt;&lt;P&gt;NOTE: DATA statement used (Total process time):&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; real time&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 0.01 seconds&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; cpu time&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 0.01 seconds&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 01 Aug 2016 09:02:09 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Big-Data-Cartesian/m-p/288486#M59503</guid>
      <dc:creator>pietro342</dc:creator>
      <dc:date>2016-08-01T09:02:09Z</dc:date>
    </item>
    <item>
      <title>Re: Big Data - Cartesian</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Big-Data-Cartesian/m-p/288518#M59513</link>
      <description>&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You need to provide more information about your data, or a good replication of similar fake data and cardinality if you want us to replicate your problem.&amp;nbsp;Also please use option fullstimer.&lt;/P&gt;
&lt;P&gt;.&lt;/P&gt;
&lt;P&gt;Your new number of observations is radically different from what you first mentioned, but&amp;nbsp;I get even better&amp;nbsp;results for the hash table, which is now 10,000 times faster than SQL.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;
36   options fullstimer;
37
38   data PHRASES;
39     length PHRASE $2000;
40     PHRASE=repeat('x',1000);
41     do I=1 to 300; output; end ;
42    run;

NOTE: The data set WORK.PHRASES has 300 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      user cpu time       0.00 seconds
      system cpu time     0.01 seconds
      memory              186.10k
      OS Memory           6768.00k
      Timestamp            1/08/2016 10:10:44 PM


43
44   data BRANDS;  &lt;BR /&gt;       length BRAND $40;
45     BRAND=repeat('y',25);
46     do J=1 to 1e6; output; end ;
47     BRAND=repeat('x',25); output;
48   run;

NOTE: The data set WORK.BRANDS has 1000001 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.17 seconds
      user cpu time       0.09 seconds
      system cpu time     0.07 seconds
      memory              181.64k
      OS Memory           6768.00k
      Timestamp            1/08/2016 10:10:44 PM


49
50   proc sql;
51     create table OUT_SQL_CONTAINS as
52     select * from PHRASES, BRANDS
53     where trim(PHRASES.PHRASE) contains trim(BRANDS.BRAND);
NOTE: The execution of this query involves performing one or more Cartesian product joins that
      can not be optimized.
NOTE: Table WORK.OUT_SQL_CONTAINS created, with 300 rows and 4 columns.

54   quit;
NOTE: PROCEDURE SQL used (Total process time):
      real time           6:49.74
      user cpu time       6:48.03
      system cpu time     0.21 seconds
      memory              1041.61k
      OS Memory           7280.00k
      Timestamp            1/08/2016 10:17:34 PM


55
56   data OUT_DS_HASH;
57     set PHRASES;
58     if _N_=1 then do;
59       dcl hash BRANDS(dataset:'BRANDS');
60       BRANDS.definekey('BRAND');
61       BRANDS.definedata('BRAND','J');
62       BRANDS.definedone();
63       declare hiter IT('BRANDS');
64       if 0 then set BRANDS;
65     end;
66     RC=IT.first();
67     do while (RC = 0);
68       if index(trim(PHRASE), trim(BRAND)) then output;
69       RC=IT.next();
70     end;
71     drop RC;
72   run;

NOTE: There were 1000001 observations read from the data set WORK.BRANDS.
NOTE: There were 300 observations read from the data set WORK.PHRASES.
NOTE: The data set WORK.OUT_DS_HASH has 300 observations and 4 variables.
NOTE: DATA statement used (Total process time):
      real time           0.32 seconds
      user cpu time       0.25 seconds
      system cpu time     0.06 seconds
      memory              741.91k
      OS Memory           6768.00k
      Timestamp            1/08/2016 10:17:34 PM
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;How SAS can iterate through a hash table 300 million times in 0.3s is beyond me. Is this right?&lt;/P&gt;
&lt;P&gt;That's one iteration every&amp;nbsp;other&amp;nbsp;CPU clock cycle! This sounds implausible.&lt;/P&gt;
&lt;P&gt;Also very odd is that the data step only used 7 MB of RAM. The hash table alone is 40+ MB.&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt; Is this right? What is going on?&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 01 Aug 2016 10:42:33 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Big-Data-Cartesian/m-p/288518#M59513</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2016-08-01T10:42:33Z</dc:date>
    </item>
    <item>
      <title>Re: Big Data - Cartesian</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Big-Data-Cartesian/m-p/288520#M59515</link>
      <description>&lt;P&gt;You defined brands as 1 million repeats of the same value and one repeat of another value. Since you also defined BRAND as key in the hash, you end up with only 2 items in there.&lt;/P&gt;
&lt;P&gt;So you have only 600 iterations in the loop.&lt;/P&gt;</description>
      <pubDate>Mon, 01 Aug 2016 10:58:25 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Big-Data-Cartesian/m-p/288520#M59515</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2016-08-01T10:58:25Z</dc:date>
    </item>
    <item>
      <title>Re: Big Data - Cartesian</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Big-Data-Cartesian/m-p/288526#M59516</link>
      <description>&lt;P&gt;Haha! Good man! On to fixing this! I shouldnt try to think so late!!&lt;/P&gt;</description>
      <pubDate>Mon, 01 Aug 2016 11:10:23 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Big-Data-Cartesian/m-p/288526#M59516</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2016-08-01T11:10:23Z</dc:date>
    </item>
    <item>
      <title>Re: Big Data - Cartesian</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Big-Data-Cartesian/m-p/288527#M59517</link>
      <description>&lt;P&gt;The hash table doesn't shine so,much now that I test properly.&lt;/P&gt;
&lt;P&gt;Back to square one....&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;227  options fullstimer;
228
229  data PHRASES;
230    length PHRASE $2000;
231    PHRASE=repeat('x',1000);
232    do I=1 to 300; output; end ;
233   run;

NOTE: The data set WORK.PHRASES has 300 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      user cpu time       0.00 seconds
      system cpu time     0.01 seconds
      memory              186.10k
      OS Memory           10864.00k
      Timestamp            1/08/2016 11:16:49 PM


234
235  data BRANDS;
236    length BRAND $40;
237    BRAND=repeat('x',25); output;
238    do J=1 to 1e5; BRAND=put(J,20.);output; end ;
239  run;

NOTE: The data set WORK.BRANDS has 100001 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.03 seconds
      user cpu time       0.01 seconds
      system cpu time     0.01 seconds
      memory              184.37k
      OS Memory           10864.00k
      Timestamp            1/08/2016 11:16:49 PM


240
241  proc sql;
242    create table OUT_SQL_CONTAINS as
243    select * from PHRASES, BRANDS
244    where trim(PHRASES.PHRASE) contains trim(BRANDS.BRAND);
NOTE: The execution of this query involves performing one or more Cartesian product joins that
      can not be optimized.
NOTE: Table WORK.OUT_SQL_CONTAINS created, with 300 rows and 4 columns.

245  quit;
NOTE: PROCEDURE SQL used (Total process time):
      real time           40.69 seconds
      user cpu time       40.67 seconds
      system cpu time     0.01 seconds
      memory              1042.18k
      OS Memory           10864.00k
      Timestamp            1/08/2016 11:17:29 PM


246
247  data OUT_DS_HASH;
248    set PHRASES;
249    if _N_=1 then do;
250      dcl hash BRANDS(dataset:'BRANDS');
251      BRANDS.definekey('BRAND');
252      BRANDS.definedata('BRAND','J');
253      BRANDS.definedone();
254      declare hiter IT('BRANDS');
255      if 0 then set BRANDS;
256    end;
257    RC=IT.first();
258    do while (RC = 0);
259      if index(trim(PHRASE), trim(BRAND)) then output;
260      RC=IT.next();
261    end;
262    drop RC;
263  run;

NOTE: There were 100001 observations read from the data set WORK.BRANDS.
NOTE: There were 300 observations read from the data set WORK.PHRASES.
NOTE: The data set WORK.OUT_DS_HASH has 300 observations and 4 variables.
NOTE: DATA statement used (Total process time):
      real time           43.08 seconds
      user cpu time       43.02 seconds
      system cpu time     0.04 seconds
      memory              13795.31k
      OS Memory           23820.00k
      Timestamp            1/08/2016 11:18:12 PM


&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 01 Aug 2016 11:20:26 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Big-Data-Cartesian/m-p/288527#M59517</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2016-08-01T11:20:26Z</dc:date>
    </item>
    <item>
      <title>Re: Big Data - Cartesian</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Big-Data-Cartesian/m-p/288766#M59590</link>
      <description>&lt;P&gt;Summary of the current status:&lt;/P&gt;
&lt;P&gt;- The process is CPU-bound&lt;/P&gt;
&lt;P&gt;- Hash table can be made slightly faster than SQL by having&amp;nbsp;a one-byte key (see next post)&lt;/P&gt;
&lt;P&gt;- Array match is still the fastest.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Recommendations until some has a better idea, and provided the sample data tested is close enough to the actual data:&lt;/P&gt;
&lt;P&gt;- Use arrays rather than SQL&lt;/P&gt;
&lt;P&gt;- Split the job into small chunks that can run in parallel so you can use more CPUs&lt;/P&gt;
&lt;P&gt;- If your brands have a length that vary from say 20 to 40, you could make one job for each length and remove the need for trimming the brand by making the variable length for each job the exact needed length, like this:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data CLASS4 (keep=NAME_4 rename=NAME_4=NAME where=(NAME ne '')) 
     CLASS5 (keep=NAME_5 rename=NAME_5=NAME where=(NAME ne ''))  
     CLASS6 (keep=NAME_6 rename=NAME_6=NAME where=(NAME ne ''))   
     CLASS7 (keep=NAME_7 rename=NAME_7=NAME where=(NAME ne ''))  ;
  length NAME_4 $4   NAME_5 $5   NAME_6 $6   NAME_7 $7;
  array NAME_[4:7] $8 NAME_4 - NAME_7;
  set SASHELP.CLASS; 
  NAME_[length(NAME)]=NAME;         
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 01 Aug 2016 23:37:44 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Big-Data-Cartesian/m-p/288766#M59590</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2016-08-01T23:37:44Z</dc:date>
    </item>
  </channel>
</rss>

