<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic improve proc sql join efficiency in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/improve-proc-sql-join-efficiency/m-p/673977#M202862</link>
    <description>&lt;P&gt;Hi everyone,&lt;/P&gt;&lt;P&gt;Previous, when using SAS,&amp;nbsp; we can have pass through to run the query in Oracle, this process is quite smooth.&lt;/P&gt;&lt;P&gt;However, due to a special nature of safe environment, I have to run all the codes in SAS only.&amp;nbsp; &amp;nbsp;It runs incredibly slow now, even for a small set for data b.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Here is the code, and I will put some labels.&amp;nbsp;&lt;/P&gt;&lt;P&gt;I did some searches, but frankly not sure which will be the best approach like index( in this case will &amp;gt; than 34% unique labels) /hash?).&amp;nbsp;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp;I tried:&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; 1) data step to extract all related data, then join; take too long and too much spaces;&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; 2) use subquery, not by join.&amp;nbsp; &amp;nbsp;use filter as ---- label in (select distinct label in&amp;nbsp;att.list_low5_&amp;amp;c_year), run quite slow as well.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Any recommendation from advanced users?&amp;nbsp; thank you very much.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;proc sql;&lt;BR /&gt;create table att.total_list_&amp;amp;p_year as&lt;BR /&gt;select distinct a.label, num, datepart(dt) as dt&lt;BR /&gt;from hynp.flat a&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; (*********************&amp;nbsp; data in Oracle, not sortable, 80 millions records each year)&lt;BR /&gt;inner join att.list_low5_&amp;amp;c_year b&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;(******************* records from thousands to up to 4millions,&amp;nbsp; this macro run ten circles)&lt;BR /&gt;on a.label = b.label&lt;BR /&gt;where&lt;BR /&gt;dt &amp;gt;= &amp;amp;s_date and&lt;BR /&gt;dt &amp;lt;='31Mar2019'd&lt;BR /&gt;and prov = ..&lt;BR /&gt;and tp = ..&lt;BR /&gt;and pec = ..&lt;BR /&gt;and pnum ^= ..&lt;BR /&gt;and cd &amp;lt; ..&lt;BR /&gt;and vcd not in (.....)&lt;BR /&gt;and vloc not in (....)&lt;BR /&gt;and tm not between ... and ...&lt;BR /&gt;and tm not in (...)&lt;BR /&gt;&lt;BR /&gt;and num not in (...)&lt;/P&gt;&lt;P&gt;order by a.label,dt desc, num;&lt;BR /&gt;quit;&lt;/P&gt;</description>
    <pubDate>Sun, 02 Aug 2020 07:51:00 GMT</pubDate>
    <dc:creator>pensarchem</dc:creator>
    <dc:date>2020-08-02T07:51:00Z</dc:date>
    <item>
      <title>improve proc sql join efficiency</title>
      <link>https://communities.sas.com/t5/SAS-Programming/improve-proc-sql-join-efficiency/m-p/673977#M202862</link>
      <description>&lt;P&gt;Hi everyone,&lt;/P&gt;&lt;P&gt;Previous, when using SAS,&amp;nbsp; we can have pass through to run the query in Oracle, this process is quite smooth.&lt;/P&gt;&lt;P&gt;However, due to a special nature of safe environment, I have to run all the codes in SAS only.&amp;nbsp; &amp;nbsp;It runs incredibly slow now, even for a small set for data b.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Here is the code, and I will put some labels.&amp;nbsp;&lt;/P&gt;&lt;P&gt;I did some searches, but frankly not sure which will be the best approach like index( in this case will &amp;gt; than 34% unique labels) /hash?).&amp;nbsp;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp;I tried:&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; 1) data step to extract all related data, then join; take too long and too much spaces;&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; 2) use subquery, not by join.&amp;nbsp; &amp;nbsp;use filter as ---- label in (select distinct label in&amp;nbsp;att.list_low5_&amp;amp;c_year), run quite slow as well.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Any recommendation from advanced users?&amp;nbsp; thank you very much.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;proc sql;&lt;BR /&gt;create table att.total_list_&amp;amp;p_year as&lt;BR /&gt;select distinct a.label, num, datepart(dt) as dt&lt;BR /&gt;from hynp.flat a&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; (*********************&amp;nbsp; data in Oracle, not sortable, 80 millions records each year)&lt;BR /&gt;inner join att.list_low5_&amp;amp;c_year b&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;(******************* records from thousands to up to 4millions,&amp;nbsp; this macro run ten circles)&lt;BR /&gt;on a.label = b.label&lt;BR /&gt;where&lt;BR /&gt;dt &amp;gt;= &amp;amp;s_date and&lt;BR /&gt;dt &amp;lt;='31Mar2019'd&lt;BR /&gt;and prov = ..&lt;BR /&gt;and tp = ..&lt;BR /&gt;and pec = ..&lt;BR /&gt;and pnum ^= ..&lt;BR /&gt;and cd &amp;lt; ..&lt;BR /&gt;and vcd not in (.....)&lt;BR /&gt;and vloc not in (....)&lt;BR /&gt;and tm not between ... and ...&lt;BR /&gt;and tm not in (...)&lt;BR /&gt;&lt;BR /&gt;and num not in (...)&lt;/P&gt;&lt;P&gt;order by a.label,dt desc, num;&lt;BR /&gt;quit;&lt;/P&gt;</description>
      <pubDate>Sun, 02 Aug 2020 07:51:00 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/improve-proc-sql-join-efficiency/m-p/673977#M202862</guid>
      <dc:creator>pensarchem</dc:creator>
      <dc:date>2020-08-02T07:51:00Z</dc:date>
    </item>
    <item>
      <title>Re: improve proc sql join efficiency</title>
      <link>https://communities.sas.com/t5/SAS-Programming/improve-proc-sql-join-efficiency/m-p/673979#M202864</link>
      <description>&lt;P&gt;From which table originate all the variables without an alias?&lt;/P&gt;
&lt;P&gt;Does &amp;amp;s_date change frequently?&lt;/P&gt;
&lt;P&gt;Do you have any kind of dynamics in the other parts of the WHERE?&lt;/P&gt;</description>
      <pubDate>Sun, 02 Aug 2020 09:19:40 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/improve-proc-sql-join-efficiency/m-p/673979#M202864</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2020-08-02T09:19:40Z</dc:date>
    </item>
    <item>
      <title>Re: improve proc sql join efficiency</title>
      <link>https://communities.sas.com/t5/SAS-Programming/improve-proc-sql-join-efficiency/m-p/673996#M202872</link>
      <description>&lt;P&gt;&lt;SPAN&gt;1, hynp.flat&amp;nbsp; is the source table;&amp;nbsp; so label, dt, num are all from the source table.&amp;nbsp;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;att.list_low5_&amp;amp;c_year b&amp;nbsp;&amp;nbsp; is a label cohort generated from hynp.flat;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;2, &amp;amp;s_date does not change and is a certain date for each macro;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;3, No, these filters are set, there is no dynamic in WHERE;&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 02 Aug 2020 15:47:48 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/improve-proc-sql-join-efficiency/m-p/673996#M202872</guid>
      <dc:creator>pensarchem</dc:creator>
      <dc:date>2020-08-02T15:47:48Z</dc:date>
    </item>
    <item>
      <title>Re: improve proc sql join efficiency</title>
      <link>https://communities.sas.com/t5/SAS-Programming/improve-proc-sql-join-efficiency/m-p/674010#M202879</link>
      <description>&lt;P&gt;Then you should use a hash for the inner join:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data att.total_list_&amp;amp;p_year;
set hynp.flat;
where /* your where condition */;
if _n_ = 1
then do;
  declare hash l (dataset:"att.list_low5_&amp;amp;c_year (keep=label)");
  l.definekey("label");
  l.definedone();
end;
if l.check() = 0;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Sun, 02 Aug 2020 20:09:19 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/improve-proc-sql-join-efficiency/m-p/674010#M202879</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2020-08-02T20:09:19Z</dc:date>
    </item>
    <item>
      <title>Re: improve proc sql join efficiency</title>
      <link>https://communities.sas.com/t5/SAS-Programming/improve-proc-sql-join-efficiency/m-p/674024#M202892</link>
      <description>&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;SPAN&gt;However, due to a special nature of safe environment, I have to run all the codes in SAS only&lt;/SPAN&gt;&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Can you explain this a bit further? You're already connecting to Oracle so why exactly can't you push some of the processing to Oracle?&lt;/P&gt;
&lt;P&gt;If you have to pull all the data first into SAS before reducing volumes then that's already going to impact quite a bit on performance even if your 2nd table would only contain a few rows.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sun, 02 Aug 2020 23:09:18 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/improve-proc-sql-join-efficiency/m-p/674024#M202892</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2020-08-02T23:09:18Z</dc:date>
    </item>
    <item>
      <title>Re: improve proc sql join efficiency</title>
      <link>https://communities.sas.com/t5/SAS-Programming/improve-proc-sql-join-efficiency/m-p/674033#M202900</link>
      <description>&lt;P&gt;There must be something preventing SAS/SQL from asking Oracle to preprocess your query. I suspect it is the use of function datepart() .&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Try adding SQL_FUNCTIONS=ALL to your libname definition of hynp. This should allow the datetime function to be passed to Oracle for processing.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If that doesn't work, write &lt;STRONG&gt;datepart(a.dt) as dt&lt;/STRONG&gt; in the select clause and use datetime constants in your where clause (e.g. &lt;STRONG&gt;.... a.dt &amp;lt;='31Mar2019:00:00:00'dt&lt;/STRONG&gt;).&lt;/P&gt;</description>
      <pubDate>Mon, 03 Aug 2020 01:28:38 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/improve-proc-sql-join-efficiency/m-p/674033#M202900</guid>
      <dc:creator>PGStats</dc:creator>
      <dc:date>2020-08-03T01:28:38Z</dc:date>
    </item>
    <item>
      <title>Re: improve proc sql join efficiency</title>
      <link>https://communities.sas.com/t5/SAS-Programming/improve-proc-sql-join-efficiency/m-p/674047#M202908</link>
      <description>&lt;P&gt;Please add this line&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;options sastrace=',,,db' sastraceloc=saslog nostsuffix;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;and show us the full log.&lt;/P&gt;</description>
      <pubDate>Mon, 03 Aug 2020 05:15:21 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/improve-proc-sql-join-efficiency/m-p/674047#M202908</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2020-08-03T05:15:21Z</dc:date>
    </item>
    <item>
      <title>Re: improve proc sql join efficiency</title>
      <link>https://communities.sas.com/t5/SAS-Programming/improve-proc-sql-join-efficiency/m-p/674054#M202913</link>
      <description>&lt;P&gt;thanks.&amp;nbsp; In the process of testing code now, I am dealing an error message.&amp;nbsp; Will get back to you.&lt;/P&gt;</description>
      <pubDate>Mon, 03 Aug 2020 05:58:47 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/improve-proc-sql-join-efficiency/m-p/674054#M202913</guid>
      <dc:creator>pensarchem</dc:creator>
      <dc:date>2020-08-03T05:58:47Z</dc:date>
    </item>
    <item>
      <title>Re: improve proc sql join efficiency</title>
      <link>https://communities.sas.com/t5/SAS-Programming/improve-proc-sql-join-efficiency/m-p/674055#M202914</link>
      <description>&lt;P&gt;Yes, currently I run the code on a lower security level, so I can still connect to oracle.&lt;/P&gt;&lt;P&gt;However, we do have a special safe environment and all the data were imported manually from various resources.&amp;nbsp; The data will be imported through .gz files into sas and run locally only under SAS..&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;So I am testing my code now and will move to safe environment after everything runs smoothly..I hope this clarifies.&lt;/P&gt;</description>
      <pubDate>Mon, 03 Aug 2020 06:02:29 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/improve-proc-sql-join-efficiency/m-p/674055#M202914</guid>
      <dc:creator>pensarchem</dc:creator>
      <dc:date>2020-08-03T06:02:29Z</dc:date>
    </item>
    <item>
      <title>Re: improve proc sql join efficiency</title>
      <link>https://communities.sas.com/t5/SAS-Programming/improve-proc-sql-join-efficiency/m-p/674056#M202915</link>
      <description>&lt;P&gt;thanks for your input, I answered this above.&amp;nbsp; We normally do run the code in Oracle and the process is quite quick..&lt;/P&gt;</description>
      <pubDate>Mon, 03 Aug 2020 06:04:15 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/improve-proc-sql-join-efficiency/m-p/674056#M202915</guid>
      <dc:creator>pensarchem</dc:creator>
      <dc:date>2020-08-03T06:04:15Z</dc:date>
    </item>
    <item>
      <title>Re: improve proc sql join efficiency</title>
      <link>https://communities.sas.com/t5/SAS-Programming/improve-proc-sql-join-efficiency/m-p/674059#M202916</link>
      <description>&lt;P&gt;And WHY exactly can't you run it in Oracle this time? If you have first to transfer all the data into SAS before subsetting then no wonder it takes much longer. At least try and use the proposed hash table approach so the big data set doesn't need sorting as well on the SAS side before reducing the data volumes.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 03 Aug 2020 06:37:08 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/improve-proc-sql-join-efficiency/m-p/674059#M202916</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2020-08-03T06:37:08Z</dc:date>
    </item>
    <item>
      <title>Re: improve proc sql join efficiency</title>
      <link>https://communities.sas.com/t5/SAS-Programming/improve-proc-sql-join-efficiency/m-p/674064#M202917</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/273559"&gt;@pensarchem&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;thanks.&amp;nbsp; In the process of testing code now, I am dealing an error message.&amp;nbsp; Will get back to you.&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Two thoughts:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;The argument about the datepart() function is valid. Using a SAS-only function will force SAS to unload data before applying the function. OTOH, using it in a data step you simply unload the datetime value as is, which should not pose a performance problem. Just make sure that you do not use SAS-specific elements in the WHERE condition, as SAS will send that to the DBMS for execution there, if it can.&lt;/LI&gt;
&lt;LI&gt;Whenever you use a connection to a DBMS, SAS will implicitly run SQL code there. Securitywise, there is no difference to using explicit pass-through: code will be sent from SAS to the DBMS. If the "safe environment" is not hardened against any mishap induced from outside (through proper permissions, performance limits, etc), then it is not safe at all. If there are any doubts, you can only have code (that was checked by the DBA) run in the DBMS to unload to a flat file, and have that sent to SAS. This is, BTW, the method we use.&lt;/LI&gt;
&lt;/UL&gt;</description>
      <pubDate>Mon, 03 Aug 2020 07:40:22 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/improve-proc-sql-join-efficiency/m-p/674064#M202917</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2020-08-03T07:40:22Z</dc:date>
    </item>
  </channel>
</rss>

