<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic SAS SQL Code Efficiency - Left Join in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/SAS-SQL-Code-Efficiency-Left-Join/m-p/399476#M96765</link>
    <description>&lt;P&gt;Hi All,&lt;BR /&gt;&lt;BR /&gt;I am using following code for the requirment given below, Present it is taking 5 hours to complete the following query because dataset size are huge.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Can somebody help me with efficient SAS code which can reduce processing time.&lt;BR /&gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;Proc sql;&lt;BR /&gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;create table INVENTORY as select distinct&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; a.*&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;,min(case when (b.PROCESS_DATE GT a.CREATION_DATE and c.MAKING_DATE GE b.PROCESS_DATE and c.MAKING_DATE LE b.PROCESS_DATE_5DAYS and c.AMOUNT LE 0)&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp; then b.PROCESS_DATE else . end) as CLEARENCE_DATE format=date9.&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;,min(case when (b.PROCESS_DATE GT a.CREATION_DATE and c.MAKING_DATE GE b.PROCESS_DATE and c.MAKING_DATE LE b.PROCESS_DATE_5DAYS and c.AMOUNT LE 0)&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp; then c.MAKING_DATE&amp;nbsp; else . end) as INV_DATE format=date9.&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;from MAIN_DATA as a&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;left join PROCESSING_DATA as b&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;lt;------(Has Record Count of "682152529")&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;on a.ID = b.ID&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;left join MAKING_DATA as c&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;lt;------(Has Record Count of "374098947")&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;on a.ID = c.ID &amp;nbsp;&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;group by a.ID, a.PART_ID&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;order by a.ID, a.PART_ID;&lt;BR /&gt;&lt;BR /&gt;Quit;&lt;BR /&gt;&lt;BR /&gt;I have one main base dataset "MAIN_DATA" and I am joining my "MAIN_DATA" with two huge datasets called "PROCESSING_DATA" and "MAKING_DATA"&lt;BR /&gt;1. MAIN_DATA (This dataset has "CREATION_DATE")&lt;BR /&gt;2. PROCESSING_DATA (This dataset has "PROCESS_DATE" and "PROCESS_DATE_5Days" )&lt;BR /&gt;3. MAKING_DATA (This dataset has "MAKING_DATE" and "AMOUNT")&lt;BR /&gt;&lt;BR /&gt;1. Requirement for Identifying "Clearence Date":&lt;BR /&gt;I want to know earliest "PROCESS_DATE" date for the following:&lt;BR /&gt;"PROCESS_DATE" should be After "CREATION_DATE" and "MAKING_DATE" should be between "PROCESS_DATE" and "PROCESS_DATE_5Days" and "AMOUNT" should be ZERO(0) or less than ZERO(0)&lt;BR /&gt;&lt;BR /&gt;2. Requirement for Identifying "Inventory Date":&lt;BR /&gt;I want to know earliest "MAKING_DATE" date for the following:&lt;BR /&gt;"PROCESS_DATE" should be After "CREATION_DATE" and "MAKING_DATE" should be between "PROCESS_DATE" and "PROCESS_DATE_5Days" and "AMOUNT" should be ZERO(0) or less than ZERO(0)&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Ravi&lt;/P&gt;</description>
    <pubDate>Thu, 28 Sep 2017 13:07:41 GMT</pubDate>
    <dc:creator>Ravikumar_RT</dc:creator>
    <dc:date>2017-09-28T13:07:41Z</dc:date>
    <item>
      <title>SAS SQL Code Efficiency - Left Join</title>
      <link>https://communities.sas.com/t5/SAS-Programming/SAS-SQL-Code-Efficiency-Left-Join/m-p/399476#M96765</link>
      <description>&lt;P&gt;Hi All,&lt;BR /&gt;&lt;BR /&gt;I am using following code for the requirment given below, Present it is taking 5 hours to complete the following query because dataset size are huge.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Can somebody help me with efficient SAS code which can reduce processing time.&lt;BR /&gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;Proc sql;&lt;BR /&gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;create table INVENTORY as select distinct&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; a.*&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;,min(case when (b.PROCESS_DATE GT a.CREATION_DATE and c.MAKING_DATE GE b.PROCESS_DATE and c.MAKING_DATE LE b.PROCESS_DATE_5DAYS and c.AMOUNT LE 0)&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp; then b.PROCESS_DATE else . end) as CLEARENCE_DATE format=date9.&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;,min(case when (b.PROCESS_DATE GT a.CREATION_DATE and c.MAKING_DATE GE b.PROCESS_DATE and c.MAKING_DATE LE b.PROCESS_DATE_5DAYS and c.AMOUNT LE 0)&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp; then c.MAKING_DATE&amp;nbsp; else . end) as INV_DATE format=date9.&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;from MAIN_DATA as a&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;left join PROCESSING_DATA as b&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;lt;------(Has Record Count of "682152529")&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;on a.ID = b.ID&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;left join MAKING_DATA as c&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;lt;------(Has Record Count of "374098947")&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;on a.ID = c.ID &amp;nbsp;&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;group by a.ID, a.PART_ID&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;order by a.ID, a.PART_ID;&lt;BR /&gt;&lt;BR /&gt;Quit;&lt;BR /&gt;&lt;BR /&gt;I have one main base dataset "MAIN_DATA" and I am joining my "MAIN_DATA" with two huge datasets called "PROCESSING_DATA" and "MAKING_DATA"&lt;BR /&gt;1. MAIN_DATA (This dataset has "CREATION_DATE")&lt;BR /&gt;2. PROCESSING_DATA (This dataset has "PROCESS_DATE" and "PROCESS_DATE_5Days" )&lt;BR /&gt;3. MAKING_DATA (This dataset has "MAKING_DATE" and "AMOUNT")&lt;BR /&gt;&lt;BR /&gt;1. Requirement for Identifying "Clearence Date":&lt;BR /&gt;I want to know earliest "PROCESS_DATE" date for the following:&lt;BR /&gt;"PROCESS_DATE" should be After "CREATION_DATE" and "MAKING_DATE" should be between "PROCESS_DATE" and "PROCESS_DATE_5Days" and "AMOUNT" should be ZERO(0) or less than ZERO(0)&lt;BR /&gt;&lt;BR /&gt;2. Requirement for Identifying "Inventory Date":&lt;BR /&gt;I want to know earliest "MAKING_DATE" date for the following:&lt;BR /&gt;"PROCESS_DATE" should be After "CREATION_DATE" and "MAKING_DATE" should be between "PROCESS_DATE" and "PROCESS_DATE_5Days" and "AMOUNT" should be ZERO(0) or less than ZERO(0)&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Ravi&lt;/P&gt;</description>
      <pubDate>Thu, 28 Sep 2017 13:07:41 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/SAS-SQL-Code-Efficiency-Left-Join/m-p/399476#M96765</guid>
      <dc:creator>Ravikumar_RT</dc:creator>
      <dc:date>2017-09-28T13:07:41Z</dc:date>
    </item>
    <item>
      <title>Re: SAS SQL Code Efficiency - Left Join</title>
      <link>https://communities.sas.com/t5/SAS-Programming/SAS-SQL-Code-Efficiency-Left-Join/m-p/399490#M96766</link>
      <description>&lt;P&gt;Using select distinct on a long list of variables is usually a VERY BAD IDEA, as it forces SQL to sort by ALL variables to remove doubles.&lt;/P&gt;
&lt;P&gt;Your first task shall therefore be to determine if that distinct is necessary, and how to work around it if it is (eg if only a subgroup of variables contains the necessary values to determine the distinct, you can do a sort nodupkey using that).&lt;/P&gt;
&lt;P&gt;Next you need to inspect your relationships. If you have a 1:n relationship, you can replace that join with a sort and data step merge, which usually outperforms SQL up to orders of magnitude.&lt;/P&gt;</description>
      <pubDate>Thu, 28 Sep 2017 13:45:59 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/SAS-SQL-Code-Efficiency-Left-Join/m-p/399490#M96766</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2017-09-28T13:45:59Z</dc:date>
    </item>
    <item>
      <title>Re: SAS SQL Code Efficiency - Left Join</title>
      <link>https://communities.sas.com/t5/SAS-Programming/SAS-SQL-Code-Efficiency-Left-Join/m-p/399712#M96832</link>
      <description>&lt;P&gt;How large is your MAIN_DATA table? If it has significantly fewer rows than the other two tables, you may get better performance by putting indexes on the large tables:&lt;/P&gt;&lt;PRE&gt;proc sql;
  create index ID on PROCESSING_DATA(ID);
  create index ID on MAKING_DATA(ID);
quit;&lt;/PRE&gt;&lt;P&gt;Of course, the feasibility and efficiency of this also&amp;nbsp;depends on how often (and how) you update the two large tables. But often the creation of an index takes less time than what you can save in a single query.&lt;/P&gt;</description>
      <pubDate>Fri, 29 Sep 2017 07:40:36 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/SAS-SQL-Code-Efficiency-Left-Join/m-p/399712#M96832</guid>
      <dc:creator>s_lassen</dc:creator>
      <dc:date>2017-09-29T07:40:36Z</dc:date>
    </item>
    <item>
      <title>Re: SAS SQL Code Efficiency - Left Join</title>
      <link>https://communities.sas.com/t5/SAS-Programming/SAS-SQL-Code-Efficiency-Left-Join/m-p/399830#M96862</link>
      <description>Hi,&lt;BR /&gt;&lt;BR /&gt;Thanks for your suggestion, I did tried processing by creating index, but still it took more than 5 hours to complete.&lt;BR /&gt;&lt;BR /&gt;MAIN_DATA - "1249382" (Total Record count)&lt;BR /&gt;PROCESSING_DATA - "682152529" (Total Record count)&lt;BR /&gt;MAKING_DATA - "374098947" (Total Record count)&lt;BR /&gt;&lt;BR /&gt;Please could you help me with any alternative SAS code which helps in reducing processing time.&lt;BR /&gt;&lt;BR /&gt;My requirement is same:&lt;BR /&gt;&lt;BR /&gt;I have one main base dataset "MAIN_DATA" and I am joining my "MAIN_DATA" with two huge datasets called "PROCESSING_DATA" and "MAKING_DATA"&lt;BR /&gt;1. MAIN_DATA (This dataset has "CREATION_DATE")&lt;BR /&gt;2. PROCESSING_DATA (This dataset has "PROCESS_DATE" and "PROCESS_DATE_5Days" )&lt;BR /&gt;3. MAKING_DATA (This dataset has "MAKING_DATE" and "AMOUNT")&lt;BR /&gt;&lt;BR /&gt;1. Requirement for Identifying "Clearence Date":&lt;BR /&gt;I want to know earliest "PROCESS_DATE" date for the following:&lt;BR /&gt;"PROCESS_DATE" should be After "CREATION_DATE" and "MAKING_DATE" should be between "PROCESS_DATE" and "PROCESS_DATE_5Days" and "AMOUNT" should be ZERO(0) or less than ZERO(0)&lt;BR /&gt;&lt;BR /&gt;2. Requirement for Identifying "Inventory Date":&lt;BR /&gt;I want to know earliest "MAKING_DATE" date for the following:&lt;BR /&gt;"PROCESS_DATE" should be After "CREATION_DATE" and "MAKING_DATE" should be between "PROCESS_DATE" and "PROCESS_DATE_5Days" and "AMOUNT" should be ZERO(0) or less than ZERO(0)&lt;BR /&gt;&lt;BR /&gt;Once again thanks alot for your help.&lt;BR /&gt;&lt;BR /&gt;Regards,&lt;BR /&gt;Ravi&lt;BR /&gt;</description>
      <pubDate>Fri, 29 Sep 2017 15:29:46 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/SAS-SQL-Code-Efficiency-Left-Join/m-p/399830#M96862</guid>
      <dc:creator>Ravikumar_RT</dc:creator>
      <dc:date>2017-09-29T15:29:46Z</dc:date>
    </item>
  </channel>
</rss>

