<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Left Join or Merge of sas tables optimization in SAS Procedures</title>
    <link>https://communities.sas.com/t5/SAS-Procedures/Left-Join-or-Merge-of-sas-tables-optimization/m-p/595030#M76109</link>
    <description>&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I have below query which is taking lot of time&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;proc sql;&lt;BR /&gt;create table agent1&lt;BR /&gt;as&lt;BR /&gt;select a.*,&lt;BR /&gt;B.DEAL_BRANCH_CITY,&lt;BR /&gt;B.DEAL_BRANCH_KEY,&lt;BR /&gt;B.DEAL_BRANCH_NAME,&lt;BR /&gt;B.DEAL_BRANCH_STATE,&lt;BR /&gt;B.DEAL_PRIMARY_SUBVERTICAL_KEY,&lt;BR /&gt;B.DEAL_PRIMARY_VERTICAL_KEY,&lt;BR /&gt;B.DEAL_PRI_SSUB_VERTICAL_DESC,&lt;BR /&gt;B.DEAL_PRI_VVERTICAL_DESC,&lt;BR /&gt;B.DEAL_RM_CODE,&lt;BR /&gt;B.DEAL_RM_NAME,&lt;BR /&gt;B.DEAL_SEC_RM_CODE,&lt;BR /&gt;B.DEAL_SEC_RM_NAME,&lt;BR /&gt;B.DEAL_SEC_SSUB_VERTICAL_DESC,&lt;BR /&gt;B.DEAL_SEC_SUBVERTICAL_KEY,&lt;BR /&gt;B.DEAL_SEC_VERTICAL_KEY,&lt;BR /&gt;B.DEAL_SEC_VVERTICAL_DESC,&lt;BR /&gt;B.POL_DEAL_DSA_NAME,&lt;BR /&gt;B.POL_DEAL_NUM&lt;BR /&gt;FROM&lt;BR /&gt;work.agent as a&lt;BR /&gt;LEFT JOIN&lt;BR /&gt;STAGE.master_LOOKUP AS B&lt;BR /&gt;ON&lt;BR /&gt;UPCASE(A.POLICY_NUMBER)=UPCASE(B.POL_NUM)&lt;BR /&gt;;&lt;BR /&gt;quit;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Here, Agent table has 8731382 records and no duplicates It has 36 variables.&amp;nbsp; My B table is pain it has 167171496 and has duplicates on the key column and has total 29 columns. Both of these tables are SAS tables and have indexes on respective columns. I tried with HASH join but it still takes huge time.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;How can I reduce the execution time for this ? Any suggestion is highly appreciated&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Wed, 09 Oct 2019 12:48:19 GMT</pubDate>
    <dc:creator>yashpande</dc:creator>
    <dc:date>2019-10-09T12:48:19Z</dc:date>
    <item>
      <title>Left Join or Merge of sas tables optimization</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Left-Join-or-Merge-of-sas-tables-optimization/m-p/595030#M76109</link>
      <description>&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I have below query which is taking lot of time&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;proc sql;&lt;BR /&gt;create table agent1&lt;BR /&gt;as&lt;BR /&gt;select a.*,&lt;BR /&gt;B.DEAL_BRANCH_CITY,&lt;BR /&gt;B.DEAL_BRANCH_KEY,&lt;BR /&gt;B.DEAL_BRANCH_NAME,&lt;BR /&gt;B.DEAL_BRANCH_STATE,&lt;BR /&gt;B.DEAL_PRIMARY_SUBVERTICAL_KEY,&lt;BR /&gt;B.DEAL_PRIMARY_VERTICAL_KEY,&lt;BR /&gt;B.DEAL_PRI_SSUB_VERTICAL_DESC,&lt;BR /&gt;B.DEAL_PRI_VVERTICAL_DESC,&lt;BR /&gt;B.DEAL_RM_CODE,&lt;BR /&gt;B.DEAL_RM_NAME,&lt;BR /&gt;B.DEAL_SEC_RM_CODE,&lt;BR /&gt;B.DEAL_SEC_RM_NAME,&lt;BR /&gt;B.DEAL_SEC_SSUB_VERTICAL_DESC,&lt;BR /&gt;B.DEAL_SEC_SUBVERTICAL_KEY,&lt;BR /&gt;B.DEAL_SEC_VERTICAL_KEY,&lt;BR /&gt;B.DEAL_SEC_VVERTICAL_DESC,&lt;BR /&gt;B.POL_DEAL_DSA_NAME,&lt;BR /&gt;B.POL_DEAL_NUM&lt;BR /&gt;FROM&lt;BR /&gt;work.agent as a&lt;BR /&gt;LEFT JOIN&lt;BR /&gt;STAGE.master_LOOKUP AS B&lt;BR /&gt;ON&lt;BR /&gt;UPCASE(A.POLICY_NUMBER)=UPCASE(B.POL_NUM)&lt;BR /&gt;;&lt;BR /&gt;quit;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Here, Agent table has 8731382 records and no duplicates It has 36 variables.&amp;nbsp; My B table is pain it has 167171496 and has duplicates on the key column and has total 29 columns. Both of these tables are SAS tables and have indexes on respective columns. I tried with HASH join but it still takes huge time.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;How can I reduce the execution time for this ? Any suggestion is highly appreciated&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 09 Oct 2019 12:48:19 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Left-Join-or-Merge-of-sas-tables-optimization/m-p/595030#M76109</guid>
      <dc:creator>yashpande</dc:creator>
      <dc:date>2019-10-09T12:48:19Z</dc:date>
    </item>
    <item>
      <title>Re: Left Join or Merge of sas tables optimization</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Left-Join-or-Merge-of-sas-tables-optimization/m-p/595041#M76110</link>
      <description>&lt;PRE&gt; 

Here, Agent table has 8731382 records and no duplicates It has 36 variables.  My B table is pain it has 167171496 and has duplicates on the key column and has total 29 columns. Both of these tables are SAS tables and have indexes on respective columns. I tried with HASH join but it still takes huge time.

 &lt;/PRE&gt;
&lt;P&gt;Do you want the output to have 36 + 167 variables? What is length of POLICY_NUMBER&amp;nbsp; ? From the use of UPCASE() I understand that POLICY_Number is not digital string.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 09 Oct 2019 13:14:46 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Left-Join-or-Merge-of-sas-tables-optimization/m-p/595041#M76110</guid>
      <dc:creator>KachiM</dc:creator>
      <dc:date>2019-10-09T13:14:46Z</dc:date>
    </item>
    <item>
      <title>Re: Left Join or Merge of sas tables optimization</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Left-Join-or-Merge-of-sas-tables-optimization/m-p/595054#M76111</link>
      <description>&lt;P&gt;I want 36 columns from Agent table and 18 columns from B table. Length of POLICY_NUMBER is 200.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 09 Oct 2019 13:48:55 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Left-Join-or-Merge-of-sas-tables-optimization/m-p/595054#M76111</guid>
      <dc:creator>yashpande</dc:creator>
      <dc:date>2019-10-09T13:48:55Z</dc:date>
    </item>
    <item>
      <title>Re: Left Join or Merge of sas tables optimization</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Left-Join-or-Merge-of-sas-tables-optimization/m-p/595058#M76112</link>
      <description>&lt;P&gt;Your indexes do not help because of the use of functions in the join condition. Indexes work with the raw values of their columns.&lt;/P&gt;
&lt;P&gt;Things to do:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;create unambiguous columns for the keys (e.g. all uppercase)&lt;/LI&gt;
&lt;LI&gt;sort by (or create indexes for) these columns&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Then you can rerun your join, using either method you already tried.&lt;/P&gt;
&lt;P&gt;Since you have a one-to-many join, you can use a data step merge if both tables are sorted accordingly.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Alternatively, use the table without duplicates in a hash. Bringing the keys to all uppercase will also be necessary for this.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Keep in mind that reading 167 million observation as such is not done in seconds. What kind of library is STAGE? If it is a connection to a remote database, this may be your biggest bottleneck.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 09 Oct 2019 13:59:00 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Left-Join-or-Merge-of-sas-tables-optimization/m-p/595058#M76112</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2019-10-09T13:59:00Z</dc:date>
    </item>
    <item>
      <title>Re: Left Join or Merge of sas tables optimization</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Left-Join-or-Merge-of-sas-tables-optimization/m-p/595059#M76113</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/23883"&gt;@yashpande&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;Length of POLICY_NUMBER is 200.&amp;nbsp;&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Check if that is actually the case in your original source data. I work for an insurance company, and our policy number column is just 10 bytes.&lt;/P&gt;
&lt;P&gt;Since 200 is the default length that SAS uses in unclear cases, you might accidentally have introduced a mistake there by not setting a length explicitly when using a character function..&lt;/P&gt;</description>
      <pubDate>Wed, 09 Oct 2019 14:03:24 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Left-Join-or-Merge-of-sas-tables-optimization/m-p/595059#M76113</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2019-10-09T14:03:24Z</dc:date>
    </item>
    <item>
      <title>Re: Left Join or Merge of sas tables optimization</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Left-Join-or-Merge-of-sas-tables-optimization/m-p/595062#M76114</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/23883"&gt;@yashpande&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I consider loading data from Agent as described below into a hash table seems promising.&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/11562"&gt;@Kurt_Bremser&lt;/a&gt;&amp;nbsp;suggested other ways including Datastep Merge. Merge requires sorting of both the data sets which in your circumstance is very expensive.&lt;/P&gt;
&lt;P&gt;The length of 200 is too large to go into a hash table. If you are sure of 200 bytes for Policy_Number, you can use md5() function to bring it down to 16 bytes to save memory space for KEY-Part. Further, you can use Observation number (Record ID) of Agent as DATA-Part of Hash table. When you find a match with your LOOKUP Data set, you can recover your 36 columns of Agent by using POINT= option of Set statement.&lt;/P&gt;
&lt;P&gt;It is a challenging problem.&lt;/P&gt;</description>
      <pubDate>Wed, 09 Oct 2019 14:29:28 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Left-Join-or-Merge-of-sas-tables-optimization/m-p/595062#M76114</guid>
      <dc:creator>KachiM</dc:creator>
      <dc:date>2019-10-09T14:29:28Z</dc:date>
    </item>
  </channel>
</rss>

