<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Optimization of proc sql to sas hashing in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Optimization-of-proc-sql-to-sas-hashing/m-p/918026#M361635</link>
    <description>&lt;P&gt;For all optimization efforts, you need to know what the bottleneck is.&lt;/P&gt;
&lt;P&gt;Attach the following to your code:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;options msglevel=i stimer;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;For PROC SQL, you can get more information how the internal optimizer work by:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;PROC SQL _method _tree;&lt;/CODE&gt;&lt;/PRE&gt;</description>
    <pubDate>Tue, 27 Feb 2024 12:46:57 GMT</pubDate>
    <dc:creator>LinusH</dc:creator>
    <dc:date>2024-02-27T12:46:57Z</dc:date>
    <item>
      <title>Optimization of proc sql to sas hashing</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Optimization-of-proc-sql-to-sas-hashing/m-p/918022#M361631</link>
      <description>&lt;P&gt;&lt;SPAN&gt;I am attempting to merge two datasets. The first dataset, named "Maturity_Gap_Master_2," contains 5 million records, while the second dataset, named "want," contains 500 records. The merge operation is based on a condition specified below:&lt;BR /&gt;&lt;BR /&gt;"""&lt;BR /&gt;proc sql;&lt;BR /&gt;create table output as&lt;BR /&gt;select *&lt;BR /&gt;from Maturity_Gap_Master_2 a&lt;BR /&gt;left join&lt;BR /&gt;want b&lt;BR /&gt;on a.ACC_NEXT_FIXING_REM_TENOR_IRRBB &amp;gt;= b.Range_Start&lt;BR /&gt;and a.ACC_NEXT_FIXING_REM_TENOR_IRRBB &amp;lt;= b.Range_end;&lt;BR /&gt;"""&lt;BR /&gt;However, the above query takes at least 10 minutes to execute. To optimize the process, I attempted using SAS hashing:&lt;BR /&gt;"""&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;data output;&lt;BR /&gt;if _n_=1 then do;&lt;BR /&gt;if 0 then set want;&lt;BR /&gt;declare hash h (dataset:'want');&lt;BR /&gt;h.definekey('Range_start','Range_End');&lt;BR /&gt;h.definedata(all:'Y');&lt;BR /&gt;h.definedone();&lt;BR /&gt;declare hiter hi ('h');&lt;BR /&gt;end;&lt;/P&gt;&lt;P&gt;set MATURITY_GAP_MASTER_2;&lt;/P&gt;&lt;P&gt;rc=hi.last();&lt;BR /&gt;start_loop = 1;&lt;BR /&gt;do while(start_loop);&lt;BR /&gt;if ACC_NEXT_FIXING_REM_TENOR_IRRBB &amp;gt;= Range_Start and ACC_NEXT_FIXING_REM_TENOR_IRRBB &amp;lt;= Range_end and rc=0 then start_loop = 0;&lt;BR /&gt;else rc=hi.prev();&lt;BR /&gt;end;&lt;BR /&gt;run;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;"""&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Despite this optimization attempt, the execution time has not decreased. Could anyone provide assistance in optimizing the query further?&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 27 Feb 2024 12:18:44 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Optimization-of-proc-sql-to-sas-hashing/m-p/918022#M361631</guid>
      <dc:creator>Anshul2</dc:creator>
      <dc:date>2024-02-27T12:18:44Z</dc:date>
    </item>
    <item>
      <title>Re: Optimization of proc sql to sas hashing</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Optimization-of-proc-sql-to-sas-hashing/m-p/918024#M361633</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/464088"&gt;@Anshul2&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I would recommend expanding the 500 records data set into a slightly larger table, by explicitly specifying the values of the ranges.&lt;/P&gt;
&lt;P&gt;i.e. Instead of just having &lt;SPAN&gt;Range_Start &amp;amp;&lt;/SPAN&gt;&amp;nbsp;&lt;SPAN&gt;Range_&lt;/SPAN&gt;End columns, Add a third column of the actual value, then try to do exact match from your 5 M large table to the new expanded lookup (want) table.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;This way, it's a one time range expansion, rather than 5 million times, as you have been doing via your Left Join and Hash Join.&lt;/P&gt;
&lt;P&gt;Hope this helps,&lt;/P&gt;
&lt;P&gt;Ahmed&lt;/P&gt;</description>
      <pubDate>Tue, 27 Feb 2024 12:44:10 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Optimization-of-proc-sql-to-sas-hashing/m-p/918024#M361633</guid>
      <dc:creator>AhmedAl_Attar</dc:creator>
      <dc:date>2024-02-27T12:44:10Z</dc:date>
    </item>
    <item>
      <title>Re: Optimization of proc sql to sas hashing</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Optimization-of-proc-sql-to-sas-hashing/m-p/918026#M361635</link>
      <description>&lt;P&gt;For all optimization efforts, you need to know what the bottleneck is.&lt;/P&gt;
&lt;P&gt;Attach the following to your code:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;options msglevel=i stimer;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;For PROC SQL, you can get more information how the internal optimizer work by:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;PROC SQL _method _tree;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Tue, 27 Feb 2024 12:46:57 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Optimization-of-proc-sql-to-sas-hashing/m-p/918026#M361635</guid>
      <dc:creator>LinusH</dc:creator>
      <dc:date>2024-02-27T12:46:57Z</dc:date>
    </item>
    <item>
      <title>Re: Optimization of proc sql to sas hashing</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Optimization-of-proc-sql-to-sas-hashing/m-p/918027#M361636</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/464088"&gt;@Anshul2&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Here is an example code&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want;
	length Range_Start Range_End 8;
	Range_Start = '01Jan2023'd; Range_End = '31Jan2023'd; OUTPUT;
	Range_Start = '01Feb2023'd; Range_End = '15Feb2023'd; OUTPUT;
	Range_Start = '01Jun2023'd; Range_End = '30Jul2023'd; OUTPUT;
	FORMAT Range_Start Range_End date9.;
RUN;
data want_exp(KEEP=date);
	Set want;
	do i=Range_Start to Range_End;
		date = i;
		output;
	end;
	FORMAT date date9.;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Use the "want_exp" table in your joins (SQL/Hash) with your 5 Millions record using equi joins (=) rather than Between (&amp;gt;= &amp;amp; &amp;lt;=)&lt;/P&gt;</description>
      <pubDate>Tue, 27 Feb 2024 12:57:09 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Optimization-of-proc-sql-to-sas-hashing/m-p/918027#M361636</guid>
      <dc:creator>AhmedAl_Attar</dc:creator>
      <dc:date>2024-02-27T12:57:09Z</dc:date>
    </item>
    <item>
      <title>Re: Optimization of proc sql to sas hashing</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Optimization-of-proc-sql-to-sas-hashing/m-p/918029#M361638</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/464088"&gt;@Anshul2&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If it's not feasible to implement&amp;nbsp;&lt;A href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13868" target="_blank" rel="noopener"&gt;AhmedAl_Attar&lt;/A&gt;&amp;nbsp;'s great suggestion of using an equijoin (see also&amp;nbsp;&lt;A href="https://communities.sas.com/t5/SAS-Programming/How-to-improve-Efficiency/m-p/551999/highlight/true#M153442" target="_blank" rel="noopener"&gt;Re: How to improve Efficiency&lt;/A&gt; for another example) because the number of potential &lt;SPAN&gt;ACC_NEXT_FIXING_REM_TENOR_IRRBB&amp;nbsp;&lt;/SPAN&gt;values in those ranges is too large, you could consider defining one or more &lt;A href="https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/proc/p1upn25lbfo6mkn1wncu4dyh9q91.htm" target="_blank" rel="noopener"&gt;formats&lt;/A&gt;.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Here's an example with non-overlapping ranges (1 through 10, &amp;gt;10 through 100, &amp;gt;100 through 1000) and only a few (2) "satellite" variables in dataset WANT:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Create sample data for demonstration */

data want;
input Range_Start Range_End var1 var2 $;
cards;
1     10 123 first
10   100 456 second
100 1000 789 third
;

/* Create a format assigning concatenated values to the ranges */

data wantfmt;
retain fmtname 'wantfmt';
set want(rename=(range_start=start range_end=end)) end=last;
length label $30;
label=catx('|',var1,var2);
output;
if last then do;
  hlo='O';
  label=' ';
  output;
end;
run;

proc format cntlin=wantfmt;
run;

/* Perform the intended left join by applying the format */

data final(drop=_c);
set Maturity_Gap_Master_2;
length _c $30;
_c=put(ACC_NEXT_FIXING_REM_TENOR_IRRBB,wantfmt.);
var1=input(scan(_c,1,'|'),32.);
length var2 $8;
var2=scan(_c,2,'|');
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Tue, 27 Feb 2024 13:42:21 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Optimization-of-proc-sql-to-sas-hashing/m-p/918029#M361638</guid>
      <dc:creator>FreelanceReinh</dc:creator>
      <dc:date>2024-02-27T13:42:21Z</dc:date>
    </item>
    <item>
      <title>Re: Optimization of proc sql to sas hashing</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Optimization-of-proc-sql-to-sas-hashing/m-p/918149#M361673</link>
      <description>&lt;P&gt;Questions:&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;Are the range_start/range_end time spans in dataset WANT mutually exclusive?&lt;BR /&gt;&lt;BR /&gt;&lt;/LI&gt;
&lt;LI&gt;What is the granularity of&amp;nbsp;&lt;SPAN&gt;ACC_NEXT_FIXING_REM_TENOR_IRRBB?&amp;nbsp; &amp;nbsp;I.e. are they date values?&amp;nbsp; Time values?&amp;nbsp; &amp;nbsp;Something else?&lt;BR /&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN&gt;If the answer to question 1 ("mutually exclusive") is yes, then are there gaps &lt;U&gt;&lt;EM&gt;&lt;STRONG&gt;between&lt;/STRONG&gt;&lt;/EM&gt;&lt;/U&gt; the ranges?&lt;BR /&gt;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/OL&gt;</description>
      <pubDate>Wed, 28 Feb 2024 05:06:05 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Optimization-of-proc-sql-to-sas-hashing/m-p/918149#M361673</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2024-02-28T05:06:05Z</dc:date>
    </item>
    <item>
      <title>Re: Optimization of proc sql to sas hashing</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Optimization-of-proc-sql-to-sas-hashing/m-p/918159#M361676</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/464088"&gt;@Anshul2&lt;/a&gt;&amp;nbsp;Is this a many:many join meaning that more than one row from your table want could meet the condition for a row from table&amp;nbsp;&lt;SPAN&gt;Maturity_Gap_Master_2?&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;To give us some indication about the data volume can you please share the output of a proc contents over your result table output that your SQL creates?&lt;/SPAN&gt;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc contents data=work.output;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&lt;SPAN&gt;What I'm most interested in is below highlighted info.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Patrick_0-1709109180996.png" style="width: 400px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/94179i14D7F54058C25448/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Patrick_0-1709109180996.png" alt="Patrick_0-1709109180996.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;Is the number of observation the same or higher than what&amp;nbsp; you've got in your source table&amp;nbsp;&lt;SPAN&gt;Maturity_Gap_Master_2?&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 28 Feb 2024 08:36:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Optimization-of-proc-sql-to-sas-hashing/m-p/918159#M361676</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2024-02-28T08:36:49Z</dc:date>
    </item>
    <item>
      <title>Re: Optimization of proc sql to sas hashing</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Optimization-of-proc-sql-to-sas-hashing/m-p/918231#M361697</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/464088"&gt;@Anshul2&lt;/a&gt;&amp;nbsp;&lt;BR /&gt;The SAS SQL optimizer selects the best option for join.&lt;BR /&gt;In your case I am sure it selects the hash join algorithm. You can verify running the Proc SQL with _method_ option.&lt;BR /&gt;Your log will show that it is using hash join I am sure it will.&lt;BR /&gt;I see you are using "select *". Review if you actually need all columns. You can focus on other approaches to review your query.&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 28 Feb 2024 15:18:54 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Optimization-of-proc-sql-to-sas-hashing/m-p/918231#M361697</guid>
      <dc:creator>Sajid01</dc:creator>
      <dc:date>2024-02-28T15:18:54Z</dc:date>
    </item>
  </channel>
</rss>

