<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to speed up proc sql when joining a large datset to a small dataset? in SAS Procedures</title>
    <link>https://communities.sas.com/t5/SAS-Procedures/How-to-speed-up-proc-sql-when-joining-a-large-datset-to-a-small/m-p/440474#M69213</link>
    <description>&lt;P&gt;1) As&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13674"&gt;@LinusH&lt;/a&gt;&amp;nbsp;said , make a index on the large table is a good choice.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;create index securityid on BigTable;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;2)Hash Table&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Tue, 27 Feb 2018 12:31:42 GMT</pubDate>
    <dc:creator>Ksharp</dc:creator>
    <dc:date>2018-02-27T12:31:42Z</dc:date>
    <item>
      <title>How to speed up proc sql when joining a large datset to a small dataset?</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/How-to-speed-up-proc-sql-when-joining-a-large-datset-to-a-small/m-p/440399#M69210</link>
      <description>&lt;P&gt;I am running the following sql-step:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#000080" face="Courier New" size="3"&gt;&lt;STRONG&gt;proc&lt;/STRONG&gt;&lt;/FONT&gt; &lt;STRONG&gt;&lt;FONT color="#000080" face="Courier New" size="3"&gt;sql&lt;/FONT&gt;&lt;/STRONG&gt;&lt;FONT face="Courier New" size="3"&gt;;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#0000ff" face="Courier New" size="3"&gt;&amp;nbsp;&amp;nbsp; create&lt;/FONT&gt; &lt;FONT color="#0000ff" face="Courier New" size="3"&gt;table&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt;&amp;nbsp;id_map &lt;/FONT&gt;&lt;FONT color="#0000ff" face="Courier New" size="3"&gt;as&lt;/FONT&gt; &lt;FONT color="#0000ff" face="Courier New" size="3"&gt;select&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; a.securityId&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ,b.name&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ,b.companyId&lt;/P&gt;&lt;P&gt;&lt;FONT color="#0000ff" face="Courier New" size="3"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;from&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt; eq_securityIds &lt;/FONT&gt;&lt;FONT color="#0000ff" face="Courier New" size="3"&gt;as&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt; a&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#0000ff" face="Courier New" size="3"&gt;&amp;nbsp;&amp;nbsp; left&lt;/FONT&gt; &lt;FONT color="#0000ff" face="Courier New" size="3"&gt;join&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt; FOUNDATION_SECURITY &lt;/FONT&gt;&lt;FONT color="#0000ff" face="Courier New" size="3"&gt;as&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt; b &lt;/FONT&gt;&lt;FONT color="#0000ff" face="Courier New" size="3"&gt;on&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt; a.securityId=b.securityId &lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#000080" face="Courier New" size="3"&gt;&lt;STRONG&gt;quit&lt;/STRONG&gt;&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt;;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;where the dataset eq_securityId consists of ~16000 observations (here denoted a small dataset)&amp;nbsp;and the dataset FOUNDATION_SECURITY consist of ~60 000 000 observations (here denoted a large dataset).&lt;/P&gt;&lt;P&gt;As of now the&amp;nbsp;runtime to execute this command is about 10min, is there anything I can do to speed up the sql-step? e.g, is it in general faster to join a small dataset on to a large dataset instead of me joining the large dataset onto the small dataset? could some kind of indexation of the large dataset help me speeding up the computations?&lt;/P&gt;&lt;P&gt;I noted&amp;nbsp;that the command ran a lot faster (~3min)&amp;nbsp;when I&amp;nbsp;excluded the variable b.name in the same sql-step, is this a general result or was it just a fluke?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;I run SAS 9.4 (in enterprise guide 7.1)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 27 Feb 2018 07:51:02 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/How-to-speed-up-proc-sql-when-joining-a-large-datset-to-a-small/m-p/440399#M69210</guid>
      <dc:creator>metuzalem</dc:creator>
      <dc:date>2018-02-27T07:51:02Z</dc:date>
    </item>
    <item>
      <title>Re: How to speed up proc sql when joining a large datset to a small dataset?</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/How-to-speed-up-proc-sql-when-joining-a-large-datset-to-a-small/m-p/440400#M69211</link>
      <description>&lt;P&gt;Keep both datasets sorted by securityid, and use a data step merge for the selection of your subset.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Alternatively, create a format from eq_securityIds where all contained securityid keys get a label of 'yes' and the OTHER value a 'no'. Then you can use the format in a subsetting if in a data step.&lt;/P&gt;</description>
      <pubDate>Tue, 27 Feb 2018 08:02:42 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/How-to-speed-up-proc-sql-when-joining-a-large-datset-to-a-small/m-p/440400#M69211</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2018-02-27T08:02:42Z</dc:date>
    </item>
    <item>
      <title>Re: How to speed up proc sql when joining a large datset to a small dataset?</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/How-to-speed-up-proc-sql-when-joining-a-large-datset-to-a-small/m-p/440420#M69212</link>
      <description>&lt;P&gt;Indexing the large table is a good idea, especially if it's used in other use cases.&lt;/P&gt;
&lt;P&gt;This would might trigger the hash join method in SQL, which is quite efficient (use _method PROC SQL option).&lt;/P&gt;</description>
      <pubDate>Tue, 27 Feb 2018 10:00:43 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/How-to-speed-up-proc-sql-when-joining-a-large-datset-to-a-small/m-p/440420#M69212</guid>
      <dc:creator>LinusH</dc:creator>
      <dc:date>2018-02-27T10:00:43Z</dc:date>
    </item>
    <item>
      <title>Re: How to speed up proc sql when joining a large datset to a small dataset?</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/How-to-speed-up-proc-sql-when-joining-a-large-datset-to-a-small/m-p/440474#M69213</link>
      <description>&lt;P&gt;1) As&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13674"&gt;@LinusH&lt;/a&gt;&amp;nbsp;said , make a index on the large table is a good choice.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;create index securityid on BigTable;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;2)Hash Table&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 27 Feb 2018 12:31:42 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/How-to-speed-up-proc-sql-when-joining-a-large-datset-to-a-small/m-p/440474#M69213</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2018-02-27T12:31:42Z</dc:date>
    </item>
    <item>
      <title>Re: How to speed up proc sql when joining a large datset to a small dataset?</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/How-to-speed-up-proc-sql-when-joining-a-large-datset-to-a-small/m-p/440501#M69215</link>
      <description>&lt;P&gt;Thx!&lt;/P&gt;&lt;P&gt;by running the command&lt;/P&gt;&lt;P&gt;&lt;FONT color="#000080" face="Courier New" size="3"&gt;&lt;STRONG&gt;proc&lt;/STRONG&gt;&lt;/FONT&gt; &lt;STRONG&gt;&lt;FONT color="#000080" face="Courier New" size="3"&gt;sql&lt;/FONT&gt;&lt;/STRONG&gt;&lt;FONT face="Courier New" size="3"&gt;;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#0000ff" face="Courier New" size="3"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; create&lt;/FONT&gt; &lt;FONT color="#0000ff" face="Courier New" size="3"&gt;index&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt; securityId &lt;/FONT&gt;&lt;FONT color="#0000ff" face="Courier New" size="3"&gt;on&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt; FOUNDATION_SECURITY;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#000080" face="Courier New" size="3"&gt;&lt;STRONG&gt;quit&lt;/STRONG&gt;&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt;;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;the sql-step is executed a lot faster. It takes about 2min to create the index but since&amp;nbsp;I am running similar join-queries further down the in the program with the large dataset then this helped a lot. Thx!&lt;/P&gt;&lt;P&gt;&lt;FONT face="Courier New" size="3"&gt;&amp;nbsp;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 27 Feb 2018 13:46:14 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/How-to-speed-up-proc-sql-when-joining-a-large-datset-to-a-small/m-p/440501#M69215</guid>
      <dc:creator>metuzalem</dc:creator>
      <dc:date>2018-02-27T13:46:14Z</dc:date>
    </item>
  </channel>
</rss>

