<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: proc sql join efficiency in SAS Procedures</title>
    <link>https://communities.sas.com/t5/SAS-Procedures/proc-sql-join-efficiency/m-p/209654#M51953</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I would index prop_zip_code.&lt;/P&gt;&lt;P&gt;Use options msglevel=i; to verify that the index is being used.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Fri, 28 Aug 2015 15:11:54 GMT</pubDate>
    <dc:creator>LinusH</dc:creator>
    <dc:date>2015-08-28T15:11:54Z</dc:date>
    <item>
      <title>proc sql join efficiency</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/proc-sql-join-efficiency/m-p/209653#M51952</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Good morning all,&lt;/P&gt;&lt;P&gt;I have the following code:&lt;/P&gt;&lt;PRE&gt;proc sql;
create table fourth_run as
select b.*
from mdj.zip5_august a left join
     mdj.addr_100m_zip9_zip5 b on
a.zip5 = substr(b.prop_zip_code,1,5);&lt;/PRE&gt;&lt;P&gt;zip5_august has about 900 records and 1 variable. addr_100m_zip9_zip5 has about 1 million records and 70 variables. I want to make sure that the way I'm doing this is best practice.&lt;/P&gt;&lt;P&gt;Is it best practice to list the smaller dataset first? Both are sorted accordingly. Will the substr() cause much more processing time? Would it be better to set up a separate field? It's a monthly file that doesn't get used much, and I'd prefer not to make too many changes because this server is already fairly bogged down.&lt;/P&gt;&lt;P&gt;Any input or suggestions are welcome.&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Go Pirates.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 28 Aug 2015 14:35:07 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/proc-sql-join-efficiency/m-p/209653#M51952</guid>
      <dc:creator>Steelers_In_DC</dc:creator>
      <dc:date>2015-08-28T14:35:07Z</dc:date>
    </item>
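    <!--
    A minimal sketch of the "separate field" option raised in the question, assuming prop_zip_code is a character variable and that altering mdj.addr_100m_zip9_zip5 in place is acceptable. The column name zip5_key is hypothetical. Precomputing the 5-digit ZIP lets the join compare two plain columns instead of calling substr() on every row of the large table, and the new column can then be indexed.

    ```sas
    proc sql;
      /* hypothetical new column holding the first 5 characters of prop_zip_code */
      alter table mdj.addr_100m_zip9_zip5
        add zip5_key char(5);

      update mdj.addr_100m_zip9_zip5
        set zip5_key = substr(prop_zip_code, 1, 5);

      /* simple index: the index name must match the column name */
      create index zip5_key
        on mdj.addr_100m_zip9_zip5(zip5_key);

      /* same join as in the question, now an equality on plain columns */
      create table fourth_run as
        select b.*
        from mdj.zip5_august a
             left join mdj.addr_100m_zip9_zip5 b
             on a.zip5 = b.zip5_key;
    quit;
    ```

    Because the file is refreshed monthly, the ALTER, UPDATE and CREATE INDEX steps would have to be rerun after each refresh. Whether the optimizer actually picks the index for the join is still worth confirming in the log.
    -->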
    <item>
      <title>Re: proc sql join efficiency</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/proc-sql-join-efficiency/m-p/209654#M51953</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I would index prop_zip_code.&lt;/P&gt;&lt;P&gt;Use options msglevel=i; to verify that the index is being used.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 28 Aug 2015 15:11:54 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/proc-sql-join-efficiency/m-p/209654#M51953</guid>
      <dc:creator>LinusH</dc:creator>
      <dc:date>2015-08-28T15:11:54Z</dc:date>
    </item>
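    <!--
    A minimal sketch of the indexing suggestion above: PROC SQL builds a simple index on prop_zip_code, and MSGLEVEL=I makes SAS write informational notes, including index usage, to the log. Library and table names are taken from the question. Because the original join applies substr() to prop_zip_code, the optimizer may not be able to use this index for that join; the INFO notes are the way to check.

    ```sas
    options msglevel=i;   /* I = also print informational notes, e.g. index usage */

    proc sql;
      /* simple index: the index name must match the column name */
      create index prop_zip_code
        on mdj.addr_100m_zip9_zip5(prop_zip_code);
    quit;

    proc sql;
      /* rerun the original query and inspect the log for INFO notes */
      create table fourth_run as
        select b.*
        from mdj.zip5_august a
             left join mdj.addr_100m_zip9_zip5 b
             on a.zip5 = substr(b.prop_zip_code, 1, 5);
    quit;
    ```

    If the log shows no INFO note mentioning the prop_zip_code index for this query, SAS most likely did not use it for the join.
    -->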
    <item>
      <title>Re: proc sql join efficiency</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/proc-sql-join-efficiency/m-p/209655#M51954</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I'll have to look that up; I'm not familiar with msglevel=i. I ran the code on a small subset of each dataset and saw this in the log:&lt;/P&gt;&lt;PRE&gt;NOTE: SAS threaded sort was used.&lt;/PRE&gt;&lt;P&gt;I'm not sure what that means. Is an index necessary if the table is already sorted by prop_zip_code?&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 28 Aug 2015 15:49:31 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/proc-sql-join-efficiency/m-p/209655#M51954</guid>
      <dc:creator>Steelers_In_DC</dc:creator>
      <dc:date>2015-08-28T15:49:31Z</dc:date>
    </item>
    <item>
      <title>Re: proc sql join efficiency</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/proc-sql-join-efficiency/m-p/209656#M51955</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Not necessary, but you asked for optimization, and an indexed join will usually perform better than a sort/merge join, especially when the ratio of hits to total rows is as low as in your example.&lt;/P&gt;&lt;P&gt;The message means that SAS was able to use multithreading, that is, the sort was done in parallel using multiple cores/CPUs. Good, but not surprising.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 28 Aug 2015 16:06:59 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/proc-sql-join-efficiency/m-p/209656#M51955</guid>
      <dc:creator>LinusH</dc:creator>
      <dc:date>2015-08-28T16:06:59Z</dc:date>
    </item>
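    <!--
    To see which join strategy PROC SQL actually chose, the _METHOD option can be added to the PROC SQL statement. It is not part of the main PROC SQL reference but is widely described in SAS technical papers, so treat the exact log output as version-dependent. A sketch using the query from the question:

    ```sas
    options msglevel=i;   /* keep index-usage notes in the log as well */

    proc sql _method;     /* _method writes the query plan codes to the log */
      create table fourth_run as
        select b.*
        from mdj.zip5_august a
             left join mdj.addr_100m_zip9_zip5 b
             on a.zip5 = substr(b.prop_zip_code, 1, 5);
    quit;
    ```

    In the printed plan, sqxjm denotes a sort/merge join, sqxjndx an index join and sqxjhsh a hash join, which makes it straightforward to compare the indexed and sort/merge approaches discussed in this thread.
    -->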
    <item>
      <title>Re: proc sql join efficiency</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/proc-sql-join-efficiency/m-p/209657#M51956</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Excellent.&amp;nbsp; Thanks!&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 28 Aug 2015 16:27:26 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/proc-sql-join-efficiency/m-p/209657#M51956</guid>
      <dc:creator>Steelers_In_DC</dc:creator>
      <dc:date>2015-08-28T16:27:26Z</dc:date>
    </item>
  </channel>
</rss>

