<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: joining strings in proc sql why so inefficient? in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/joining-strings-in-proc-sql-why-so-inefficient/m-p/793271#M254231</link>
    <description>&lt;P&gt;It is a large data set, sorting it will also take a long time, but I'll give it a try.&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Sat, 29 Jan 2022 08:13:13 GMT</pubDate>
    <dc:creator>telligent</dc:creator>
    <dc:date>2022-01-29T08:13:13Z</dc:date>
    <item>
      <title>joining strings in proc sql why so inefficient?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/joining-strings-in-proc-sql-why-so-inefficient/m-p/793250#M254221</link>
      <description>&lt;P&gt;I am trying to join two large datasets on a long string variable and it is extremely inefficient.&amp;nbsp; It runs forever and never finishes.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I was hoping you could give me some suggestions to make this query more efficient.&amp;nbsp; Thank you.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;select *&lt;/P&gt;
&lt;P&gt;from xxxxx.xxxxx &lt;/P&gt;
&lt;P&gt;inner join xxxx.xxxx b&lt;/P&gt;
&lt;P&gt;on a.name=b.name&lt;/P&gt;
&lt;P&gt;where a.name ne ' ' and dt&amp;gt;'201712';&lt;/P&gt;
&lt;P&gt;quit;&lt;/P&gt;</description>
      <pubDate>Sat, 29 Jan 2022 03:05:17 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/joining-strings-in-proc-sql-why-so-inefficient/m-p/793250#M254221</guid>
      <dc:creator>telligent</dc:creator>
      <dc:date>2022-01-29T03:05:17Z</dc:date>
    </item>
    <item>
      <title>Re: joining strings in proc sql why so inefficient?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/joining-strings-in-proc-sql-why-so-inefficient/m-p/793251#M254222</link>
      <description>&lt;P&gt;Why not just merge them instead?&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want;
  merge a(in=in1) b(in=in2);
  by name;
  if in1 and in2 and name ne ' ' and dt&amp;gt;'201712';
run;
  &lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Sat, 29 Jan 2022 03:12:48 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/joining-strings-in-proc-sql-why-so-inefficient/m-p/793251#M254222</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2022-01-29T03:12:48Z</dc:date>
    </item>
    <item>
      <title>Re: joining strings in proc sql why so inefficient?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/joining-strings-in-proc-sql-why-so-inefficient/m-p/793271#M254231</link>
      <description>&lt;P&gt;It is a large data set, sorting it will also take a long time, but I'll give it a try.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 29 Jan 2022 08:13:13 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/joining-strings-in-proc-sql-why-so-inefficient/m-p/793271#M254231</guid>
      <dc:creator>telligent</dc:creator>
      <dc:date>2022-01-29T08:13:13Z</dc:date>
    </item>
    <item>
      <title>Re: joining strings in proc sql why so inefficient?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/joining-strings-in-proc-sql-why-so-inefficient/m-p/793277#M254234</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You can put the filters as a data step option (to avoid these observations from being processed).&lt;/P&gt;
&lt;P&gt;In fact you move from an output filter to an input filter that way. And an input filter is faster obviously.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;select *
from 
     lib1.abc(where=(name is not missing and dt&amp;gt;'201712')) as a
 inner join 
     lib2.xyz(where=(name is not missing and dt&amp;gt;'201712')) as b
on a.name=b.name     &lt;BR /&gt;&lt;BR /&gt;quit;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If name is the only variable with the same name in a and b, you can also do a natural (inner) join:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;from lib1.abc natural join lib2.xyz&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&lt;BR /&gt;Koen&lt;/P&gt;</description>
      <pubDate>Sat, 29 Jan 2022 11:07:19 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/joining-strings-in-proc-sql-why-so-inefficient/m-p/793277#M254234</guid>
      <dc:creator>sbxkoenk</dc:creator>
      <dc:date>2022-01-29T11:07:19Z</dc:date>
    </item>
    <item>
      <title>Re: joining strings in proc sql why so inefficient?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/joining-strings-in-proc-sql-why-so-inefficient/m-p/793332#M254258</link>
      <description>&lt;P&gt;I'm always a bit uneasy to combine an explicit join with a where clause and then trust the SAS SQL optimizer that it doesn't first execute the join and only then applies the where condition on the intermediary result set.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;In your case I'd go for an implicit join with the "date" filter first.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have_1;
  do row_id=1 to 100000;
    name=put(row_id,z10.);
    if mod(row_id,500)=0 then dt='201812';
    else dt=' ';
    output;
  end;
  stop;
run;

data have_2;
  set have_1(keep=name);
run;

proc sql _method feedback;
  create table want_1 as
  select a.*
  from 
    have_1 a
   ,have_2 b
  where 
    a.dt&amp;gt;'201712'
    and a.name ne ' ' 
    and a.name=b.name
  ;
quit;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Please note that if you've got the same name multiple times in both tables (=a many to many relationship) then you could end-up with a lot of rows which could be the reason for you long running SQL.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;As for not using a data step merge because of the pre-sorting required: Your SQL will also have to sort the tables.&lt;/P&gt;
&lt;P&gt;I've added in the code above option &lt;A href="https://www.lexjansen.com/nesug/nesug12/cc/cc06.pdf" target="_self"&gt;_method&lt;/A&gt; to the SQL.&amp;nbsp;&lt;SPAN&gt;This shows you in the log what your SQL is actually doing. With my sample data it's a hash join - but this will be different with your big tables and it's highly likely that you'll see SQXSORT.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;And just as a thought: If you just want to select all rows in the one table that have a matching name in another table then creating a distinct list of names first would be the way to go (avoiding a many to many situation). ...and if this distinct list of names is not too big then you could eventually even use a SAS data step hash lookup for this which would avoid any need for sorting the data.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 30 Jan 2022 03:55:17 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/joining-strings-in-proc-sql-why-so-inefficient/m-p/793332#M254258</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2022-01-30T03:55:17Z</dc:date>
    </item>
  </channel>
</rss>

