<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Join vs sub query in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Join-vs-sub-query/m-p/450576#M113451</link>
    <description>&lt;P&gt;all_items have almost 64 million rows.&lt;/P&gt;</description>
    <pubDate>Tue, 03 Apr 2018 11:27:37 GMT</pubDate>
    <dc:creator>mnjtrana</dc:creator>
    <dc:date>2018-04-03T11:27:37Z</dc:date>
    <item>
      <title>Join vs sub query</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Join-vs-sub-query/m-p/450524#M113432</link>
      <description>&lt;P&gt;hi guys,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I am a bit confused regarding using the join or the join statements, we don't know which is more efficient, option 1 or option 2. We are using SAS EG, 6.1 and the data is quite huge in the tables, almost 6Million records.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;We have used the below option 1 to fetch the required datsets using the sub-query but there&amp;nbsp;another approach which others as suggesting to use the option 2 to use the joins.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;As per them, option 1 splits the &lt;EM&gt;&lt;U&gt;query as 'where' and 'OR' clause,&lt;/U&gt;&lt;/EM&gt;&amp;nbsp; and split of all the product ids,&lt;/P&gt;
&lt;P&gt;like '&lt;U&gt;&lt;EM&gt;where product_id = 782656 or product_id =78555 or product_id = 55421268 and so on.....'&lt;/EM&gt;&lt;/U&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;/* dataset having the required product_idd with price&amp;gt;100 */
proc sql;
create table all_items as select distinct(product_id) from catalogue where price &amp;gt;100;
quit;

/*Option1:*/
proc sql;
create table items as 
select product_id, item_id, price, description from catalogue where product_id in (select * from all_items);
quit;


/*Option2:*/
proc sql;
create table items as 
select a.product_id, a.item_id, a.price, a.description from catalogue as a
inner join all_items as b
on  a.product_id =  b.product_id;
quit;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;However as per my understanding both queries are taking same time, does sas automatically optimises the proc sql queries and gives the same result and in same cpu and time usage?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks for the help!&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 03 Apr 2018 06:39:56 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Join-vs-sub-query/m-p/450524#M113432</guid>
      <dc:creator>mnjtrana</dc:creator>
      <dc:date>2018-04-03T06:39:56Z</dc:date>
    </item>
    <item>
      <title>Re: Join vs sub query</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Join-vs-sub-query/m-p/450541#M113435</link>
      <description>&lt;P&gt;You can use the FEEDBACK to see how SAS normalizes the query.&lt;/P&gt;
&lt;P&gt;You can use the undocumented option _method to see how SAS actually are treating the query.&lt;/P&gt;
&lt;P&gt;From optimizing perspective, indexing can in some situations help.&lt;/P&gt;
&lt;P&gt;You can set UBUFSIZE option to higher than the default 64K to make the likelihood of hash join to take place.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 03 Apr 2018 08:23:59 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Join-vs-sub-query/m-p/450541#M113435</guid>
      <dc:creator>LinusH</dc:creator>
      <dc:date>2018-04-03T08:23:59Z</dc:date>
    </item>
    <item>
      <title>Re: Join vs sub query</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Join-vs-sub-query/m-p/450575#M113450</link>
      <description>&lt;P&gt;How many observations does all_items have? This will determine if there are methods available that will outperform your SQL considerably.&lt;/P&gt;</description>
      <pubDate>Tue, 03 Apr 2018 11:21:22 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Join-vs-sub-query/m-p/450575#M113450</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2018-04-03T11:21:22Z</dc:date>
    </item>
    <item>
      <title>Re: Join vs sub query</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Join-vs-sub-query/m-p/450576#M113451</link>
      <description>&lt;P&gt;all_items have almost 64 million rows.&lt;/P&gt;</description>
      <pubDate>Tue, 03 Apr 2018 11:27:37 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Join-vs-sub-query/m-p/450576#M113451</guid>
      <dc:creator>mnjtrana</dc:creator>
      <dc:date>2018-04-03T11:27:37Z</dc:date>
    </item>
    <item>
      <title>Re: Join vs sub query</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Join-vs-sub-query/m-p/450579#M113453</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/34466"&gt;@mnjtrana&lt;/a&gt; wrote:&lt;BR /&gt;
&lt;P&gt;all_items have almost 64 million rows.&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Can't be. A subset of 6 million rows (as stated in your original post) can't have 64 million rows.&lt;/P&gt;</description>
      <pubDate>Tue, 03 Apr 2018 11:59:47 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Join-vs-sub-query/m-p/450579#M113453</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2018-04-03T11:59:47Z</dc:date>
    </item>
    <item>
      <title>Re: Join vs sub query</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Join-vs-sub-query/m-p/450588#M113458</link>
      <description>&lt;P&gt;JOIN&amp;nbsp;&lt;SPAN&gt;is more efficient,&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 03 Apr 2018 12:23:33 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Join-vs-sub-query/m-p/450588#M113458</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2018-04-03T12:23:33Z</dc:date>
    </item>
    <item>
      <title>Re: Join vs sub query</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Join-vs-sub-query/m-p/450599#M113463</link>
      <description>Opps, may bad, all_items have almost 4 million rows.</description>
      <pubDate>Tue, 03 Apr 2018 12:50:21 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Join-vs-sub-query/m-p/450599#M113463</guid>
      <dc:creator>mnjtrana</dc:creator>
      <dc:date>2018-04-03T12:50:21Z</dc:date>
    </item>
    <item>
      <title>Re: Join vs sub query</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Join-vs-sub-query/m-p/450613#M113469</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/34466"&gt;@mnjtrana&lt;/a&gt; wrote:&lt;BR /&gt;Opps, may bad, all_items have almost 4 million rows.&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Ok.&lt;/P&gt;
&lt;P&gt;Depending on the size of your index variable, you &lt;EM&gt;might&lt;/EM&gt; be able to fit a format (or a hash table) based on this into memory, so you could solve your task with two sequential passes (one through all_items for the format, the other for the main dataset).&lt;/P&gt;
&lt;P&gt;Alternatively, if your main dataset is already sorted along product_id you can do this:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want;
merge
  catalogue (
    in=a
    keep=product_id price
    rename=(price=_price)
    where=(_price &amp;gt; 100)
  )
  catalogue
;
by product_id;
if a;
drop _price;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Physically, this would be a single pass through the data, as most (if not all) would be present in cache for the seciond read.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I always like to keep my datasets sorted along the most important key.&lt;/P&gt;</description>
      <pubDate>Tue, 03 Apr 2018 13:14:50 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Join-vs-sub-query/m-p/450613#M113469</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2018-04-03T13:14:50Z</dc:date>
    </item>
  </channel>
</rss>

