<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to reduce space utilisation during joins in proc sql? in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/How-to-reduce-space-utilisation-during-joins-in-proc-sql/m-p/807185#M318176</link>
    <description>&lt;P&gt;The best way to optimize a SQL query is to not use SQL. Unless you need to do a many-to-many join, a data step MERGE will outperform SQL and consume considerably less space during the necessary sorts. The MERGE itself needs only space for the resulting dataset.&lt;/P&gt;
&lt;P&gt;Depending on the number (and type and size) of variables taken from the 1G dataset, you might even be able to do it with no preceding sort and a hash object.&lt;/P&gt;
&lt;P&gt;If you show us your SQL code, we can suggest how to translate it to data step code.&lt;/P&gt;</description>
    <pubDate>Mon, 11 Apr 2022 16:02:26 GMT</pubDate>
    <dc:creator>Kurt_Bremser</dc:creator>
    <dc:date>2022-04-11T16:02:26Z</dc:date>
    <item>
      <title>How to reduce space utilisation during joins in proc sql?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-reduce-space-utilisation-during-joins-in-proc-sql/m-p/807181#M318173</link>
      <description>&lt;P&gt;I was joining 2 tables, 5 GB and 1GB in size respectively (5g left join 1g). (Both had around 1 million rows)&lt;BR /&gt;&lt;BR /&gt;Took around 13 minutes, which moderate compared to queries I generally run.&lt;BR /&gt;&lt;BR /&gt;During the join a file called "sastmp-000000111.sas7butl" was created which was 127 GB in size.&lt;BR /&gt;&lt;BR /&gt;My access was blocked and now I have to raise a ticket to get it unlocked, with an optimised query to perform the same task.&lt;BR /&gt;&lt;BR /&gt;But I have no clear idea on what would be an optimal query.&lt;BR /&gt;&lt;BR /&gt;I've heard about indexes, but not sure if they would be useful, or how to use them here.&lt;/P&gt;</description>
      <pubDate>Mon, 11 Apr 2022 15:20:16 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-reduce-space-utilisation-during-joins-in-proc-sql/m-p/807181#M318173</guid>
      <dc:creator>Rohit_Rai_1996</dc:creator>
      <dc:date>2022-04-11T15:20:16Z</dc:date>
    </item>
    <item>
      <title>Re: How to reduce space utilisation during joins in proc sql?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-reduce-space-utilisation-during-joins-in-proc-sql/m-p/807182#M318174</link>
      <description>&lt;P&gt;You would have to show the code for suggestions that apply to your job.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Generic advice: reduce the size as early as possible. One example is to only select variables needed early on.&lt;/P&gt;
&lt;P&gt;If data xxx has 400 variables and you only need 4 instead of using&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;from xxx&lt;/PRE&gt;
&lt;P&gt;use:&lt;/P&gt;
&lt;PRE&gt;from (select var1, var2, var3, var4 from xxx) as x&lt;/PRE&gt;
&lt;P&gt;If you know that only need some of the records apply where clauses before joining for example&lt;/P&gt;
&lt;PRE&gt;from (select var1, var2, var3, var4 from xxx
         where varxxx in (&amp;lt;list of values&amp;gt;) and varyyy ge '01JAN2022'd ) as x&lt;/PRE&gt;</description>
      <pubDate>Mon, 11 Apr 2022 15:26:29 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-reduce-space-utilisation-during-joins-in-proc-sql/m-p/807182#M318174</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2022-04-11T15:26:29Z</dc:date>
    </item>
    <item>
      <title>Re: How to reduce space utilisation during joins in proc sql?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-reduce-space-utilisation-during-joins-in-proc-sql/m-p/807184#M318175</link>
      <description>I understand.&lt;BR /&gt;&lt;BR /&gt;In this case all variables are needed (45 on one side, 20 on the other, total 67).</description>
      <pubDate>Mon, 11 Apr 2022 16:01:53 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-reduce-space-utilisation-during-joins-in-proc-sql/m-p/807184#M318175</guid>
      <dc:creator>Rohit_Rai_1996</dc:creator>
      <dc:date>2022-04-11T16:01:53Z</dc:date>
    </item>
    <item>
      <title>Re: How to reduce space utilisation during joins in proc sql?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-reduce-space-utilisation-during-joins-in-proc-sql/m-p/807185#M318176</link>
      <description>&lt;P&gt;The best way to optimize a SQL query is to not use SQL. Unless you need to do a many-to-many join, a data step MERGE will outperform SQL and consume considerably less space during the necessary sorts. The MERGE itself needs only space for the resulting dataset.&lt;/P&gt;
&lt;P&gt;Depending on the number (and type and size) of variables taken from the 1G dataset, you might even be able to do it with no preceding sort and a hash object.&lt;/P&gt;
&lt;P&gt;If you show us your SQL code, we can suggest how to translate it to data step code.&lt;/P&gt;</description>
      <pubDate>Mon, 11 Apr 2022 16:02:26 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-reduce-space-utilisation-during-joins-in-proc-sql/m-p/807185#M318176</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2022-04-11T16:02:26Z</dc:date>
    </item>
    <item>
      <title>Re: How to reduce space utilisation during joins in proc sql?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-reduce-space-utilisation-during-joins-in-proc-sql/m-p/807277#M318249</link>
      <description>Thanks, I didn't realise merge was better.&lt;BR /&gt;&lt;BR /&gt;Code was :&lt;BR /&gt;proc sql;&lt;BR /&gt;create table AB as&lt;BR /&gt;select*&lt;BR /&gt;from Table1 a&lt;BR /&gt;left join Table2 b&lt;BR /&gt;on a.key=b.key;&lt;BR /&gt;quit;&lt;BR /&gt;</description>
      <pubDate>Tue, 12 Apr 2022 05:03:17 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-reduce-space-utilisation-during-joins-in-proc-sql/m-p/807277#M318249</guid>
      <dc:creator>Rohit_Rai_1996</dc:creator>
      <dc:date>2022-04-12T05:03:17Z</dc:date>
    </item>
    <item>
      <title>Re: How to reduce space utilisation during joins in proc sql?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-reduce-space-utilisation-during-joins-in-proc-sql/m-p/807284#M318252</link>
      <description>&lt;P&gt;SELECT * with at least one variable in common between the two datasets will result in a WARNING, so it would be sloppy programming at best.&lt;/P&gt;
&lt;P&gt;Please post your REAL code, so we can see which variables from which dataset will contribute to the final result. If sensitive variable names are in there, replace them with consistent XXX, YYY and so on. But it is imperative to know from where the variables come from.&lt;/P&gt;</description>
      <pubDate>Tue, 12 Apr 2022 06:29:18 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-reduce-space-utilisation-during-joins-in-proc-sql/m-p/807284#M318252</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2022-04-12T06:29:18Z</dc:date>
    </item>
    <item>
      <title>Re: How to reduce space utilisation during joins in proc sql?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-reduce-space-utilisation-during-joins-in-proc-sql/m-p/807289#M318255</link>
      <description>I actually used SELECT* in the real code.&lt;BR /&gt;&lt;BR /&gt;There are 45 variables in the left table, 20 variables in the right table. We need all the variables in the final table&lt;BR /&gt;&lt;BR /&gt;2 or 3 are common. AFAIK, if the variable is already present in left, then it isn't picked up from right.</description>
      <pubDate>Tue, 12 Apr 2022 06:50:51 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-reduce-space-utilisation-during-joins-in-proc-sql/m-p/807289#M318255</guid>
      <dc:creator>Rohit_Rai_1996</dc:creator>
      <dc:date>2022-04-12T06:50:51Z</dc:date>
    </item>
    <item>
      <title>Re: How to reduce space utilisation during joins in proc sql?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-reduce-space-utilisation-during-joins-in-proc-sql/m-p/807294#M318259</link>
      <description>&lt;P&gt;You &lt;STRONG&gt;DO NOT WANT&lt;/STRONG&gt; WARNINGs in your LOG, ever. &lt;STRONG&gt;PERIOD.&lt;/STRONG&gt; Good code runs without ERRORs, WARNINGs or any NOTEs beyond those telling you how long a step took and which datasets it read and/or created. So you do not use a global asterisk in a SQL join. You may use a.* to select all variables from one dataset, but you must be selective for the variables coming from other datasets to avoid name collisions. In case of a FULL/OUTER join, you must take care to COALESCE the ON variable(s), or you'll get unwanted missing values.&lt;/P&gt;
&lt;P&gt;Any code that throws a WARNING (or an ERROR) will not run successfully from a scheduler, so it's unusable in a professional context.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If you have a one-to-one, a one-to-many, or a many-to-one relationship, these SORT/DATA steps can replace your SQL:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sort data=table1;
by key;
run;

proc sort data=table2;
by key;
run;

data ab;
merge
  table1 (in=a)
  table2 (drop=....) /* drop the variables that are common, except key */
;
by key;
if a; /* keep only observations coming from table1 */
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Depending on the state of your datasets, the SORT steps may not be necessary.&lt;/P&gt;</description>
      <pubDate>Tue, 12 Apr 2022 07:14:53 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-reduce-space-utilisation-during-joins-in-proc-sql/m-p/807294#M318259</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2022-04-12T07:14:53Z</dc:date>
    </item>
    <item>
      <title>Re: How to reduce space utilisation during joins in proc sql?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-reduce-space-utilisation-during-joins-in-proc-sql/m-p/807321#M318271</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/422584"&gt;@Rohit_Rai_1996&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;I actually used SELECT* in the real code.&lt;BR /&gt;2 or 3 are common. &lt;STRONG&gt;AFAIK, if the variable is already present in left, then it isn't picked up from right.&lt;/STRONG&gt;&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&lt;SPAN&gt;...or may-be not and you could end-up with an undesired result.&lt;/SPAN&gt;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data table1;
  set sashelp.class(drop=age rename=(name=key));
run;

data table2;
  set sashelp.class(rename=(name=key));
  if mod(_n_,2)=0;
run;

proc sql feedback;
  create table AB as
    select*
  from Table1 a
    left join Table2 b
      on a.key=b.key;
quit;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&lt;SPAN&gt;Even though Sex, Age, Height and Weight also exist in Table1 FEEDBACK tells us they get picked up from TABLE2.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Patrick_0-1649754558201.png" style="width: 784px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/70305i0324B48A1D6D3336/image-dimensions/784x147?v=v2" width="784" height="147" role="button" title="Patrick_0-1649754558201.png" alt="Patrick_0-1649754558201.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;But then looking at the result table it seems only AGE gets really picked-up from TABLE2 - at least with my version of SAS.&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Patrick_1-1649754747139.png" style="width: 400px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/70306i1EBA093A1B97DF05/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Patrick_1-1649754747139.png" alt="Patrick_1-1649754747139.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;That FEEDBACK doesn't tell us the "truth" is ugly and looks like an undocumented feature. If I would have a say then SAS wouldn't just throw a WARNING but an ERROR for such cases. That's what all the databases I know would be doing.&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 12 Apr 2022 09:14:50 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-reduce-space-utilisation-during-joins-in-proc-sql/m-p/807321#M318271</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2022-04-12T09:14:50Z</dc:date>
    </item>
    <item>
      <title>Re: How to reduce space utilisation during joins in proc sql?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-reduce-space-utilisation-during-joins-in-proc-sql/m-p/807345#M318296</link>
      <description>Thanks. I'll try this</description>
      <pubDate>Tue, 12 Apr 2022 11:02:50 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-reduce-space-utilisation-during-joins-in-proc-sql/m-p/807345#M318296</guid>
      <dc:creator>Rohit_Rai_1996</dc:creator>
      <dc:date>2022-04-12T11:02:50Z</dc:date>
    </item>
    <item>
      <title>Re: How to reduce space utilisation during joins in proc sql?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-reduce-space-utilisation-during-joins-in-proc-sql/m-p/807347#M318298</link>
      <description>Oh&lt;BR /&gt;&lt;BR /&gt;I'll try to pay attention to this</description>
      <pubDate>Tue, 12 Apr 2022 11:15:47 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-reduce-space-utilisation-during-joins-in-proc-sql/m-p/807347#M318298</guid>
      <dc:creator>Rohit_Rai_1996</dc:creator>
      <dc:date>2022-04-12T11:15:47Z</dc:date>
    </item>
  </channel>
</rss>

