<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to optimise two SQL inner joins and a union? in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/How-to-optimise-two-SQL-inner-joins-and-a-union/m-p/900940#M356058</link>
    <description>&lt;P&gt;You can use DATA step with hash objects to avoid sorts.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If A, B, and C each have only one obs per record value (and B and C have no common RECORD values), then you can build a hash object (i.e. a "lookup table") keyed on RECORD, and containing TYPE, from dataset A.&amp;nbsp; Then read B and C and keep if they match any RECORD key in the hash object:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want;
  set b c;
  if _n_=1 then do;
    if 0 then set a;
    declare hash ha (dataset:'a');
      ha.definekey('record');
      ha.definedata(all:'Y');
      ha.definedone();
  end;
  if ha.find()=0;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Wed, 01 Nov 2023 02:00:27 GMT</pubDate>
    <dc:creator>mkeintz</dc:creator>
    <dc:date>2023-11-01T02:00:27Z</dc:date>
    <item>
      <title>How to optimise two SQL inner joins and a union?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-optimise-two-SQL-inner-joins-and-a-union/m-p/900763#M355983</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I have 3 datasets (with millions of observations). The masterfile contains a variable "record" that links it with the other datasets. The other 2 datasets contain the same variables.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I can get my rough code to work, but ideally the merge would occur in one SQL step, if possible. Even in the individual SQL steps I receive a warning "WARNING: Variable record already exists on file WORK.WANT1."&amp;nbsp; &amp;amp; "WARNING: Variable record already exists on file WORK.WANT2."&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I would really appreciate help optimising this code as I am new to SQL.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=""&gt;data a;
input record type;
datalines;
1 5
2 8
3 2
4 1
5 100
6 21
;
run;

data b;
input record treat;
datalines;
1 8
5 9
7 12
8 200
;
run;

data c;
input record treat;
datalines;
2 10
3 8
9 11
10 200
;
run;

data want;
input record type treat;
datalines;
1 5 8
2 8 10
3 2 8
5 100 9
;
run;&lt;BR /&gt;
/*This rough code generates the dataset that I want*/
proc sql;
create table want1 as
select * from a as A
inner join b
on a.record=b.record
;
quit;

proc sql;
create table want2 as
select * from a as A
inner join c
on a.record=c.record
;
quit;

proc sql;
    create table want_test as
    select * from want1
    union
    select * from want2;
quit;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 30 Oct 2023 18:07:05 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-optimise-two-SQL-inner-joins-and-a-union/m-p/900763#M355983</guid>
      <dc:creator>al15</dc:creator>
      <dc:date>2023-10-30T18:07:05Z</dc:date>
    </item>
    <item>
      <title>Re: How to optimise two SQL inner joins and a union?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-optimise-two-SQL-inner-joins-and-a-union/m-p/900772#M355987</link>
      <description>&lt;P&gt;&lt;STRONG&gt;DO NOT USE THE ASTERISK!&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;When doing joins, always use an explicit list of variables, and use table aliases to make sure from which dataset a variable is taken.&lt;/P&gt;</description>
      <pubDate>Mon, 30 Oct 2023 18:29:31 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-optimise-two-SQL-inner-joins-and-a-union/m-p/900772#M355987</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2023-10-30T18:29:31Z</dc:date>
    </item>
    <item>
      <title>Re: How to optimise two SQL inner joins and a union?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-optimise-two-SQL-inner-joins-and-a-union/m-p/900838#M356012</link>
      <description>&lt;P&gt;If each dataset is sorted by RECORD, and each dataset has no more than 1 record per RECORD, then you have a much simpler solution available in the DATA step:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want;
  merge a (in=ina)  b (in=inb)  c (in=inc);
  by record;
  if ina=1  and (inb=1 or inc=1);
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Mon, 30 Oct 2023 23:36:28 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-optimise-two-SQL-inner-joins-and-a-union/m-p/900838#M356012</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2023-10-30T23:36:28Z</dc:date>
    </item>
    <item>
      <title>Re: How to optimise two SQL inner joins and a union?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-optimise-two-SQL-inner-joins-and-a-union/m-p/900866#M356020</link>
      <description>&lt;P&gt;Thank you for your reply. My datasets weren't sorted by record, so I was hoping sql would work.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 31 Oct 2023 08:26:31 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-optimise-two-SQL-inner-joins-and-a-union/m-p/900866#M356020</guid>
      <dc:creator>al15</dc:creator>
      <dc:date>2023-10-31T08:26:31Z</dc:date>
    </item>
    <item>
      <title>Re: How to optimise two SQL inner joins and a union?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-optimise-two-SQL-inner-joins-and-a-union/m-p/900923#M356049</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/446112"&gt;@al15&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;Thank you for your reply. My datasets weren't sorted by record, so I was hoping sql would work.&amp;nbsp;&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;SQL still has to order the data for any join on criteria. Using Proc SQL when the data isn't sorted just means that SQL runs slower.&lt;/P&gt;
&lt;P&gt;You might want to investigate use of INDEXes with your variables used for matching if used frequently as the indexes may speed up things in SQL. But the step adding/updating the index might take some time to run.&lt;/P&gt;</description>
      <pubDate>Tue, 31 Oct 2023 14:56:25 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-optimise-two-SQL-inner-joins-and-a-union/m-p/900923#M356049</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2023-10-31T14:56:25Z</dc:date>
    </item>
    <item>
      <title>Re: How to optimise two SQL inner joins and a union?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-optimise-two-SQL-inner-joins-and-a-union/m-p/900940#M356058</link>
      <description>&lt;P&gt;You can use DATA step with hash objects to avoid sorts.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If A, B, and C each have only one obs per record value (and B and C have no common RECORD values), then you can build a hash object (i.e. a "lookup table") keyed on RECORD, and containing TYPE, from dataset A.&amp;nbsp; Then read B and C and keep if they match any RECORD key in the hash object:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want;
  set b c;
  if _n_=1 then do;
    if 0 then set a;
    declare hash ha (dataset:'a');
      ha.definekey('record');
      ha.definedata(all:'Y');
      ha.definedone();
  end;
  if ha.find()=0;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 01 Nov 2023 02:00:27 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-optimise-two-SQL-inner-joins-and-a-union/m-p/900940#M356058</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2023-11-01T02:00:27Z</dc:date>
    </item>
    <item>
      <title>Re: How to optimise two SQL inner joins and a union?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-optimise-two-SQL-inner-joins-and-a-union/m-p/900991#M356084</link>
      <description>&lt;P&gt;A data step with hash table lookups would likely perform best as long as you've got enough memory for the hash tables.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Below code that combines everything into a single SQL. This will still require implicit sorting for the joins. The main difference to your code is the union ALL and combining the data prior to the inner join - which avoids multiple sorts of table a.&lt;/P&gt;
&lt;P&gt;Without the ALL keyword the union will dedup the result set which requires sorting. If you can be sure that you don't have overlaps in your tables b and c then such a dedup is not required.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql;
  create table want as
  select 
    l.record,
    l.type,
    r.treat
  from
    a l
    inner join
    (
      select record, treat
      from b
      union all
      select record, treat
      from c
    ) r
    on l.record=r.record
  ;
quit;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 31 Oct 2023 23:40:54 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-optimise-two-SQL-inner-joins-and-a-union/m-p/900991#M356084</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2023-10-31T23:40:54Z</dc:date>
    </item>
  </channel>
</rss>

