<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Left join big tables by hash objects in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Left-join-big-tables-by-hash-objects/m-p/831639#M328658</link>
    <description>Thank you for your reply. These tables aren't sorted and indexed. Proc sql lasts three times longer than hash tables.</description>
    <pubDate>Sat, 03 Sep 2022 04:07:44 GMT</pubDate>
    <dc:creator>kinder_skipper</dc:creator>
    <dc:date>2022-09-03T04:07:44Z</dc:date>
    <item>
      <title>Left join big tables by hash objects</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Left-join-big-tables-by-hash-objects/m-p/831537#M328617</link>
      <description>&lt;P&gt;I have three tables which I want to join:&amp;nbsp;&lt;/P&gt;&lt;P&gt;tab1 - about 55 millions rows (size 25 GB)&lt;/P&gt;&lt;P&gt;tab2 - 2,5 millions rows&amp;nbsp;&lt;/P&gt;&lt;P&gt;tab3 - 1,5 millions rows&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;proc sql looks like:&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=""&gt;proc sql;
create table tab4 as
	select
		t1.*
		,t2.col_2
		,coalesce(t1.col_3, t3.col_3) as col_3_new
		,coalesce(t1.col_4, t3.col_4) as col_4_new
	from tab1 t1
		left join tab2 t2 on t1.id = t2.id
		left join tab3 t3
					on t1.id = t3.id
					and t1.id_1 = t3.id_1
					and t1.id_2 = t3.id_2
;
quit;

data tab4;
set tab4;
drop col_3 col_4;
rename col_3_new = col_3 col_4_new = col_4;
run;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;I tried to use hash tables instead of proc sql, but I don't know if I do right and how to include coalesce():&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=""&gt;data tab4;
  if _n_=1 then
    do;
      /* map tables */
      if 0 then set tab1 tab2 tab3;
      /* define and load hashes */
      dcl hash h_b(dataset:'tab2');
      h_b.defineKey('id');
      h_b.defineData('col_2');
      h_b.defineDone();
      dcl hash h_c(dataset:'tab3');
      h_c.defineKey('id', 'id_1', 'id_2');
      h_c.defineData('col_3', 'col_4');
      h_c.defineDone();
    end;
  call missing(of _all_);
  set tab1;
  _rc=h_b.find(key: id);
  _rc=h_c.find(key: id, key:id_1, key:id_2);
run;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Maybe is it better method to join these tables?&lt;/P&gt;</description>
      <pubDate>Fri, 02 Sep 2022 12:29:15 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Left-join-big-tables-by-hash-objects/m-p/831537#M328617</guid>
      <dc:creator>kinder_skipper</dc:creator>
      <dc:date>2022-09-02T12:29:15Z</dc:date>
    </item>
    <item>
      <title>Re: Left join big tables by hash objects</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Left-join-big-tables-by-hash-objects/m-p/831587#M328628</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/432881"&gt;@kinder_skipper&lt;/a&gt;&amp;nbsp;and welcome to the SAS Support Communities!&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Basically, your code looks good, &lt;EM&gt;provided that&lt;/EM&gt; ID is a &lt;EM&gt;unique&lt;/EM&gt; key in TAB2 and that the combination of ID, ID_1 and ID_2 is unique in TAB3. The DATA step using the hash objects has the advantage of preserving the order of observations from TAB1. This is not guaranteed by PROC SQL (in the absence of an ORDER BY clause), in particular if ID were not unique in TAB1.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The COALESCE part can be added as shown in the highlighted changes below:&lt;/P&gt;
&lt;PRE&gt;data tab4&lt;STRONG&gt;&lt;FONT color="#3366FF"&gt;(drop=_:)&lt;/FONT&gt;&lt;/STRONG&gt;;
  if _n_=1 then
    do;
      /* map tables */
      if 0 then set tab1 tab2 tab3&lt;FONT color="#3366FF"&gt;&lt;STRONG&gt;(rename=(col_3=_col_3 col_4=_col_4))&lt;/STRONG&gt;&lt;/FONT&gt;;
      /* define and load hashes */
      dcl hash h_b(dataset:'tab2');
      h_b.defineKey('id');
      h_b.defineData('col_2');
      h_b.defineDone();
      dcl hash h_c(dataset:'tab3&lt;STRONG&gt;&lt;FONT color="#3366FF"&gt;(rename=(col_3=_col_3 col_4=_col_4))&lt;/FONT&gt;&lt;/STRONG&gt;');
      h_c.defineKey('id', 'id_1', 'id_2');
      h_c.defineData('&lt;FONT color="#3366FF"&gt;&lt;STRONG&gt;_col_3&lt;/STRONG&gt;&lt;/FONT&gt;', '&lt;FONT color="#3366FF"&gt;&lt;STRONG&gt;_col_4&lt;/STRONG&gt;&lt;/FONT&gt;');
      h_c.defineDone();
    end;
  call missing(of _all_);
  set tab1;
  _rc=h_b.find(key: id);
  _rc=h_c.find(key: id, key:id_1, key:id_2);
&lt;FONT color="#3366FF"&gt;&lt;STRONG&gt;  col_3=coalesce(col_3,_col_3); /* or coalesceC in case   */
  col_4=coalesce(col_4,_col_4); /* of character variables */&lt;/STRONG&gt;&lt;/FONT&gt;
run;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The "&lt;FONT face="courier new,courier"&gt;key:&lt;/FONT&gt;" argument tags are redundant:&lt;/P&gt;
&lt;PRE&gt;  _rc=h_b.find();
  _rc=h_c.find();&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Also note that you could omit the entire DATA step following the PROC SQL step by using dataset options as follows:&lt;/P&gt;
&lt;PRE&gt;create table tab4&lt;STRONG&gt;&lt;FONT color="#3366FF"&gt;(drop=col_3 col_4 rename=(col_3_new = col_3 col_4_new = col_4))&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/PRE&gt;</description>
      <pubDate>Fri, 02 Sep 2022 16:13:58 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Left-join-big-tables-by-hash-objects/m-p/831587#M328628</guid>
      <dc:creator>FreelanceReinh</dc:creator>
      <dc:date>2022-09-02T16:13:58Z</dc:date>
    </item>
    <item>
      <title>Re: Left join big tables by hash objects</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Left-join-big-tables-by-hash-objects/m-p/831615#M328640</link>
      <description>&lt;P&gt;It works. Thanku you very much!&lt;/P&gt;</description>
      <pubDate>Fri, 02 Sep 2022 19:42:29 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Left-join-big-tables-by-hash-objects/m-p/831615#M328640</guid>
      <dc:creator>kinder_skipper</dc:creator>
      <dc:date>2022-09-02T19:42:29Z</dc:date>
    </item>
    <item>
      <title>Re: Left join big tables by hash objects</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Left-join-big-tables-by-hash-objects/m-p/831635#M328656</link>
      <description>&lt;P&gt;Since you have to copy the entire LARGE dataset anyway then&amp;nbsp;I suspect the SQL is going to be just as efficient (if not more) than anything you write.&amp;nbsp; Especially if the dataset are sorted (or indexed).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If the datasets are sorted you just use normal SAS data step(s) to merge them. (unless the goal is to blow-up the large dataset into something even larger).&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data step1 / view=step1;
   merge t1(in=in1) t2(keep=id col2);
   by id1;
   if in1;
run;

data want;
   merge step1(in=in1) t3(keep=id1 id2 id3 col_3 col_4) ;
   by id1 id2 id3 ;
   if in1;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Which will use the values of COL_3 and COL_4 from T3 when there is a match.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Or perhaps the coalesce() function was intended to modify the value of COL_3 and COL_4 from T1 only when the value in T3 was non-missing? (That is not replace an existing value with a missing value.)&amp;nbsp; In which case you should probably use an UPDATE instead of MERGE in the last step.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want;
   update step1(in=in1) t3(keep=id1 id2 id3 col_3 col_4);
   by id1 id2 id3 ;
   if in1;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 03 Sep 2022 01:53:52 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Left-join-big-tables-by-hash-objects/m-p/831635#M328656</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2022-09-03T01:53:52Z</dc:date>
    </item>
    <item>
      <title>Re: Left join big tables by hash objects</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Left-join-big-tables-by-hash-objects/m-p/831639#M328658</link>
      <description>Thank you for your reply. These tables aren't sorted and indexed. Proc sql lasts three times longer than hash tables.</description>
      <pubDate>Sat, 03 Sep 2022 04:07:44 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Left-join-big-tables-by-hash-objects/m-p/831639#M328658</guid>
      <dc:creator>kinder_skipper</dc:creator>
      <dc:date>2022-09-03T04:07:44Z</dc:date>
    </item>
  </channel>
</rss>

