<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: HASH JOIN in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/HASH-JOIN/m-p/632430#M187505</link>
    <description>&lt;P&gt;Thanks Patric, for your reply to my post.&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Mon, 16 Mar 2020 13:51:26 GMT</pubDate>
    <dc:creator>GPatel</dc:creator>
    <dc:date>2020-03-16T13:51:26Z</dc:date>
    <item>
      <title>HASH JOIN</title>
      <link>https://communities.sas.com/t5/SAS-Programming/HASH-JOIN/m-p/631739#M187197</link>
      <description>&lt;P&gt;&lt;BR /&gt;I have three data sets:&lt;/P&gt;
&lt;P&gt;(1) S1 (n=441000 Unique records KEY (= ID) with 22 columns : ID and A1:A21)&lt;BR /&gt;(2) S2 (n=1,467,661,903 records, with duplicate Key (= ID) records with 5 columns : ID, B, C, D and Action)&lt;BR /&gt;(3) S3 (n=905 Unique records with 3 columns : X, Action and Action_Desc)&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;SQL query:&lt;BR /&gt;==========&lt;/P&gt;
&lt;P&gt;Proc SQL; Select S1.*,S3.Action_Desc from &lt;BR /&gt;S1 Left join S2 on s1.id=s2.id &lt;BR /&gt;left join s2.action=s3.action;&lt;BR /&gt;quit;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;HASH CODE:&lt;BR /&gt;==========&lt;/P&gt;
&lt;P&gt;DATA HASH_ACTION;&lt;BR /&gt;IF 0 THEN SET s1;&lt;BR /&gt;if _N_ = 1 then do;&lt;BR /&gt;declare hash HASH_NAME(dataset: "s1", multidata: 'y');&lt;BR /&gt;HASH_NAME.defineKEY("ID");&lt;BR /&gt;HASH_NAME.defineData (ALL:'YES');&lt;BR /&gt;HASH_NAME.defineDone();&lt;BR /&gt;END;&lt;BR /&gt;set s2(keep=ID B C D ACTIOn);&lt;BR /&gt;IF HASH_NAME.FIND(KEY:ID) = 0 THEN OUTPUT;&lt;BR /&gt;RUN;&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;Question:&lt;BR /&gt;(1) I got duplicate records HASH_ACTION dataset, how can I get only unique records.&lt;BR /&gt;(2) How can I embed : left join s2.action=s3.action into HASH Code.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 12 Mar 2020 23:28:47 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/HASH-JOIN/m-p/631739#M187197</guid>
      <dc:creator>GPatel</dc:creator>
      <dc:date>2020-03-12T23:28:47Z</dc:date>
    </item>
    <item>
      <title>Re: HASH JOIN</title>
      <link>https://communities.sas.com/t5/SAS-Programming/HASH-JOIN/m-p/631757#M187204</link>
      <description>&lt;P&gt;If I understand your SQL code right then you want actually nothing else than adding column Action_Desc to your S1 table.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If so:&lt;/P&gt;
&lt;P&gt;- You could load S2 into the hash (with default only loading first occurrence if duplicate keys but not throwing an error if duplicates).&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;Assumes that the number of distinct id's is not too high and that there is only one action per id&lt;/P&gt;
&lt;P&gt;- Also load S3 into a hash&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Something along the line of below (untested) code could work:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want;
  if _n_=1 then
    do;
      if 0 then set S3(keep=action Action_Desc);
      dcl hash hS2(dataset:'s2(keep=id action)');
      hS2.defineKey('id');
      hS2.defineData('action');
      hS2.defineDone();
      dcl hash hS3(dataset:'S3(keep=action Action_Desc)');
      hS3.defineKey('action');
      hS3.defineData('Action_Desc');
      hS3.defineDone();
    end;
  call missing (of _all_);
  set s1;
  if hS2.find()=0 then hS3.find();
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 13 Mar 2020 02:14:58 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/HASH-JOIN/m-p/631757#M187204</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2020-03-13T02:14:58Z</dc:date>
    </item>
    <item>
      <title>Re: HASH JOIN</title>
      <link>https://communities.sas.com/t5/SAS-Programming/HASH-JOIN/m-p/631812#M187225</link>
      <description>&lt;P&gt;In your dataset S2, do you have more than one distinct action per ID? If yes, you will need an iteration to get the same result you got in SQL.&lt;/P&gt;</description>
      <pubDate>Fri, 13 Mar 2020 10:21:57 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/HASH-JOIN/m-p/631812#M187225</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2020-03-13T10:21:57Z</dc:date>
    </item>
    <item>
      <title>Re: HASH JOIN</title>
      <link>https://communities.sas.com/t5/SAS-Programming/HASH-JOIN/m-p/631893#M187259</link>
      <description>&lt;P&gt;&lt;SPAN&gt;Thanks for your response.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;In my dataset S2, I do have more than one distinct action per ID.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;I got duplicate records upon my Hash run. &lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;I learned that, I need to eliminate duplicate records, hence,&amp;nbsp; I need to group by ID and take maximum date for ACTION_DATE_IN&amp;nbsp; and ACTION_DATE_OUT.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 13 Mar 2020 14:56:20 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/HASH-JOIN/m-p/631893#M187259</guid>
      <dc:creator>GPatel</dc:creator>
      <dc:date>2020-03-13T14:56:20Z</dc:date>
    </item>
    <item>
      <title>Re: HASH JOIN</title>
      <link>https://communities.sas.com/t5/SAS-Programming/HASH-JOIN/m-p/631898#M187260</link>
      <description>&lt;P&gt;So you need to prepare S2 first. Sort by id and action_date_in, and then in a data step keep only the last observation per id. Once you have 1:1 relationships, you'll get no further duplicates.&lt;/P&gt;</description>
      <pubDate>Fri, 13 Mar 2020 15:01:55 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/HASH-JOIN/m-p/631898#M187260</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2020-03-13T15:01:55Z</dc:date>
    </item>
    <item>
      <title>Re: HASH JOIN</title>
      <link>https://communities.sas.com/t5/SAS-Programming/HASH-JOIN/m-p/632430#M187505</link>
      <description>&lt;P&gt;Thanks Patric, for your reply to my post.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 16 Mar 2020 13:51:26 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/HASH-JOIN/m-p/632430#M187505</guid>
      <dc:creator>GPatel</dc:creator>
      <dc:date>2020-03-16T13:51:26Z</dc:date>
    </item>
    <item>
      <title>Re: HASH JOIN</title>
      <link>https://communities.sas.com/t5/SAS-Programming/HASH-JOIN/m-p/632431#M187506</link>
      <description>&lt;P&gt;Thanks Patric and KurtBremser, for your reply to my post. I am accepting both response as accepted solutions.&lt;/P&gt;</description>
      <pubDate>Mon, 16 Mar 2020 13:52:55 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/HASH-JOIN/m-p/632431#M187506</guid>
      <dc:creator>GPatel</dc:creator>
      <dc:date>2020-03-16T13:52:55Z</dc:date>
    </item>
    <item>
      <title>Re: HASH JOIN</title>
      <link>https://communities.sas.com/t5/SAS-Programming/HASH-JOIN/m-p/680813#M205847</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;A class="trigger-hovercard" href="https://communities.sas.com/t5/user/viewprofilepage/user-id/11562" target="_blank" rel="noopener"&gt;KurtBremser&lt;/A&gt;,&lt;/P&gt;
&lt;P&gt;I am actually encountering an error with below code for Left join with Hash objects.&lt;/P&gt;
&lt;P&gt;I have total 2 columns in Rank_Base &amp;amp; 25 columns in Trans_Prod, both datasets are unsorted.&amp;nbsp;Rank_Base&amp;nbsp; is unique, but Trans_Prod is having the duplicate records of same party_number.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data c2 ;
if _n_ = 1 then do ;
if 0 then set Rank_Base ;
dcl hash Pr_Rank (dataset: "Rank_Base",multidata: "y") ;
Pr_Rank.definekey("party_number");
Pr_Rank.definedata(all:'Y');
Pr_Rank.definedone () ;
dcl hiter iter('Pr_Rank');
end ;
do _n_=1 by 1 until(last);
set Trans_Prod end=last;
array t(100) ; 
if Pr_Rank.find()= 0 then do;
if key not in t then t(_n_)=key;
output;
end;
if last then do;
do _n_=1 to dim(t);
if Pr_Rank.check(key:t(_n_))=0 then Pr_Rank.remove(key:t(_n_));
end;
rc = iter.first();
do while (rc = 0);
call missing(risk_rating);
output;
rc = iter.next();
end;
end;
end;
drop rc t:;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Error Log:&lt;/P&gt;
&lt;PRE&gt;25 data c2 ;
26 if _n_ = 1 then do ;
27 if 0 then set Rank_Base ;
28 dcl hash Pr_Rank (dataset: "Rank_Base") ;
29 Pr_Rank.definekey("party_number");
30 Pr_Rank.definedata(all:'Y');
31 Pr_Rank.definedone () ;
32 dcl hiter iter('Pr_Rank');
33 end ;
34 do _n_=1 by 1 until(last);
35 set Trans_Prod end=last;
36 array t(100) ; /*this is arbitrary subscript number*/
37 if Pr_Rank.find()= 0 then do;
38 if key not in t then t(_n_)=key;
39 output;
40 end;
41 if last then do;
42 do _n_=1 to dim(t);
43 if Pr_Rank.check(key:t(_n_))=0 then Pr_Rank.remove(key:t(_n_));
44 end;
45 rc = iter.first();
46 do while (rc = 0);
47 call missing(risk_rating);
48 output;
49 rc = iter.next();
50 end;
51 end;
52 end;
53 drop rc t:;
54 run;

NOTE: Variable key is uninitialized.
2 The SAS System 10:56 Tuesday, September 1, 2020

NOTE: There were 3990011 observations read from the data set RANK_BASE.
ERROR: Type mismatch for key variable party_number at line 43 column 5.
ERROR: Keys and data must be specified in the same order and have the same types as given in DefineKey/DefineData at line 43 column 
5.
ERROR: DATA STEP Component Object failure. Aborted during the EXECUTION phase.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 145185092 observations read from the data set TRANS_PROD.
WARNING: The data set C2 may be incomplete. When this step was stopped there were 105030825 observations and 26 variables.
WARNING: Data set C2 was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
real time 5:11.08
cpu time 3:34.62&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The SQL equivalent query for the above:&lt;/P&gt;
&lt;P&gt;select distinct a.*, b.risk_rating&lt;BR /&gt;from Trans_Prod a left join Rank_Base b&lt;BR /&gt;on strip(a.party_number)=strip(b.party_number)&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I am just trying to map the Risk_rating column, from Rank_Base table with hash objects, but I want the distinct result get fetched in the output with all columns from left table and risk_rating column from right table. Also I need 145185092 observations in the output, but the hashing is deleting some records I believe.&lt;/P&gt;
&lt;P&gt;I don't want to sort as the Trans_Prod is having 140 Million data &amp;amp; Risk_Base is 39 Million data.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Please help.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks&lt;/P&gt;
&lt;P&gt;Rajdeep&lt;/P&gt;</description>
      <pubDate>Wed, 02 Sep 2020 00:27:12 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/HASH-JOIN/m-p/680813#M205847</guid>
      <dc:creator>rajdeep</dc:creator>
      <dc:date>2020-09-02T00:27:12Z</dc:date>
    </item>
    <item>
      <title>Re: HASH JOIN</title>
      <link>https://communities.sas.com/t5/SAS-Programming/HASH-JOIN/m-p/680817#M205849</link>
      <description>&lt;P&gt;Please post your question in a new thread, do not hijack other's.&lt;/P&gt;
&lt;P&gt;Post the&amp;nbsp;&lt;EM&gt;whole&lt;/EM&gt; log of the step, with line numbers.&lt;/P&gt;
&lt;P&gt;Supply example data in data steps with datalines.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Looks like your variables are of different type in the datasets.&lt;/P&gt;</description>
      <pubDate>Tue, 01 Sep 2020 20:48:19 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/HASH-JOIN/m-p/680817#M205849</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2020-09-01T20:48:19Z</dc:date>
    </item>
    <item>
      <title>Re: HASH JOIN</title>
      <link>https://communities.sas.com/t5/SAS-Programming/HASH-JOIN/m-p/680841#M205859</link>
      <description>I think I was missing the call missing statement in my code.&lt;BR /&gt;&lt;BR /&gt;data want;&lt;BR /&gt;set Trans_Prod;&lt;BR /&gt;if _n_ = 1 then do ;&lt;BR /&gt;if 0 then set Rank_Base ;&lt;BR /&gt;dcl hash Pr_Rank (dataset: "Rank_Base") ;&lt;BR /&gt;Pr_Rank.definekey("party_number");&lt;BR /&gt;Pr_Rank.definedata(all:'Y');&lt;BR /&gt;Pr_Rank.definedone () ;&lt;BR /&gt;end;&lt;BR /&gt;if Pr_Rank.find() ne 0 then call missing(RISK_RATING);&lt;BR /&gt;run;&lt;BR /&gt;&lt;BR /&gt;This code is working as per the expectation like left join.</description>
      <pubDate>Wed, 02 Sep 2020 00:24:26 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/HASH-JOIN/m-p/680841#M205859</guid>
      <dc:creator>rajdeep</dc:creator>
      <dc:date>2020-09-02T00:24:26Z</dc:date>
    </item>
  </channel>
</rss>

