<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Hash In Place in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Hash-In-Place/m-p/673913#M202828</link>
    <description>&lt;P&gt;If your lookup table is large, it is probably a lot more efficient to create an index and use that to find the records:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql;
  create unique index item on have_lookup_table(item);
quit;

data want notfound;
  set have_base_table;
  set have_lookup_table key=item/unique;
  if _iorc_=0 then output want;
  else output notfound;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;- at least if you are using this kind of lookup often.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;A couple of notes:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;I created a unique index on the lookup table, meaning that duplicate values of the ITEM variable are not allowed&lt;/LI&gt;
&lt;LI&gt;I used the UNIQUE option for the SET with KEY=, so that you can find values for an item also when there are repeats in the base table&lt;/LI&gt;
&lt;LI&gt;I put in a bit of code to account for the situations where the item on the base table is not in the lookup table, using the _IORC_ automatic variable - these will be output to the table NOTFOUND in the example.&lt;/LI&gt;
&lt;/UL&gt;</description>
    <pubDate>Sat, 01 Aug 2020 14:56:37 GMT</pubDate>
    <dc:creator>s_lassen</dc:creator>
    <dc:date>2020-08-01T14:56:37Z</dc:date>
    <item>
      <title>Hash In Place</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Hash-In-Place/m-p/673802#M202764</link>
      <description>&lt;P&gt;I am trying to do the following:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;1) Get a list of keys from the base data&lt;/P&gt;&lt;P&gt;2) Get the info from the lookup table for these keys&lt;/P&gt;&lt;P&gt;3) Merge the hash table w/info onto the base table&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;To do this, I am trying to implement a hash in place (I think this is what it is called). The problem I am running into is that this doesn't work quickly if the lookup table is very large. In this example, imagine if "have_lookup_table" was 5M records long. I'm setting that entire table just to get the lookup info for 4 records, even if those four records are the first four records of the 5M record dataset. Is there a way to get the desired result&amp;nbsp;&lt;EM&gt;without&lt;/EM&gt; having to set the entire table?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;BR /&gt;&lt;CODE class=" language-sas"&gt;data have_base_table;
   infile datalines delimiter=','; 
   input Item:$8. Trans_Date:yymmdd10. Code :$4.;
   format Trans_Date yymmdd10.;
   datalines;                      
Apple, 2020-07-29, 4036
Orange, 2020-07-29, 4031
Carrot, 2020-07-21, 4110
Banana, 2020-07-31, 5191
;

data have_lookup_table;
   infile datalines delimiter=','; 
   input Item $ Price 8.;
   datalines;                      
Apple, 3.00
Orange, 1.00
Carrot, 0.50
Banana, 0.75
;

data want;
   infile datalines delimiter=','; 
   input Item:$8. Trans_Date:yymmdd10. Code :$4. Price:8.;
   format Trans_Date yymmdd10.;
   datalines;                      
Apple, 2020-07-29, 4036, 3.00
Orange, 2020-07-29, 4031, 1.00
Carrot, 2020-07-21, 4110, 0.50
Banana, 2020-07-31, 5191, 0.75
;
run;

data test;

if 0 then set have_lookup_table;
if _n_=1 then do;

*Define empty hash table to be filled with only the necessary lookups, depending on what is in the base table;
dcl hash h();
h.definekey("Item");
h.definedata("Item","Price");
h.definedone();

end;

*Fill empty hash table with necessary keys from base table;
do until (eof);
set have_base_table end=eof;
rc_ref=h.ref();
end;
&lt;BR /&gt;h.output(dataset:"hash_keys");&lt;BR /&gt;&lt;BR /&gt;*Fill hash table with corresponding key-matched data;
do until (last2);
set have_lookup_table end=last2;
rc_replace=h.replace();
end;

h.output(dataset:"hash_keys_and_data");

call missing(Price);
&lt;BR /&gt;*Merge finished hash table onto base data;
do until (last);
set have_base_table end=last;
rc_h=h.find();
output;
end;

run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Fri, 31 Jul 2020 18:18:43 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Hash-In-Place/m-p/673802#M202764</guid>
      <dc:creator>theponcer</dc:creator>
      <dc:date>2020-07-31T18:18:43Z</dc:date>
    </item>
    <item>
      <title>Re: Hash In Place</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Hash-In-Place/m-p/673862#M202794</link>
      <description>&lt;P&gt;You could do this:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql noprint;
  select distinct cats("'",id,"'")
    into :id_list separated by ',' from have_base_table;
quit;
%put &amp;amp;=id_list;

data want (drop=_:);
 set have_base_table;
 if _n_=1 then do;
   if 0 then set have_lookup_table;
   declare hash h (dataset:"have_lookup_table (where=(id in (&amp;amp;id_list.)))");
     h.definekey('id');
	 h.definedata(all:'Y');
	 h.definedone();
  end;
  _rc=h.find();
  output;
  call missing(of _all_);
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;The PROC SQL creates a macrovar ID_LIST the looks like&amp;nbsp; &lt;EM&gt;&lt;STRONG&gt;'Apple','Banana','Carrot','Orange'&lt;/STRONG&gt;&lt;/EM&gt;.&amp;nbsp; Later this macro var is used an a "where=" data set name parameter to populate the hash table H with only the needed records from have_lookup_table.&amp;nbsp; A much smaller hash object.&lt;/LI&gt;
&lt;LI&gt;The explicit "output;" followed by a "call missing ..." statement is needed only if a lookup could fail (i.e. if an id is in have_base_table, but not in "have_lookup_table").&amp;nbsp; Such failures would leave the relevant variables with values from the most recent successful lookup.&amp;nbsp; So after each OUTPUT, the call missing prevents possible contamination of a failed lookup in the next record.&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 01 Aug 2020 02:25:14 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Hash-In-Place/m-p/673862#M202794</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2020-08-01T02:25:14Z</dc:date>
    </item>
    <item>
      <title>Re: Hash In Place</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Hash-In-Place/m-p/673868#M202797</link>
      <description>&lt;P&gt;Below&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/31461"&gt;@mkeintz&lt;/a&gt;&amp;nbsp;solution with a few minor tweaks to the code.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql noprint;
  select distinct cats("'",id,"'")
    into :id_list separated by ',' from have_base_table;
quit;

%put &amp;amp;=id_list;

data want (drop=_:);
  if _n_=1 then
    do;
      if 0 then set have_lookup_table;
      declare hash h (dataset:"have_lookup_table (where=(id in (&amp;amp;id_list.)))");
      h.definekey('id');
      h.definedata('Price');
      h.definedone();
    end;
  call missing(of _all_);

  set have_base_table;
  _rc=h.find();
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Sat, 01 Aug 2020 03:12:09 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Hash-In-Place/m-p/673868#M202797</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2020-08-01T03:12:09Z</dc:date>
    </item>
    <item>
      <title>Re: Hash In Place</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Hash-In-Place/m-p/673869#M202798</link>
      <description>&lt;P&gt;And just for fun here another "hash" way for doing this.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data _null_;

  if _n_=1 then
    do;
      if 0 then set have_base_table have_lookup_table;
      dcl hash h(multidata:'y', ordered:'y');
      h.definekey('Item');
      h.definedata('Item','Trans_Date','Code','Price');
      h.definedone();
      do until(done);
        set have_base_table end=done;
        h.add();
      end;
    end;
  
  set have_lookup_table(rename=(price=_price)) end=last;

  do while(h.do_over() eq 0);
    price=_price;
    h.replacedup();
  end;

  if last then h.output(dataset:'want');
run;

proc print data=want;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Sat, 01 Aug 2020 03:51:27 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Hash-In-Place/m-p/673869#M202798</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2020-08-01T03:51:27Z</dc:date>
    </item>
    <item>
      <title>Re: Hash In Place</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Hash-In-Place/m-p/673913#M202828</link>
      <description>&lt;P&gt;If your lookup table is large, it is probably a lot more efficient to create an index and use that to find the records:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql;
  create unique index item on have_lookup_table(item);
quit;

data want notfound;
  set have_base_table;
  set have_lookup_table key=item/unique;
  if _iorc_=0 then output want;
  else output notfound;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;- at least if you are using this kind of lookup often.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;A couple of notes:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;I created a unique index on the lookup table, meaning that duplicate values of the ITEM variable are not allowed&lt;/LI&gt;
&lt;LI&gt;I used the UNIQUE option for the SET with KEY=, so that you can find values for an item also when there are repeats in the base table&lt;/LI&gt;
&lt;LI&gt;I put in a bit of code to account for the situations where the item on the base table is not in the lookup table, using the _IORC_ automatic variable - these will be output to the table NOTFOUND in the example.&lt;/LI&gt;
&lt;/UL&gt;</description>
      <pubDate>Sat, 01 Aug 2020 14:56:37 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Hash-In-Place/m-p/673913#M202828</guid>
      <dc:creator>s_lassen</dc:creator>
      <dc:date>2020-08-01T14:56:37Z</dc:date>
    </item>
    <item>
      <title>Re: Hash In Place</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Hash-In-Place/m-p/674800#M203240</link>
      <description>&lt;P&gt;I was originally unable to use this solution because the variable list was too long to fit inside of a macro variable. However, I stumbled across this paper:&amp;nbsp;&lt;A href="https://www.lexjansen.com/sesug/2019/SESUG2019_Paper-117_Final_PDF.pdf" target="_blank" rel="noopener"&gt;https://www.lexjansen.com/sesug/2019/SESUG2019_Paper-117_Final_PDF.pdf&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;It contains a - *deep breath* - macro that creates a series of macro variables that are then stored inside a macro variable. The result is something like this:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class="language-sas"&gt;*Assuming a bin size of 1 for illustration purposes only - could be much larger;
%let bin1 = Carrot;
%let bin2 = Apple;
%let bin3 = Banana;
%let bin4 = Orange;

%let master_bin = &amp;amp;bin1, &amp;amp;bin2, &amp;amp;bin3, &amp;amp;bin4;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Then, you would use the master variable in place of the id_list variable in the solution.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks for the help!&lt;/P&gt;</description>
      <pubDate>Wed, 05 Aug 2020 17:47:54 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Hash-In-Place/m-p/674800#M203240</guid>
      <dc:creator>theponcer</dc:creator>
      <dc:date>2020-08-05T17:47:54Z</dc:date>
    </item>
  </channel>
</rss>

