<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic hash merge in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/hash-merge/m-p/862051#M340491</link>
    <description>&lt;P&gt;I'm trying to find which records in a small dataset ('COHORT.HICS') are not in the larger one (ABCD.DATA).&amp;nbsp; Trying use merge via hash rather than sorting and then a data step, but it seems to run out resources.&amp;nbsp; &amp;nbsp;Any suggestions how to make it work.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;2 OPTIONS NOCENTER OBS= MAX ;&lt;BR /&gt;3&lt;BR /&gt;4 DATA HICDIFF0(KEEP=HIC);&lt;BR /&gt;5 IF 0 THEN SET ABCD.DATA;&lt;BR /&gt;6 IF _N_ = 1 THEN DO;&lt;BR /&gt;7 DECLARE HASH MB(DATASET:'ABCD.DATA');&lt;BR /&gt;8 MB.DEFINEKEY('HIC');&lt;BR /&gt;9 MB.DEFINEDONE();&lt;BR /&gt;10 END;&lt;BR /&gt;11 SET COHORT.HICS;&lt;BR /&gt;12 IF MB.FIND() NE 0 THEN OUTPUT;&lt;BR /&gt;13 RUN;&lt;BR /&gt;&lt;BR /&gt;NOTE: THERE WERE 12354503 OBSERVATIONS READ FROM THE DATA SET ABCD.DATA.&lt;BR /&gt;ERROR: HASH DATA SET LOAD FAILED AT LINE 9 COLUMN 5.&lt;BR /&gt;ERROR: HASH OBJECT ADDED 12354498 ITEMS WHEN MEMORY FAILURE OCCURRED.&lt;BR /&gt;FATAL: INSUFFICIENT MEMORY TO EXECUTE DATA STEP PROGRAM. ABORTED DURING THE EXEC&lt;BR /&gt;ERROR: THE SAS SYSTEM STOPPED PROCESSING THIS STEP BECAUSE OF INSUFFICIENT MEMOR&lt;BR /&gt;WARNING: THE DATA SET WORK.HICDIFF0 MAY BE INCOMPLETE. WHEN THIS STEP WAS STOPP&lt;BR /&gt;NOTE: THE DATA STATEMENT USED 12.64 CPU SECONDS AND 18741K.&lt;BR /&gt;&lt;BR /&gt;NOTE: THE ADDRESS SPACE HAS USED A MAXIMUM OF 920K BELOW THE LINE AND 1555896K A&lt;/P&gt;</description>
    <pubDate>Fri, 03 Mar 2023 00:19:28 GMT</pubDate>
    <dc:creator>Batman</dc:creator>
    <dc:date>2023-03-03T00:19:28Z</dc:date>
    <item>
      <title>hash merge</title>
      <link>https://communities.sas.com/t5/SAS-Programming/hash-merge/m-p/862051#M340491</link>
      <description>&lt;P&gt;I'm trying to find which records in a small dataset ('COHORT.HICS') are not in the larger one (ABCD.DATA).&amp;nbsp; Trying use merge via hash rather than sorting and then a data step, but it seems to run out resources.&amp;nbsp; &amp;nbsp;Any suggestions how to make it work.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;2 OPTIONS NOCENTER OBS= MAX ;&lt;BR /&gt;3&lt;BR /&gt;4 DATA HICDIFF0(KEEP=HIC);&lt;BR /&gt;5 IF 0 THEN SET ABCD.DATA;&lt;BR /&gt;6 IF _N_ = 1 THEN DO;&lt;BR /&gt;7 DECLARE HASH MB(DATASET:'ABCD.DATA');&lt;BR /&gt;8 MB.DEFINEKEY('HIC');&lt;BR /&gt;9 MB.DEFINEDONE();&lt;BR /&gt;10 END;&lt;BR /&gt;11 SET COHORT.HICS;&lt;BR /&gt;12 IF MB.FIND() NE 0 THEN OUTPUT;&lt;BR /&gt;13 RUN;&lt;BR /&gt;&lt;BR /&gt;NOTE: THERE WERE 12354503 OBSERVATIONS READ FROM THE DATA SET ABCD.DATA.&lt;BR /&gt;ERROR: HASH DATA SET LOAD FAILED AT LINE 9 COLUMN 5.&lt;BR /&gt;ERROR: HASH OBJECT ADDED 12354498 ITEMS WHEN MEMORY FAILURE OCCURRED.&lt;BR /&gt;FATAL: INSUFFICIENT MEMORY TO EXECUTE DATA STEP PROGRAM. ABORTED DURING THE EXEC&lt;BR /&gt;ERROR: THE SAS SYSTEM STOPPED PROCESSING THIS STEP BECAUSE OF INSUFFICIENT MEMOR&lt;BR /&gt;WARNING: THE DATA SET WORK.HICDIFF0 MAY BE INCOMPLETE. WHEN THIS STEP WAS STOPP&lt;BR /&gt;NOTE: THE DATA STATEMENT USED 12.64 CPU SECONDS AND 18741K.&lt;BR /&gt;&lt;BR /&gt;NOTE: THE ADDRESS SPACE HAS USED A MAXIMUM OF 920K BELOW THE LINE AND 1555896K A&lt;/P&gt;</description>
      <pubDate>Fri, 03 Mar 2023 00:19:28 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/hash-merge/m-p/862051#M340491</guid>
      <dc:creator>Batman</dc:creator>
      <dc:date>2023-03-03T00:19:28Z</dc:date>
    </item>
    <item>
      <title>Re: hash merge</title>
      <link>https://communities.sas.com/t5/SAS-Programming/hash-merge/m-p/862052#M340492</link>
      <description>&lt;P&gt;Hash tables have to fit entirely into memory. Why not swap the tables so the small one&amp;nbsp;COHORT.HICS is the hash table?&lt;/P&gt;</description>
      <pubDate>Fri, 03 Mar 2023 00:26:56 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/hash-merge/m-p/862052#M340492</guid>
      <dc:creator>SASKiwi</dc:creator>
      <dc:date>2023-03-03T00:26:56Z</dc:date>
    </item>
    <item>
      <title>Re: hash merge</title>
      <link>https://communities.sas.com/t5/SAS-Programming/hash-merge/m-p/862053#M340493</link>
      <description>would I need to change to code in row 12?</description>
      <pubDate>Fri, 03 Mar 2023 00:31:39 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/hash-merge/m-p/862053#M340493</guid>
      <dc:creator>Batman</dc:creator>
      <dc:date>2023-03-03T00:31:39Z</dc:date>
    </item>
    <item>
      <title>Re: hash merge</title>
      <link>https://communities.sas.com/t5/SAS-Programming/hash-merge/m-p/862059#M340497</link>
      <description>&lt;P&gt;How many rows are in your small table and in your large table? A hash merge might not be the best approach for comparing a very small table with a very large one.&lt;/P&gt;</description>
      <pubDate>Fri, 03 Mar 2023 01:53:05 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/hash-merge/m-p/862059#M340497</guid>
      <dc:creator>SASKiwi</dc:creator>
      <dc:date>2023-03-03T01:53:05Z</dc:date>
    </item>
    <item>
      <title>Re: hash merge</title>
      <link>https://communities.sas.com/t5/SAS-Programming/hash-merge/m-p/862070#M340505</link>
      <description>&lt;P&gt;Does the LARGE one have an INDEX on HIC?&amp;nbsp; If so then use that.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want;
  set SMALL;
  set LARGE key=HIC;
  if _error_ then do;
     put 'NOTE: ' HIC= 'not found.';
     _error_=0;
  end;
  output;
  call missing(of _all_);
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Example:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data small (keep=name) large(index=(name)) ;
  set sashelp.class end=eof;
  if _n_ in (1,5,7) then output small;
  output large;
  if eof then do;
    name = 'ZZZ'; output small;
  end;
run;

data want;
  set SMALL;
  set LARGE key=NAME ;
  if _error_ then do;
     put 'NOTE: ' NAME= 'not found.';
     _error_=0;
  end;
  output;
  call missing(of _all_);
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Result:&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Tom_0-1677817821174.png" style="width: 400px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/81023iAFE64A854826A45C/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Tom_0-1677817821174.png" alt="Tom_0-1677817821174.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 03 Mar 2023 04:30:28 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/hash-merge/m-p/862070#M340505</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2023-03-03T04:30:28Z</dc:date>
    </item>
    <item>
      <title>Re: hash merge</title>
      <link>https://communities.sas.com/t5/SAS-Programming/hash-merge/m-p/862079#M340511</link>
      <description>&lt;P&gt;Read the small dataset into a hash where you define an additional flag variable (you need to use a DO loop and the ADD() method to fill the hash).&lt;/P&gt;
&lt;P&gt;Then read the large dataset and update the flag in the hash if a particular HIC is found. At the end, write out only those members from the hash which have the flag set.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Just an idea, will test it later when I'm in front of my desktop.&lt;/P&gt;</description>
      <pubDate>Fri, 03 Mar 2023 07:39:03 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/hash-merge/m-p/862079#M340511</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2023-03-03T07:39:03Z</dc:date>
    </item>
    <item>
      <title>Re: hash merge</title>
      <link>https://communities.sas.com/t5/SAS-Programming/hash-merge/m-p/862081#M340513</link>
      <description>&lt;P&gt;Here a code example for my suggestion:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data small;
set sashelp.class (obs=10);
run;

data large;
set sashelp.class (firstobs=6);
run;

data _null_;
set large end=done;
if _n_ = 1
then do;
  length flag 8;
  declare hash s (ordered:"yes");
  s.definekey("name");
  s.definedata("name","sex","age","height","weight","flag");
  s.definedone();
  do until(done_s);
    set small end=done_s;
    rc = s.add();
  end;
end;
if s.find() = 0
then do;
  flag = 1;
  rc = s.replace();
end;
if done
then do;
  rc = s.output(dataset:"want (where=(flag = 1))");
end;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Fri, 03 Mar 2023 08:12:39 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/hash-merge/m-p/862081#M340513</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2023-03-03T08:12:39Z</dc:date>
    </item>
    <item>
      <title>Re: hash merge</title>
      <link>https://communities.sas.com/t5/SAS-Programming/hash-merge/m-p/862085#M340515</link>
      <description>&lt;P&gt;You may run into trouble with that code, if the small dataset has repeating values on the NAME variable (the first Alfred will be found, the second will not).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;To avoid that, use KEY=/UNIQUE:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want;
  set SMALL;
  set LARGE key=NAME  /unique;
  if _error_ then do;
     put 'NOTE: ' NAME= 'not found.';
     _error_=0;
  end;
  output;
  call missing(of _all_);
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Fri, 03 Mar 2023 08:58:45 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/hash-merge/m-p/862085#M340515</guid>
      <dc:creator>s_lassen</dc:creator>
      <dc:date>2023-03-03T08:58:45Z</dc:date>
    </item>
    <item>
      <title>Re: hash merge</title>
      <link>https://communities.sas.com/t5/SAS-Programming/hash-merge/m-p/862099#M340518</link>
      <description>&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/*
There is no need Hash at all.
It is a piece of cake for SQL
*/
data small (keep=name) large(index=(name)) ;
  set sashelp.class end=eof;
  if _n_ in (1,5,7) then output small;
  output large;
  if eof then do;
    name = 'ZZZ'; output small;
  end;
run;

proc sql;
create table want as
select name from small
except
select name from large;
quit;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Fri, 03 Mar 2023 12:46:38 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/hash-merge/m-p/862099#M340518</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2023-03-03T12:46:38Z</dc:date>
    </item>
    <item>
      <title>Re: hash merge</title>
      <link>https://communities.sas.com/t5/SAS-Programming/hash-merge/m-p/862121#M340528</link>
      <description>Yes, I normally don't expect sql to run run faster than hash, but it does in this case, apparently because no sort is being done.</description>
      <pubDate>Fri, 03 Mar 2023 13:59:11 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/hash-merge/m-p/862121#M340528</guid>
      <dc:creator>Batman</dc:creator>
      <dc:date>2023-03-03T13:59:11Z</dc:date>
    </item>
    <item>
      <title>Re: hash merge</title>
      <link>https://communities.sas.com/t5/SAS-Programming/hash-merge/m-p/862232#M340563</link>
      <description>&lt;P&gt;That's why it's a good idea not to be fixated on any particular methodology when you have tricky high data volume use case.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Most common hash question titles: How do I use hash to do XYZ?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;A better title is: What is the best coding method to do XYZ?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Also &lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/18408"&gt;@Ksharp&lt;/a&gt; 's SQL solution is a lot simpler so it's a win-win...&lt;/P&gt;</description>
      <pubDate>Fri, 03 Mar 2023 22:19:51 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/hash-merge/m-p/862232#M340563</guid>
      <dc:creator>SASKiwi</dc:creator>
      <dc:date>2023-03-03T22:19:51Z</dc:date>
    </item>
  </channel>
</rss>

