<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic using hash to select observations in New SAS User</title>
    <link>https://communities.sas.com/t5/New-SAS-User/using-hash-to-select-observations/m-p/556551#M9849</link>
    <description>&lt;P&gt;Hi SAS communities,&lt;/P&gt;&lt;P&gt;I've heard the power of "in-memory" table look-up presented by hash tables.&lt;/P&gt;&lt;P&gt;Having no backgrounds in programming, I still didn't understand the advantage of hash tables after reading a bunch of sas papers online about hash table.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Now I come across a problem in SAS coding and just wondering whether this problem can be solved by hash tables.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I'm soliciting general ideas, so no detailed sas code and data sets are available for me to provide. I have no idea how to write hash table code on this problem. I'm just curious whether this problem can be solved by hash tables.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Here is the problem:&lt;/P&gt;&lt;P&gt;I have two datasets A and B. Both datasets have the same variables, for example, acct and revenue. The only difference is that dataset A has 1milion observations and dataset B has 1milllion plus 20 observations that are not in dataset A.&lt;/P&gt;&lt;P&gt;I want to find who these 20 additional observations are in dataset B that can not be found in dataset A.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I write a proc sql code like the following to approach this problem:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#000080" face="Courier New" size="3"&gt;&lt;STRONG&gt;proc&lt;/STRONG&gt;&lt;/FONT&gt; &lt;STRONG&gt;&lt;FONT color="#000080" face="Courier New" size="3"&gt;sql&lt;/FONT&gt;&lt;/STRONG&gt;&lt;FONT face="Courier New" size="3"&gt;;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#0000ff" face="Courier New" size="3"&gt;select&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt; * &lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#0000ff" face="Courier New" size="3"&gt;from&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt; datasetB&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#0000ff" face="Courier New" size="3"&gt;where&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt; acct_key &lt;/FONT&gt;&lt;FONT color="#0000ff" face="Courier New" size="3"&gt;not&lt;/FONT&gt; &lt;FONT color="#0000ff" face="Courier New" size="3"&gt;in&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt; ( &lt;/FONT&gt;&lt;FONT color="#0000ff" face="Courier New" size="3"&gt;select&lt;/FONT&gt; &lt;FONT color="#0000ff" face="Courier New" size="3"&gt;distinct&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt; acct_key &lt;/FONT&gt;&lt;FONT color="#0000ff" face="Courier New" size="3"&gt;from&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt; datasetA);&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#000080" face="Courier New" size="3"&gt;&lt;STRONG&gt;quit&lt;/STRONG&gt;&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt;;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT face="Courier New" size="3"&gt;The problem is that, with 1million and 1million+20 observations in each dataset, this piece of code takes insane amount of time for SAS to finish the job. My sas even crushes once and runs forever. Wondering how Hash table can help in this case?? How can the magical "in-memory table look up" function possessed by hash table help me to cut down the running time and do the job more efficiently?&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT face="Courier New" size="3"&gt;Thanks in advance to all dear sas community members. I'm indebted to your thoughts/inputs/ideas/suggestions.&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Mon, 06 May 2019 20:10:28 GMT</pubDate>
    <dc:creator>changxuosu</dc:creator>
    <dc:date>2019-05-06T20:10:28Z</dc:date>
    <item>
      <title>using hash to select observations</title>
      <link>https://communities.sas.com/t5/New-SAS-User/using-hash-to-select-observations/m-p/556551#M9849</link>
      <description>&lt;P&gt;Hi SAS communities,&lt;/P&gt;&lt;P&gt;I've heard the power of "in-memory" table look-up presented by hash tables.&lt;/P&gt;&lt;P&gt;Having no backgrounds in programming, I still didn't understand the advantage of hash tables after reading a bunch of sas papers online about hash table.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Now I come across a problem in SAS coding and just wondering whether this problem can be solved by hash tables.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I'm soliciting general ideas, so no detailed sas code and data sets are available for me to provide. I have no idea how to write hash table code on this problem. I'm just curious whether this problem can be solved by hash tables.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Here is the problem:&lt;/P&gt;&lt;P&gt;I have two datasets A and B. Both datasets have the same variables, for example, acct and revenue. The only difference is that dataset A has 1milion observations and dataset B has 1milllion plus 20 observations that are not in dataset A.&lt;/P&gt;&lt;P&gt;I want to find who these 20 additional observations are in dataset B that can not be found in dataset A.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I write a proc sql code like the following to approach this problem:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#000080" face="Courier New" size="3"&gt;&lt;STRONG&gt;proc&lt;/STRONG&gt;&lt;/FONT&gt; &lt;STRONG&gt;&lt;FONT color="#000080" face="Courier New" size="3"&gt;sql&lt;/FONT&gt;&lt;/STRONG&gt;&lt;FONT face="Courier New" size="3"&gt;;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#0000ff" face="Courier New" size="3"&gt;select&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt; * &lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#0000ff" face="Courier New" size="3"&gt;from&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt; datasetB&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#0000ff" face="Courier New" size="3"&gt;where&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt; acct_key &lt;/FONT&gt;&lt;FONT color="#0000ff" face="Courier New" size="3"&gt;not&lt;/FONT&gt; &lt;FONT color="#0000ff" face="Courier New" size="3"&gt;in&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt; ( &lt;/FONT&gt;&lt;FONT color="#0000ff" face="Courier New" size="3"&gt;select&lt;/FONT&gt; &lt;FONT color="#0000ff" face="Courier New" size="3"&gt;distinct&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt; acct_key &lt;/FONT&gt;&lt;FONT color="#0000ff" face="Courier New" size="3"&gt;from&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt; datasetA);&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#000080" face="Courier New" size="3"&gt;&lt;STRONG&gt;quit&lt;/STRONG&gt;&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt;;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT face="Courier New" size="3"&gt;The problem is that, with 1million and 1million+20 observations in each dataset, this piece of code takes insane amount of time for SAS to finish the job. My sas even crushes once and runs forever. Wondering how Hash table can help in this case?? How can the magical "in-memory table look up" function possessed by hash table help me to cut down the running time and do the job more efficiently?&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT face="Courier New" size="3"&gt;Thanks in advance to all dear sas community members. I'm indebted to your thoughts/inputs/ideas/suggestions.&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 06 May 2019 20:10:28 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/using-hash-to-select-observations/m-p/556551#M9849</guid>
      <dc:creator>changxuosu</dc:creator>
      <dc:date>2019-05-06T20:10:28Z</dc:date>
    </item>
    <item>
      <title>Re: using hash to select observations</title>
      <link>https://communities.sas.com/t5/New-SAS-User/using-hash-to-select-observations/m-p/556554#M9850</link>
      <description>&lt;P&gt;Sounds like a nice job for a hash object. As you mention, we can't see your data, so needless to say, this code is untested but see if this gives you what you want&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want;
   if 0 then set A;
   if _N_ = 1 then do;
      declare hash h(dataset:'A');
      h.defineKey(acct_key);
      h.defineDone();
   end;

   set B;

   if h.check() ne 0;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Before explaining this 'magical' in-memory technique, let me know if this works for you &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt; An absolute requirement for it to work is that the data in the hash object fits in memory.&lt;/P&gt;</description>
      <pubDate>Tue, 07 May 2019 06:55:23 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/using-hash-to-select-observations/m-p/556554#M9850</guid>
      <dc:creator>PeterClemmensen</dc:creator>
      <dc:date>2019-05-07T06:55:23Z</dc:date>
    </item>
    <item>
      <title>Re: using hash to select observations</title>
      <link>https://communities.sas.com/t5/New-SAS-User/using-hash-to-select-observations/m-p/557101#M9947</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/193702"&gt;@changxuosu&lt;/a&gt;,&amp;nbsp;did the above solve your problem? &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 08 May 2019 13:28:28 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/using-hash-to-select-observations/m-p/557101#M9947</guid>
      <dc:creator>PeterClemmensen</dc:creator>
      <dc:date>2019-05-08T13:28:28Z</dc:date>
    </item>
    <item>
      <title>Re: using hash to select observations</title>
      <link>https://communities.sas.com/t5/New-SAS-User/using-hash-to-select-observations/m-p/559680#M10376</link>
      <description>&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/31304"&gt;@PeterClemmensen&lt;/a&gt; I'm sorry it took a while to get back to you. I was kept from unexpected deadlines. Thank you so much for your suggestion. I tried it and still trying to understand it.&lt;BR /&gt;I met the following error when tried it, so that you know.&lt;BR /&gt;&lt;BR /&gt;ERROR: Undeclared key symbol for hash object at line 31 column 7.&lt;BR /&gt;ERROR: DATA STEP Component Object failure. Aborted during the EXECUTION phase.&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Fri, 17 May 2019 15:08:09 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/using-hash-to-select-observations/m-p/559680#M10376</guid>
      <dc:creator>changxuosu</dc:creator>
      <dc:date>2019-05-17T15:08:09Z</dc:date>
    </item>
    <item>
      <title>Re: using hash to select observations</title>
      <link>https://communities.sas.com/t5/New-SAS-User/using-hash-to-select-observations/m-p/559859#M10396</link>
      <description>&lt;P&gt;Sorry, the key variable should be in quotation marks like this. How about that?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want;
   if 0 then set A;
   if _N_ = 1 then do;
      declare hash h(dataset:'A');
      h.defineKey('acct_key');
      h.defineDone();
   end;

   set B;

   if h.check() ne 0;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Sat, 18 May 2019 08:58:05 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/using-hash-to-select-observations/m-p/559859#M10396</guid>
      <dc:creator>PeterClemmensen</dc:creator>
      <dc:date>2019-05-18T08:58:05Z</dc:date>
    </item>
    <item>
      <title>Re: using hash to select observations</title>
      <link>https://communities.sas.com/t5/New-SAS-User/using-hash-to-select-observations/m-p/560208#M10437</link>
      <description>thank you so much for your followup. It worked greatly &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;</description>
      <pubDate>Mon, 20 May 2019 16:05:56 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/using-hash-to-select-observations/m-p/560208#M10437</guid>
      <dc:creator>changxuosu</dc:creator>
      <dc:date>2019-05-20T16:05:56Z</dc:date>
    </item>
    <item>
      <title>Re: using hash to select observations</title>
      <link>https://communities.sas.com/t5/New-SAS-User/using-hash-to-select-observations/m-p/560261#M10442</link>
      <description>&lt;P&gt;No problem. I'm glad you found your answer. Do you have any questions about the code?&lt;/P&gt;</description>
      <pubDate>Mon, 20 May 2019 19:42:21 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/using-hash-to-select-observations/m-p/560261#M10442</guid>
      <dc:creator>PeterClemmensen</dc:creator>
      <dc:date>2019-05-20T19:42:21Z</dc:date>
    </item>
    <item>
      <title>Re: using hash to select observations</title>
      <link>https://communities.sas.com/t5/New-SAS-User/using-hash-to-select-observations/m-p/561398#M10584</link>
      <description>I don't have any questions for now. Have a nice day, warm hearted SAS expert &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;</description>
      <pubDate>Fri, 24 May 2019 14:01:16 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/using-hash-to-select-observations/m-p/561398#M10584</guid>
      <dc:creator>changxuosu</dc:creator>
      <dc:date>2019-05-24T14:01:16Z</dc:date>
    </item>
  </channel>
</rss>

