<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Why my Temp Array unduplication utterly fails vs HASH in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Why-my-Temp-Array-unduplication-utterly-fails-vs-HASH/m-p/601314#M173910</link>
    <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/138205"&gt;@novinosrin&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;That's very disappointing to know it's sequential.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Does that mean WHICHC and WHICHN will also be &lt;STRIKE&gt;useless&lt;/STRIKE&gt; too? Do these also follow suit (i.e., sequential search)?&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;It can't be anything but sequential. While the hash builds a search tree for the key variable(s), there is no such thing for an array.&lt;/P&gt;
&lt;P&gt;And the array has no way of "knowing" how many elements it holds, so the "in" in the first data step iteration already goes through all 10 million elements. Reducing the array definition to the needed minimum will speed things up.&lt;/P&gt;
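&lt;P&gt;A minimal sketch of that point, not the original poster's program: an explicit loop can stop at the counter n, so only the slots filled so far are compared instead of the whole declared array. The data set names demo_have and demo_want and the variable found are made up for illustration. This is still a sequential search, just a shorter one, so the hash should stay well ahead on large inputs.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Hypothetical demo data, small enough to check by eye */
data demo_have;
 input acctno;
 cards;
8
3
8
3
10
3
6
5
3
;

data demo_want;
 array t(9) _temporary_;
 do until(z);
  set demo_have end=z;
  found=0;
  /* compare only against the n slots filled so far, not all of t */
  do i=1 to n while(not found);
   if t(i)=acctno then found=1;
  end;
  if found then continue;   /* duplicate: skip to the next record */
  n+1;                      /* new key: store it in the next free slot */
  t(n)=acctno;
  output;
 end;
 drop n i found;
run;

proc print data=demo_want noobs;run;&lt;/CODE&gt;&lt;/PRE&gt;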
&lt;P&gt;But nothing will beat the hash here.&lt;/P&gt;</description>
    <pubDate>Mon, 04 Nov 2019 08:19:53 GMT</pubDate>
    <dc:creator>Kurt_Bremser</dc:creator>
    <dc:date>2019-11-04T08:19:53Z</dc:date>
    <item>
      <title>Why my Temp Array unduplication utterly fails vs HASH</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Why-my-Temp-Array-unduplication-utterly-fails-vs-HASH/m-p/601264#M173878</link>
      <description>&lt;P&gt;Why my Temp Array unduplication utterly fails vs HASH?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Objective : To get unique account_numbers&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Source data: 33 Million records&lt;/STRONG&gt;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;

/*Runs forever, not completing at all :( */
data w;
 do until(z);
  set aqr.rp_portfolio_&amp;amp;min-aqr.rp_portfolio_&amp;amp;max 
  (keep=account_number portfolio cra_flag period id_investor)
  end=z;
  where period&amp;gt;= 1439 and id_investor not in ('014', '015', '091', '092') or
  period&amp;lt;1439;
  array t(9999999) _temporary_;
  length port_name $30;
  if account_number in t then continue;
  select;
   when (portfolio='Mortgage') port_name='Treasury';
   when (portfolio='Resi' and cra_flag='Y') port_name='CRA';
   when (portfolio in('Resi','HELOAN Low/No') and cra_flag='N') port_name='Resi (No CRA)';
   when (portfolio='Specialty') port_name=portfolio;
   otherwise port_name=' ';
  end;
  if port_name&amp;gt;' ' then do;
   n+1;
   t(n)=account_number;
   output;
  end;
 end;
 stop;
 keep account_number port_name;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;What's wrong with me or my eyes that isn't spotting something very obvious in the previous, whereas the &lt;STRONG&gt;HASH equivalent below&lt;/STRONG&gt; &lt;STRONG&gt;runs in&amp;nbsp; real time 6:49.53&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;user cpu time 1:26.36&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;system cpu time 24.94 seconds&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;memory 33221.57k&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;OS Memory 58464.00k&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data w;
 dcl hash H () ;
 h.definekey  ("account_number") ;
 h.definedata ("account_number") ;
 h.definedone () ;
 do until(z);
  set aqr.rp_portfolio_&amp;amp;min-aqr.rp_portfolio_&amp;amp;max 
  (keep=account_number portfolio cra_flag period id_investor)
  end=z;
  where period&amp;gt;= 1439 and id_investor not in ('014', '015', '091', '092') or
  period&amp;lt;1439;
  length port_name $30;
  if h.check()=0 then continue;
  select;
   when (portfolio='Mortgage') port_name='Treasury';
   when (portfolio='Resi' and cra_flag='Y') port_name='CRA';
   when (portfolio in('Resi','HELOAN Low/No') and cra_flag='N') port_name='Resi (No CRA)';
   when (portfolio='Specialty') port_name=portfolio;
   otherwise port_name=' ';
  end;
  if port_name&amp;gt;' ' then do;
   h.add();
   output;
  end;
 end;
 stop;
 keep account_number port_name;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Any thoughts?&lt;/P&gt;</description>
      <pubDate>Sun, 03 Nov 2019 22:51:17 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Why-my-Temp-Array-unduplication-utterly-fails-vs-HASH/m-p/601264#M173878</guid>
      <dc:creator>novinosrin</dc:creator>
      <dc:date>2019-11-03T22:51:17Z</dc:date>
    </item>
    <item>
      <title>Re: Why my Temp Array unduplication utterly fails vs HASH</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Why-my-Temp-Array-unduplication-utterly-fails-vs-HASH/m-p/601266#M173880</link>
      <description>&lt;P&gt;You could have more than 10 million unique accounts.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Also, the search in the array is sequential, while the hash find uses a b-tree or something similarly performant.&lt;/P&gt;</description>
      <pubDate>Sun, 03 Nov 2019 22:55:50 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Why-my-Temp-Array-unduplication-utterly-fails-vs-HASH/m-p/601266#M173880</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2019-11-03T22:55:50Z</dc:date>
    </item>
    <item>
      <title>Re: Why my Temp Array unduplication utterly fails vs HASH</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Why-my-Temp-Array-unduplication-utterly-fails-vs-HASH/m-p/601268#M173882</link>
      <description>&lt;P&gt;Sir &lt;EM&gt;"&lt;/EM&gt;&lt;SPAN&gt;&lt;EM&gt;You could have more than 10 million unique accounts"&lt;/EM&gt;&amp;nbsp;-This is funny though, we wish we were that big to have a huge portfolio. However,&amp;nbsp; for a regional bank, our portfolio size isn't too bad either. &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt; Nonetheless, that did make me think more. I am taking 82 months of data, min-max (range), as you would have noticed in the SET statement. The successful HASH&amp;nbsp;output has given me 190,000, which is right.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Hmm, so am i to believe and&amp;nbsp; learn ARRAY IN operator search is useless?? Oh gosh!&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 03 Nov 2019 23:18:10 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Why-my-Temp-Array-unduplication-utterly-fails-vs-HASH/m-p/601268#M173882</guid>
      <dc:creator>novinosrin</dc:creator>
      <dc:date>2019-11-03T23:18:10Z</dc:date>
    </item>
    <item>
      <title>Re: Why my Temp Array unduplication utterly fails vs HASH</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Why-my-Temp-Array-unduplication-utterly-fails-vs-HASH/m-p/601271#M173884</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/138205"&gt;@novinosrin&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Are your account numbers integers in the range&amp;nbsp; {-constant('bigint'):constant('bigint')}?&amp;nbsp; Probably not, since I think you would likely use direct array lookup in such a case.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;But are the account numbers mappable in a 1 to 1 correspondence to such a range?&amp;nbsp; If so, then instead of searching an array you could do direct lookup of the transformed account number.&lt;/P&gt;
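&lt;P&gt;To illustrate what direct lookup means, here is a minimal sketch under a strong assumption: every key is an integer from 1 to 10, so the key itself can serve as the array subscript and no search is needed. The data set names idx_have and idx_want and the array name seen are hypothetical. A key outside the declared bounds would raise an array subscript error, so real data would need a range check or a mapping first.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Hypothetical demo data: integer keys known to lie in 1..10 */
data idx_have;
 input acctno;
 cards;
8
3
8
3
10
3
6
5
3
;

data idx_want;
 /* one slot per possible key; temporary array elements start out missing */
 array seen(10) _temporary_;
 set idx_have;
 if seen(acctno) then delete;   /* slot already flagged: duplicate */
 seen(acctno)=1;                /* first occurrence: flag it and keep the row */
run;

proc print data=idx_want noobs;run;&lt;/CODE&gt;&lt;/PRE&gt;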
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Could you tell us the structure of your account numbers?&lt;/P&gt;</description>
      <pubDate>Sun, 03 Nov 2019 23:27:32 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Why-my-Temp-Array-unduplication-utterly-fails-vs-HASH/m-p/601271#M173884</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2019-11-03T23:27:32Z</dc:date>
    </item>
    <item>
      <title>Re: Why my Temp Array unduplication utterly fails vs HASH</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Why-my-Temp-Array-unduplication-utterly-fails-vs-HASH/m-p/601274#M173885</link>
      <description>&lt;P&gt;Looks something like this&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;1829647&lt;BR /&gt;7329386&lt;BR /&gt;7339930&lt;BR /&gt;9515503&lt;BR /&gt;10005882&lt;BR /&gt;10039832&lt;BR /&gt;10053965&lt;BR /&gt;10065076&lt;BR /&gt;10068369&lt;BR /&gt;10073393&lt;/P&gt;
&lt;P&gt;5777&lt;/P&gt;
&lt;P&gt;70396902245&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;And all are stored as numeric.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sun, 03 Nov 2019 23:33:53 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Why-my-Temp-Array-unduplication-utterly-fails-vs-HASH/m-p/601274#M173885</guid>
      <dc:creator>novinosrin</dc:creator>
      <dc:date>2019-11-03T23:33:53Z</dc:date>
    </item>
    <item>
      <title>Re: Why my Temp Array unduplication utterly fails vs HASH</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Why-my-Temp-Array-unduplication-utterly-fails-vs-HASH/m-p/601276#M173886</link>
      <description>One item to note:&lt;BR /&gt;&lt;BR /&gt;The (keep= ) list applies to the last data set only.  For all the other source data sets, the program reads in all variables.  At least that's what it looks like.</description>
      <pubDate>Mon, 04 Nov 2019 00:45:32 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Why-my-Temp-Array-unduplication-utterly-fails-vs-HASH/m-p/601276#M173886</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2019-11-04T00:45:32Z</dc:date>
    </item>
    <item>
      <title>Re: Why my Temp Array unduplication utterly fails vs HASH</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Why-my-Temp-Array-unduplication-utterly-fails-vs-HASH/m-p/601277#M173887</link>
      <description>&lt;P&gt;Sir&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/4954"&gt;@Astounding&lt;/a&gt;&amp;nbsp; You nearly gave me a heart attack. Luckily I checked the doc again before jumping out of the building, which the ARRAY IN is making me contemplate. &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&amp;nbsp;&lt;/P&gt;
&lt;H4 class="xis-argument"&gt;(&lt;SPAN class="xis-userSuppliedValue"&gt;data-set-options&lt;/SPAN&gt;)&lt;/H4&gt;
&lt;DIV class="xis-argumentDescription"&gt;
&lt;P class="xis-paraSimpleFirst"&gt;specifies actions SAS is to take when it reads variables or observations into the program data vector for processing.&lt;/P&gt;
&lt;P class="xis-paraSimpleFirst"&gt;&lt;A href="https://documentation.sas.com/?docsetId=lestmtsref&amp;amp;docsetTarget=p00hxg3x8lwivcn1f0e9axziw57y.htm&amp;amp;docsetVersion=9.4&amp;amp;locale=en"&gt;https://documentation.sas.com/?docsetId=lestmtsref&amp;amp;docsetTarget=p00hxg3x8lwivcn1f0e9axziw57y.htm&amp;amp;docsetVersion=9.4&amp;amp;locale=en&lt;/A&gt;&lt;/P&gt;
&lt;TABLE class="xis-summary"&gt;
&lt;TBODY&gt;
&lt;TR&gt;
&lt;TD class="xis-summaryTip"&gt;Tip&lt;/TD&gt;
&lt;TD class="xis-summaryText"&gt;&lt;FONT color="#339966"&gt;&lt;STRONG&gt;Data set options that apply to a data set list apply to all of the data sets in the list.&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="xis-summarySee"&gt;&lt;FONT color="#339966"&gt;&lt;STRONG&gt;See&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/TD&gt;
&lt;TD class="xis-summaryText"&gt;&lt;SPAN class="xis-xrefSee"&gt;&lt;SPAN class="xis-xrefText"&gt;For more information, see&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;A class="ng-scope" tabindex="0" title="" href="https://documentation.sas.com/?docsetId=ledsoptsref&amp;amp;docsetTarget=n1tgswl0rz314fn1iac4vkbw86g0.htm&amp;amp;docsetVersion=9.4&amp;amp;locale=en" data-docset-id="ledsoptsref" data-docset-version="9.4" data-original-href="n1tgswl0rz314fn1iac4vkbw86g0.htm"&gt;Definition of Data Set Options&lt;/A&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;in&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;I&gt;SAS Data Set Options: Reference&lt;/I&gt;&lt;SPAN class="xis-xrefText"&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;for a list of the data set options to use with input data sets.&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/TD&gt;
&lt;/TR&gt;
&lt;/TBODY&gt;
&lt;/TABLE&gt;
&lt;/DIV&gt;</description>
      <pubDate>Mon, 04 Nov 2019 00:56:34 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Why-my-Temp-Array-unduplication-utterly-fails-vs-HASH/m-p/601277#M173887</guid>
      <dc:creator>novinosrin</dc:creator>
      <dc:date>2019-11-04T00:56:34Z</dc:date>
    </item>
    <item>
      <title>Re: Why my Temp Array unduplication utterly fails vs HASH</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Why-my-Temp-Array-unduplication-utterly-fails-vs-HASH/m-p/601278#M173888</link>
      <description>OK, sorry about that.  The last time I saw that part of the documentation, data set lists didn't exist.  I promise to take a second look and see what I can find.</description>
      <pubDate>Mon, 04 Nov 2019 01:08:33 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Why-my-Temp-Array-unduplication-utterly-fails-vs-HASH/m-p/601278#M173888</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2019-11-04T01:08:33Z</dc:date>
    </item>
    <item>
      <title>Re: Why my Temp Array unduplication utterly fails vs HASH</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Why-my-Temp-Array-unduplication-utterly-fails-vs-HASH/m-p/601280#M173889</link>
      <description>&lt;P&gt;I guess your tables are not sorted or indexed by account number?&lt;/P&gt;</description>
      <pubDate>Mon, 04 Nov 2019 01:44:54 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Why-my-Temp-Array-unduplication-utterly-fails-vs-HASH/m-p/601280#M173889</guid>
      <dc:creator>PGStats</dc:creator>
      <dc:date>2019-11-04T01:44:54Z</dc:date>
    </item>
    <item>
      <title>Re: Why my Temp Array unduplication utterly fails vs HASH</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Why-my-Temp-Array-unduplication-utterly-fails-vs-HASH/m-p/601282#M173890</link>
      <description>&lt;P&gt;Sir&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/462"&gt;@PGStats&lt;/a&gt;&amp;nbsp; &amp;nbsp;True. This is for that very purpose indeed. The simple HAVE and WANT demo below works exactly as intended, but when reading company data, it doesn't. &lt;span class="lia-unicode-emoji" title=":disappointed_face:"&gt;😞&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;For example:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;
data have;
 input acctno;
 cards;
 8
 3
 8
 3
 10
 3
 6
 5
 3
 ;

data want;
 do until(z);
  set have end=z;
  array t(9) _temporary_;
  if acctno in t then continue;
  n+1;
  t(n)=acctno;
  output;
 end;
 drop n;
run;

proc print noobs;run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;DIV class="branch"&gt;
&lt;DIV&gt;
&lt;DIV align="center"&gt;
&lt;TABLE class="table" summary="Procedure Print: Data Set WORK.WANT" frame="box" rules="all" cellspacing="0" cellpadding="5"&gt;&lt;COLGROUP&gt; &lt;COL /&gt;&lt;/COLGROUP&gt;
&lt;THEAD&gt;
&lt;TR&gt;
&lt;TH class="r header" scope="col"&gt;acctno&lt;/TH&gt;
&lt;/TR&gt;
&lt;/THEAD&gt;
&lt;TBODY&gt;
&lt;TR&gt;
&lt;TD class="r data"&gt;8&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="r data"&gt;3&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="r data"&gt;10&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="r data"&gt;6&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="r data"&gt;5&lt;/TD&gt;
&lt;/TR&gt;
&lt;/TBODY&gt;
&lt;/TABLE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;</description>
      <pubDate>Mon, 04 Nov 2019 01:58:34 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Why-my-Temp-Array-unduplication-utterly-fails-vs-HASH/m-p/601282#M173890</guid>
      <dc:creator>novinosrin</dc:creator>
      <dc:date>2019-11-04T01:58:34Z</dc:date>
    </item>
    <item>
      <title>Re: Why my Temp Array unduplication utterly fails vs HASH</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Why-my-Temp-Array-unduplication-utterly-fails-vs-HASH/m-p/601283#M173891</link>
      <description>&lt;P&gt;&lt;EM&gt;&amp;gt; am i to believe and learn ARRAY IN operator search is useless?&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;As &lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/11562"&gt;@Kurt_Bremser&lt;/a&gt; said, "the search in the array is sequential, while the hash find uses a b-tree".&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;For each new account (there are 200,000), you are sequentially searching through 10 million array values.&lt;/P&gt;
&lt;P&gt;That's 2e5 * 1e7 = 2e12 comparisons. Two thousand billion comparisons. 2,000,000,000,000 sequential comparisons.&lt;/P&gt;
&lt;P&gt;Plus other searches for the other 33,000,000 records where a key is found earlier without scanning the whole array.&lt;/P&gt;
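&lt;P&gt;The same arithmetic as a quick data step, using only the round figures quoted above (about 200,000 new keys and an array declared with about 10 million slots). The data set and variable names are hypothetical, and the count ignores the extra comparisons spent on records whose key is found partway through the array.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data search_cost;
 new_keys    = 2e5;  /* roughly 200,000 unique accounts, per the thread */
 array_size  = 1e7;  /* declared size of the temporary array, rounded */
 /* each new key is compared against every slot before IN gives up */
 comparisons = new_keys * array_size;
 put comparisons= comma20.;
run;&lt;/CODE&gt;&lt;/PRE&gt;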
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;How do you expect this to be fast?&amp;nbsp;&lt;/P&gt;
&lt;P&gt;And even if magnitude wasn't an issue (it is), how could a sequential search be compared to an indexed hash search?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;So useless? Absolutely not. The in operator for an array has its place. But tools are only good when used within their specifications. An F1 car is useless off-road and a buggy is useless on the tarmac. Likewise, expecting to search an array the way you are trying can only fail.&lt;/P&gt;</description>
      <pubDate>Mon, 04 Nov 2019 02:10:09 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Why-my-Temp-Array-unduplication-utterly-fails-vs-HASH/m-p/601283#M173891</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2019-11-04T02:10:09Z</dc:date>
    </item>
    <item>
      <title>Re: Why my Temp Array unduplication utterly fails vs HASH</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Why-my-Temp-Array-unduplication-utterly-fails-vs-HASH/m-p/601284#M173892</link>
      <description>&lt;P&gt;That's very disappointing to know it's sequential.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Does that mean WHICHC and WHICHN will also be &lt;STRIKE&gt;useless&lt;/STRIKE&gt; too? Do these also follow suit (i.e., sequential search)?&lt;/P&gt;</description>
      <pubDate>Mon, 04 Nov 2019 02:18:45 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Why-my-Temp-Array-unduplication-utterly-fails-vs-HASH/m-p/601284#M173892</guid>
      <dc:creator>novinosrin</dc:creator>
      <dc:date>2019-11-04T02:18:45Z</dc:date>
    </item>
    <item>
      <title>Re: Why my Temp Array unduplication utterly fails vs HASH</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Why-my-Temp-Array-unduplication-utterly-fails-vs-HASH/m-p/601287#M173893</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/138205"&gt;@novinosrin&lt;/a&gt;&amp;nbsp;&amp;nbsp;Don't blame the tool. All tools have specifications and limitations. The key is to know these and use the tools in an optimal manner.&lt;/P&gt;
&lt;P&gt;You can also &lt;A href="https://communities.sas.com/t5/SASware-Ballot-Ideas/idb-p/sas_ideas" target="_self"&gt;suggest improvements&lt;/A&gt;, but that's another matter (and doesn't preclude knowing the specifications and limitations).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 04 Nov 2019 02:41:12 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Why-my-Temp-Array-unduplication-utterly-fails-vs-HASH/m-p/601287#M173893</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2019-11-04T02:41:12Z</dc:date>
    </item>
    <item>
      <title>Re: Why my Temp Array unduplication utterly fails vs HASH</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Why-my-Temp-Array-unduplication-utterly-fails-vs-HASH/m-p/601288#M173894</link>
      <description>&lt;P&gt;Which begs the question: Why create a 10-million-values array to store 200,000 values?&lt;/P&gt;</description>
      <pubDate>Mon, 04 Nov 2019 22:51:48 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Why-my-Temp-Array-unduplication-utterly-fails-vs-HASH/m-p/601288#M173894</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2019-11-04T22:51:48Z</dc:date>
    </item>
    <item>
      <title>Re: Why my Temp Array unduplication utterly fails vs HASH</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Why-my-Temp-Array-unduplication-utterly-fails-vs-HASH/m-p/601290#M173895</link>
      <description>&lt;P&gt;Agree.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 04 Nov 2019 02:49:01 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Why-my-Temp-Array-unduplication-utterly-fails-vs-HASH/m-p/601290#M173895</guid>
      <dc:creator>novinosrin</dc:creator>
      <dc:date>2019-11-04T02:49:01Z</dc:date>
    </item>
    <item>
      <title>Re: Why my Temp Array unduplication utterly fails vs HASH</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Why-my-Temp-Array-unduplication-utterly-fails-vs-HASH/m-p/601295#M173900</link>
      <description>&lt;P&gt;OK, here's an approach that might be feasible, might not.&amp;nbsp; The issues .....&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;It treats account number as numeric.&amp;nbsp; While you can easily work around that, the cost for that is not clear.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;It utilizes the direct array lookup that&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/31461"&gt;@mkeintz&lt;/a&gt;&amp;nbsp;mentioned.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;It needs a lot more memory.&amp;nbsp; Whether you can get that much is not clear ... on the order of 37,500 times what you are using now for the temporary array (roughly 3 TB for 99,999,999,999 elements of 30 bytes each, versus about 80 MB for 9,999,999 numeric elements of 8 bytes each).&amp;nbsp; The array expands to 11 digits, since the largest account number you illustrated is 11 digits long.&amp;nbsp; The memory requirements might be shrinkable, since it appears that you don't really need a length of $30 for PORT_NAME.&lt;/P&gt;
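&lt;P&gt;A rough check of the array sizes involved, assuming 8 bytes per numeric element and 30 bytes per $30 character element and ignoring any overhead SAS adds. The data set and variable names are hypothetical; the element counts are the ones declared in the two programs in this thread.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data array_memory;
 old_bytes = 9999999 * 8;        /* numeric temporary array in the original program */
 new_bytes = 99999999999 * 30;   /* $30 temporary array indexed by an 11-digit key */
 ratio     = new_bytes / old_bytes;
 put old_bytes= comma20. new_bytes= comma20. ratio= comma12.;
run;&lt;/CODE&gt;&lt;/PRE&gt;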
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I took out the DO loop, since I didn't see a good purpose for keeping it.&amp;nbsp; Perhaps its purpose is to maintain a top-to-bottom structure for the DATA step logic.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If all this is acceptable, it should be lightning fast at loading the values, and should take a bit more time to unload them.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;
data w;
  array t {99999999999} $ 30 _temporary_;
  if z then do account_number = 1 to 99999999999;
     if t{account_number} &amp;gt; ' ' then do;
        port_name = t{account_number};
        output;
     end;
  end;   
  set aqr.rp_portfolio_&amp;amp;min-aqr.rp_portfolio_&amp;amp;max 
  (keep=account_number portfolio cra_flag period id_investor)
  end=z;
  where period&amp;gt;= 1439 and id_investor not in ('014', '015', '091', '092') or
  period&amp;lt;1439;
  if t{account_number} &amp;gt; ' ' then return;
  select;
   when (portfolio='Mortgage') t{account_number}='Treasury';
   when (portfolio='Resi' and cra_flag='Y') t{account_number}='CRA';
   when (portfolio in('Resi','HELOAN Low/No') and cra_flag='N') 
         t{account_number}='Resi (No CRA)';
   when (portfolio='Specialty') t{account_number}='Specialty';
   otherwise;
  end;
 keep account_number port_name;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Mon, 04 Nov 2019 03:33:22 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Why-my-Temp-Array-unduplication-utterly-fails-vs-HASH/m-p/601295#M173900</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2019-11-04T03:33:22Z</dc:date>
    </item>
    <item>
      <title>Re: Why my Temp Array unduplication utterly fails vs HASH</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Why-my-Temp-Array-unduplication-utterly-fails-vs-HASH/m-p/601296#M173901</link>
      <description>&lt;P&gt;Is there some reason for using a cryptic test for missing character values?&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;charvar &amp;gt; ' '&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Why not use something that is clearer?&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;not missing(charvar)&lt;/CODE&gt;&lt;/PRE&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;charvar ne ' '&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Note that the cryptic test will fail if the string is not empty but starts with a character that is before space in the ASCII sequence, like a TAB.&lt;/P&gt;</description>
      <pubDate>Mon, 04 Nov 2019 03:36:40 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Why-my-Temp-Array-unduplication-utterly-fails-vs-HASH/m-p/601296#M173901</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2019-11-04T03:36:40Z</dc:date>
    </item>
    <item>
      <title>Re: Why my Temp Array unduplication utterly fails vs HASH</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Why-my-Temp-Array-unduplication-utterly-fails-vs-HASH/m-p/601298#M173903</link>
      <description>&lt;P&gt;No reason, that's just how I've always done it.&amp;nbsp; As a general rule, calling a function takes longer, but I haven't tested for this case.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;However, this logic looks a little too complex:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;if t{account_number} &amp;gt; ' ' then return;
  select;
   when (portfolio='Mortgage') t{account_number}='Treasury';
   when (portfolio='Resi' and cra_flag='Y') t{account_number}='CRA';
   when (portfolio in('Resi','HELOAN Low/No') and cra_flag='N') 
         t{account_number}='Resi (No CRA)';
   when (portfolio='Specialty') t{account_number}='Specialty';
   otherwise;
   end;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;It should probably be simplified to remove the RETURN statement:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;if t{account_number} = ' ' then select;
   when (portfolio='Mortgage') t{account_number}='Treasury';
   when (portfolio='Resi' and cra_flag='Y') t{account_number}='CRA';
   when (portfolio in('Resi','HELOAN Low/No') and cra_flag='N') 
         t{account_number}='Resi (No CRA)';
   when (portfolio='Specialty') t{account_number}='Specialty';
   otherwise;  
end;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Is that the spot in the program you are referring to?&lt;/P&gt;</description>
      <pubDate>Mon, 04 Nov 2019 15:57:41 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Why-my-Temp-Array-unduplication-utterly-fails-vs-HASH/m-p/601298#M173903</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2019-11-04T15:57:41Z</dc:date>
    </item>
    <item>
      <title>Re: Why my Temp Array unduplication utterly fails vs HASH</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Why-my-Temp-Array-unduplication-utterly-fails-vs-HASH/m-p/601299#M173904</link>
      <description>&lt;P&gt;I was just asking if there was some special reason why that test would be better than the normal check for non blank value. Note using &amp;gt; will fail if the string starts with a character that is before space in the ASCII coding sequence.&lt;/P&gt;
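&lt;P&gt;A small sketch of that failure mode, assuming an ASCII session where TAB is hex 09 and blank is hex 20. The data set and variable names are hypothetical. The three flags record how each test classifies a value that begins with a TAB.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data blank_test;
 length charvar $8;
 charvar = '09'x || 'abc';            /* not empty, but starts with a TAB */
 gt_blank    = (charvar &amp;gt; ' ');      /* compares collating order: TAB sorts before blank */
 ne_blank    = (charvar ne ' ');      /* true for any value that is not all blanks */
 not_missing = not missing(charvar);  /* same intent, stated explicitly */
 put gt_blank= ne_blank= not_missing=;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Under the ASCII assumption, gt_blank comes out 0 while the other two flags come out 1, so only the first test treats the value as blank.&lt;/P&gt;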
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Note I have also seen other posts where someone used&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;numvar &amp;gt; .&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;As if it meant not missing.&amp;nbsp; Again that test will fail for special missing values, since .A to .Z are all larger than the normal missing value.&amp;nbsp; .Z is the largest missing value.&lt;/P&gt;</description>
      <pubDate>Mon, 04 Nov 2019 03:45:21 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Why-my-Temp-Array-unduplication-utterly-fails-vs-HASH/m-p/601299#M173904</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2019-11-04T03:45:21Z</dc:date>
    </item>
    <item>
      <title>Re: Why my Temp Array unduplication utterly fails vs HASH</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Why-my-Temp-Array-unduplication-utterly-fails-vs-HASH/m-p/601300#M173905</link>
      <description>&lt;P&gt;I guess whatever is fastest is best ... in this particular case.&amp;nbsp; The program itself assigns all those character values, so I'm not worried about strange values that might appear in the data.&amp;nbsp; But I certainly have been guilty of that in some programs that I have written.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;It reminds me of discussions about the range low - high for user-defined character formats.&amp;nbsp; There are definitely cases where you need to be aware of values lower than a blank.&lt;/P&gt;</description>
      <pubDate>Mon, 04 Nov 2019 03:46:37 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Why-my-Temp-Array-unduplication-utterly-fails-vs-HASH/m-p/601300#M173905</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2019-11-04T03:46:37Z</dc:date>
    </item>
  </channel>
</rss>

