<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Performance of update / usage of hash tables in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Performance-of-update-usage-of-hash-tables/m-p/473277#M285707</link>
    <description>&lt;P&gt;&amp;nbsp;Dear all,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Many thanks for all of your replies and hints. After all, the best solution&amp;nbsp;for me was to rewrite dataset and apply update rules, like&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/4954"&gt;@Astounding&lt;/a&gt;&amp;nbsp;suggested. In my case it was incomparably faster then update in proc sql.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Once again, thanks for your help!&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;Cheers!&lt;/P&gt;</description>
    <pubDate>Tue, 26 Jun 2018 08:53:42 GMT</pubDate>
    <dc:creator>ssomak</dc:creator>
    <dc:date>2018-06-26T08:53:42Z</dc:date>
    <item>
      <title>Performance of update / usage of hash tables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Performance-of-update-usage-of-hash-tables/m-p/471098#M285701</link>
      <description>&lt;P&gt;&amp;nbsp;Hi all,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Let me first describe my problem.&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;I have a set of tables (let's say there are 40 tables), each table has&amp;nbsp;more than 2&amp;nbsp;millions&amp;nbsp;different customers.&lt;/LI&gt;
&lt;LI&gt;I have set of columns that have to be updated in every table. For example, in every table I have to update PESEL column in the following way: '111_customer_id'&lt;/LI&gt;
&lt;LI&gt;I have&amp;nbsp;a list of customers to be updated (a lot of customers)&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;I created my program in that way, that for every&amp;nbsp;customer I am creating update rule and I run this rule. So for every table I run something like this:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;update customer set pesel='111_&amp;amp;ID.' where ID=101;
update customer set pesel='111_&amp;amp;ID.' where ID=102;
/* and so on */&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;But&amp;nbsp;this solution takes a lot of time - and it is simply unacceptable.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Then I tried to change my program to use hash tables instead of simple update. But I also have problems: because I have to make separate rule for every client (there is client's ID in PESEL rule), I have to make temporary table and transpose it for every customer. Also, I cannot use hash tables for numeric columns (like birth_date), because after transposing every colum is character (so I have an error that types does not match).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;So my question is: is there any 'tricky' solution to figure this problem out? Maybe I should use hash tables, but in different way? Or make some parallelism? Do you have any ideas?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I will be grateful for any help!&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Cheers!&lt;/P&gt;</description>
      <pubDate>Mon, 18 Jun 2018 14:21:02 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Performance-of-update-usage-of-hash-tables/m-p/471098#M285701</guid>
      <dc:creator>ssomak</dc:creator>
      <dc:date>2018-06-18T14:21:02Z</dc:date>
    </item>
    <item>
      <title>Re: Performance of update / usage of hash tables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Performance-of-update-usage-of-hash-tables/m-p/471104#M285702</link>
      <description>&lt;P&gt;The code you provided is invalid:&lt;/P&gt;
&lt;PRE class=" language-sas"&gt;&lt;CODE class="  language-sas"&gt;&lt;SPAN class="token keyword"&gt;update&lt;/SPAN&gt; customer &lt;SPAN class="token keyword"&gt;set&lt;/SPAN&gt; pesel&lt;SPAN class="token operator"&gt;=&lt;/SPAN&gt;&lt;SPAN class="token string"&gt;'111_&amp;amp;ID.'&lt;/SPAN&gt; &lt;SPAN class="token statement"&gt;where&lt;/SPAN&gt; &lt;SPAN class="token keyword"&gt;ID&lt;/SPAN&gt;&lt;SPAN class="token operator"&gt;=&lt;/SPAN&gt;&lt;SPAN class="token number"&gt;101&lt;/SPAN&gt;&lt;SPAN class="token punctuation"&gt;;&lt;/SPAN&gt;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;This will set pesel to the text 111_&amp;amp;id. as you use single quotes.&amp;nbsp; Macro variables only get resolved in double quotes.&amp;nbsp; It also sets that string for every ID regardless of the where, as it is exactly the same string in the second given line, so effectively the code you present =&lt;/P&gt;
&lt;PRE class=" language-sas"&gt;&lt;CODE class="  language-sas"&gt;&lt;SPAN class="token keyword"&gt;update&lt;/SPAN&gt; customer &lt;SPAN class="token keyword"&gt;set&lt;/SPAN&gt; pesel&lt;SPAN class="token operator"&gt;=&lt;/SPAN&gt;&lt;SPAN class="token string"&gt;'111_&amp;amp;ID.'&lt;/SPAN&gt; &lt;SPAN class="token statement"&gt;where&lt;/SPAN&gt; &lt;SPAN class="token keyword"&gt;ID in (&lt;/SPAN&gt;&lt;SPAN class="token number"&gt;101,102)&lt;/SPAN&gt;&lt;SPAN class="token punctuation"&gt;;&lt;/SPAN&gt;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Are you doing this once or repeatedly?&amp;nbsp; &amp;nbsp;I would imagine that datastep may be more efficient to do this than a series of update calls - unless this is running on a database.&amp;nbsp;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;A good idea is to post some simple test data in the form of a datastep and what you want to see out at the end.&amp;nbsp; 2mil records is a lot, so it will take a bit of time, but its not that vast.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 18 Jun 2018 14:36:59 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Performance-of-update-usage-of-hash-tables/m-p/471104#M285702</guid>
      <dc:creator>RW9</dc:creator>
      <dc:date>2018-06-18T14:36:59Z</dc:date>
    </item>
    <item>
      <title>Re: Performance of update / usage of hash tables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Performance-of-update-usage-of-hash-tables/m-p/471113#M285703</link>
      <description>&lt;P&gt;I think &lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/45151"&gt;@RW9&lt;/a&gt;&amp;nbsp;is moving in the right direction on this, suggesting a DATA step.&amp;nbsp; Look at your current process:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Locate data for ID 101.&amp;nbsp; Update it.&lt;/P&gt;
&lt;P&gt;Locate data for ID 102.&amp;nbsp; Update it.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The "locate" process is consuming resources.&amp;nbsp; You would be well-advised to try a simple solution first, such as:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;data want;&lt;/P&gt;
&lt;P&gt;set have;&lt;/P&gt;
&lt;P&gt;if id in (101, 102) then PESEL = catx('_', '111', ID);&lt;/P&gt;
&lt;P&gt;run;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;That way, the program can read the data sequentially (= faster), and limit the "locate" step to finding each observation just once.&lt;/P&gt;</description>
      <pubDate>Mon, 18 Jun 2018 14:51:55 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Performance-of-update-usage-of-hash-tables/m-p/471113#M285703</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2018-06-18T14:51:55Z</dc:date>
    </item>
    <item>
      <title>Re: Performance of update / usage of hash tables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Performance-of-update-usage-of-hash-tables/m-p/471208#M285704</link>
      <description>&lt;P&gt;With a data step approach you might need to explicitly set a longer LENGTH for your customer values as you may potentially exceed the current length of the variables.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Of course it might drastically help to actually show value the macro variable ID contains, the current value of PeSEL and the actual desired resulting value.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 18 Jun 2018 19:34:20 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Performance-of-update-usage-of-hash-tables/m-p/471208#M285704</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2018-06-18T19:34:20Z</dc:date>
    </item>
    <item>
      <title>Re: Performance of update / usage of hash tables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Performance-of-update-usage-of-hash-tables/m-p/471266#M285705</link>
      <description>&lt;P&gt;Have a look at this thread.&lt;/P&gt;
&lt;P&gt;&lt;A href="https://communities.sas.com/t5/Base-SAS-Programming/Alternative-to-SAS-Merge/td-p/304046/highlight/true/page/2" target="_blank"&gt;https://communities.sas.com/t5/Base-SAS-Programming/Alternative-to-SAS-Merge/td-p/304046/highlight/true/page/2&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;MODIFY&amp;nbsp; plus KEY= may well be what you need.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Other methods are explored too. I'll be adding a chapter on table joins in the third edition of my book. &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 18 Jun 2018 23:40:25 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Performance-of-update-usage-of-hash-tables/m-p/471266#M285705</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2018-06-18T23:40:25Z</dc:date>
    </item>
    <item>
      <title>Re: Performance of update / usage of hash tables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Performance-of-update-usage-of-hash-tables/m-p/471302#M285706</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/21393"&gt;@ssomak&lt;/a&gt;&lt;/P&gt;
&lt;P&gt;I believe you need to give us a bit more information to give you a suitable answer.&lt;/P&gt;
&lt;P&gt;- Is this a one off or ongoing process?&lt;/P&gt;
&lt;P&gt;- Is this data in SAS tables or in a database?&lt;/P&gt;
&lt;P&gt;-&amp;nbsp;How do these rules look like?&lt;/P&gt;
&lt;P&gt;&amp;nbsp; - Please post some sample data in the form of a working data step. A table with your data to be changed and a table with the rules and customers to be changed.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 19 Jun 2018 04:20:17 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Performance-of-update-usage-of-hash-tables/m-p/471302#M285706</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2018-06-19T04:20:17Z</dc:date>
    </item>
    <item>
      <title>Re: Performance of update / usage of hash tables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Performance-of-update-usage-of-hash-tables/m-p/473277#M285707</link>
      <description>&lt;P&gt;&amp;nbsp;Dear all,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Many thanks for all of your replies and hints. After all, the best solution&amp;nbsp;for me was to rewrite dataset and apply update rules, like&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/4954"&gt;@Astounding&lt;/a&gt;&amp;nbsp;suggested. In my case it was incomparably faster then update in proc sql.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Once again, thanks for your help!&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;Cheers!&lt;/P&gt;</description>
      <pubDate>Tue, 26 Jun 2018 08:53:42 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Performance-of-update-usage-of-hash-tables/m-p/473277#M285707</guid>
      <dc:creator>ssomak</dc:creator>
      <dc:date>2018-06-26T08:53:42Z</dc:date>
    </item>
    <item>
      <title>Re: Performance of update / usage of hash tables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Performance-of-update-usage-of-hash-tables/m-p/473618#M285708</link>
      <description>&lt;P&gt;&amp;nbsp;Hi again,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Unfortunatelly I have another problem. As I wrote, I used solution proposed by&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/4954"&gt;@Astounding&lt;/a&gt;, so I re-write my data set in the following way:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data &amp;amp;dataset.;
	set &amp;amp;dataset.;
	if id in (&amp;amp;IDs.) then do;
		&amp;amp;update_rules.; /* here are my update rules, for example: PESEL='11111'*/
	end;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;But the problem is that I re-write 'old' datasets, and some columns have&amp;nbsp;user-defined formats. Because of this, I got the following error:&lt;/P&gt;
&lt;PRE&gt;ERROR: The format XYZ was not found or could not be loaded&lt;/PRE&gt;
&lt;P&gt;So here comes my question:&lt;/P&gt;
&lt;P&gt;Do you know if there is some SAS&amp;nbsp;option to leave column as it is when format could not be loaded? What can you suggest in such situation?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;Again, many thanks for any hints! &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 27 Jun 2018 07:29:31 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Performance-of-update-usage-of-hash-tables/m-p/473618#M285708</guid>
      <dc:creator>ssomak</dc:creator>
      <dc:date>2018-06-27T07:29:31Z</dc:date>
    </item>
    <item>
      <title>Re: Performance of update / usage of hash tables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Performance-of-update-usage-of-hash-tables/m-p/473619#M285709</link>
      <description>&lt;P&gt;Use:&lt;/P&gt;
&lt;PRE&gt;options nofmterr;&lt;/PRE&gt;
&lt;P&gt;Before the code in quesiton, and then:&lt;/P&gt;
&lt;PRE&gt;options fmterr;&lt;/PRE&gt;
&lt;P&gt;Afterwards.&lt;/P&gt;</description>
      <pubDate>Wed, 27 Jun 2018 07:40:53 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Performance-of-update-usage-of-hash-tables/m-p/473619#M285709</guid>
      <dc:creator>RW9</dc:creator>
      <dc:date>2018-06-27T07:40:53Z</dc:date>
    </item>
    <item>
      <title>Re: Performance of update / usage of hash tables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Performance-of-update-usage-of-hash-tables/m-p/473632#M285710</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/45151"&gt;@RW9&lt;/a&gt;&amp;nbsp;Cool, thank you! &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 27 Jun 2018 08:28:14 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Performance-of-update-usage-of-hash-tables/m-p/473632#M285710</guid>
      <dc:creator>ssomak</dc:creator>
      <dc:date>2018-06-27T08:28:14Z</dc:date>
    </item>
  </channel>
</rss>

