<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Proc Sql optimisation - Hash? in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Proc-Sql-optimisation-Hash/m-p/117329#M259437</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi all,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I have 2 datasets, a master and a lookup.&amp;nbsp; The master has id, log_date and role.&amp;nbsp; I want to update the role column in master from the lookup dataset which has id, date_from, date_to, group and role.&amp;nbsp; One id can belong to more than one group at the same time, however I am only interested in group 1, so the data is unique when dates are taken into account.&amp;nbsp; I've written a Proc Sql statement to update master, but this is taking hours to run (the master dataset has around 15 million rows, the lookup dataset around 17 thousand).&amp;nbsp; I assume this is due to the where clause I've had to include.&amp;nbsp; Does anyone know of a more efficient way to update for this type of query?&amp;nbsp; Possibly using Hash?&amp;nbsp; Sadly I'm still in the dark ages of 9.1.3, so I'm not able to use the extra Hash functionality that came in 9.2 (fuzzy matching).&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Below is an example of the 2 datasets I have, along with the Proc Sql code I'm currently running.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;data master;&lt;/P&gt;&lt;P&gt;input id log_date :date9. role;&lt;/P&gt;&lt;P&gt;format log_date date9.;&lt;/P&gt;&lt;P&gt;cards;&lt;/P&gt;&lt;P&gt;1 25jul2012 2&lt;/P&gt;&lt;P&gt;2 25jul2012 2&lt;/P&gt;&lt;P&gt;3 25jul2012 4&lt;/P&gt;&lt;P&gt;4 25jul2012 1&lt;/P&gt;&lt;P&gt;5 25jul2012 1&lt;/P&gt;&lt;P&gt;6 25jul2012 1&lt;/P&gt;&lt;P&gt;;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;data lookup;&lt;/P&gt;&lt;P&gt;input id (date_from date_to) (:date9.) group role;&lt;/P&gt;&lt;P&gt;format date: date9.;&lt;/P&gt;&lt;P&gt;cards;&lt;/P&gt;&lt;P&gt;1 01jul2012 01aug2012 1 10&lt;/P&gt;&lt;P&gt;1 01jul2012 01aug2012 2 20&lt;/P&gt;&lt;P&gt;2 01jul2012 01aug2012 1 10&lt;/P&gt;&lt;P&gt;2 01jul2012 01aug2012 2 30&lt;/P&gt;&lt;P&gt;3 01jun2012 30jun2012 1 10&lt;/P&gt;&lt;P&gt;3 01jul2012 01aug2012 1 20&lt;/P&gt;&lt;P&gt;3 01jul2012 01aug2012 2 40&lt;/P&gt;&lt;P&gt;4 01jul2012 01aug2012 1 20&lt;/P&gt;&lt;P&gt;5 01jul2012 01aug2012 2 20&lt;/P&gt;&lt;P&gt;;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;proc sql;&lt;/P&gt;&lt;P&gt;update master as a &lt;/P&gt;&lt;P&gt;&amp;nbsp; set role=(select role from lookup as b&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; where a.id=b.id and a.log_date between b.date_from and b.date_to and b.group=1)&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/P&gt;&lt;P&gt;&amp;nbsp; where exists (select id from lookup as c &lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; where a.id=c.id and a.log_date between c.date_from and c.date_to and c.group=1);&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/P&gt;&lt;P&gt;quit;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Wed, 25 Jul 2012 13:46:43 GMT</pubDate>
    <dc:creator>Keith</dc:creator>
    <dc:date>2012-07-25T13:46:43Z</dc:date>
    <item>
      <title>Proc Sql optimisation - Hash?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Proc-Sql-optimisation-Hash/m-p/117329#M259437</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi all,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I have 2 datasets, a master and a lookup.&amp;nbsp; The master has id, log_date and role.&amp;nbsp; I want to update the role column in master from the lookup dataset which has id, date_from, date_to, group and role.&amp;nbsp; One id can belong to more than one group at the same time, however I am only interested in group 1, so the data is unique when dates are taken into account.&amp;nbsp; I've written a Proc Sql statement to update master, but this is taking hours to run (the master dataset has around 15 million rows, the lookup dataset around 17 thousand).&amp;nbsp; I assume this is due to the where clause I've had to include.&amp;nbsp; Does anyone know of a more efficient way to update for this type of query?&amp;nbsp; Possibly using Hash?&amp;nbsp; Sadly I'm still in the dark ages of 9.1.3, so I'm not able to use the extra Hash functionality that came in 9.2 (fuzzy matching).&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Below is an example of the 2 datasets I have, along with the Proc Sql code I'm currently running.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;data master;&lt;/P&gt;&lt;P&gt;input id log_date :date9. role;&lt;/P&gt;&lt;P&gt;format log_date date9.;&lt;/P&gt;&lt;P&gt;cards;&lt;/P&gt;&lt;P&gt;1 25jul2012 2&lt;/P&gt;&lt;P&gt;2 25jul2012 2&lt;/P&gt;&lt;P&gt;3 25jul2012 4&lt;/P&gt;&lt;P&gt;4 25jul2012 1&lt;/P&gt;&lt;P&gt;5 25jul2012 1&lt;/P&gt;&lt;P&gt;6 25jul2012 1&lt;/P&gt;&lt;P&gt;;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;data lookup;&lt;/P&gt;&lt;P&gt;input id (date_from date_to) (:date9.) group role;&lt;/P&gt;&lt;P&gt;format date: date9.;&lt;/P&gt;&lt;P&gt;cards;&lt;/P&gt;&lt;P&gt;1 01jul2012 01aug2012 1 10&lt;/P&gt;&lt;P&gt;1 01jul2012 01aug2012 2 20&lt;/P&gt;&lt;P&gt;2 01jul2012 01aug2012 1 10&lt;/P&gt;&lt;P&gt;2 01jul2012 01aug2012 2 30&lt;/P&gt;&lt;P&gt;3 01jun2012 30jun2012 1 10&lt;/P&gt;&lt;P&gt;3 01jul2012 01aug2012 1 20&lt;/P&gt;&lt;P&gt;3 01jul2012 01aug2012 2 40&lt;/P&gt;&lt;P&gt;4 01jul2012 01aug2012 1 20&lt;/P&gt;&lt;P&gt;5 01jul2012 01aug2012 2 20&lt;/P&gt;&lt;P&gt;;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;proc sql;&lt;/P&gt;&lt;P&gt;update master as a &lt;/P&gt;&lt;P&gt;&amp;nbsp; set role=(select role from lookup as b&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; where a.id=b.id and a.log_date between b.date_from and b.date_to and b.group=1)&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/P&gt;&lt;P&gt;&amp;nbsp; where exists (select id from lookup as c &lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; where a.id=c.id and a.log_date between c.date_from and c.date_to and c.group=1);&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/P&gt;&lt;P&gt;quit;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 25 Jul 2012 13:46:43 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Proc-Sql-optimisation-Hash/m-p/117329#M259437</guid>
      <dc:creator>Keith</dc:creator>
      <dc:date>2012-07-25T13:46:43Z</dc:date>
    </item>
    <item>
      <title>Re: Proc Sql optimisation - Hash?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Proc-Sql-optimisation-Hash/m-p/117330#M259438</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Keith,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Here's the old-fashioned way, before hash tables existed.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Use the smaller file to create two formats (one for each GROUP) that translate from ID + date into ROLE.&amp;nbsp; For the first line of your data, it would be equivalent to:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;'0120120701' - '0120120801' = '10'&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Note how putting dates in a YYMMDD format creates a valid range.&lt;/P&gt;&lt;P&gt;Here is one way to approach that step:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;data lookup2;&lt;/P&gt;&lt;P&gt;set lookup end=done;&lt;/P&gt;&lt;P&gt;start = put(id,z2.) || put(date_from,yymmddn8.);&lt;/P&gt;&lt;P&gt;end = put(id,z2.) || put(date_to, yymmddn8.);&lt;/P&gt;&lt;P&gt;fmtname = '$findgroup' || put(group,1.) || '_';&lt;/P&gt;&lt;P&gt;label=role;&lt;/P&gt;&lt;P&gt;output;&lt;/P&gt;&lt;P&gt;if done then do;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; hlo='O';&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; label='Not found';&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; fmtname='$findgroup1_';&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; output;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; fmtname='$findgroup2_';&lt;/P&gt;&lt;P&gt;end;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;proc sort data=lookup2;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; by fmtname;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;You might tweak that code in a few ways, adjusting the formats used or switching to CAT functions.&amp;nbsp; But once you get that output, create the formats using:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;proc format cntlin=lookup2;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Then a DATA step can easily retrieve the pieces and you can apply whatever logic you would like.&amp;nbsp; I leave the logic to you but here is the lookup portion:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;data master;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; set master;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; lookup_string = put(id,z2.) || put(log_date,yymmddn8.);&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; match_on_group1 = put(lookup_string, $findgroup1_.);&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; match_on_group2 = put(lookup_string, $findgroup2_.);&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; * then add your logic to utilize the matching strings;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Depending on the logic you want to apply, you may or may not need the second format.&amp;nbsp; As an easy final step, remember to drop any extra variables.&amp;nbsp; I would be interested to know how much improvement you see in the speed.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Good luck.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 25 Jul 2012 15:18:32 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Proc-Sql-optimisation-Hash/m-p/117330#M259438</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2012-07-25T15:18:32Z</dc:date>
    </item>
    <item>
      <title>Re: Proc Sql optimisation - Hash?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Proc-Sql-optimisation-Hash/m-p/117331#M259439</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Ah yes, I'm familiar with merging using formats but I hadn't thought of using it here.&amp;nbsp; I'll see if improves efficiency significantly.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 25 Jul 2012 15:30:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Proc-Sql-optimisation-Hash/m-p/117331#M259439</guid>
      <dc:creator>Keith</dc:creator>
      <dc:date>2012-07-25T15:30:49Z</dc:date>
    </item>
    <item>
      <title>Re: Proc Sql optimisation - Hash?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Proc-Sql-optimisation-Hash/m-p/117332#M259440</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi Keith,&lt;/P&gt;&lt;P&gt;Does Tom's method work for you?&lt;/P&gt;&lt;P&gt;&lt;A _jive_internal="true" href="https://communities.sas.com/thread/34025"&gt;https://communities.sas.com/thread/34025&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks - Linlin&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 25 Jul 2012 19:51:01 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Proc-Sql-optimisation-Hash/m-p/117332#M259440</guid>
      <dc:creator>Linlin</dc:creator>
      <dc:date>2012-07-25T19:51:01Z</dc:date>
    </item>
  </channel>
</rss>

