<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Hash merge question in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Hash-merge-question/m-p/164588#M31807</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Thanks Ahmed,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;There are not SUPPOSED to be duplicate values of CDCID and VaxRecordNum but apparently there are.&amp;nbsp; So that's not good news.&amp;nbsp; At least that gives me something to pursue.&amp;nbsp; Thanks.&amp;nbsp; &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Steve&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Wed, 02 Apr 2014 15:07:08 GMT</pubDate>
    <dc:creator>spjcdc</dc:creator>
    <dc:date>2014-04-02T15:07:08Z</dc:date>
    <item>
      <title>Hash merge question</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Hash-merge-question/m-p/164585#M31804</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I have a very large dataset (69 million records) that I need to merge with a relatively small dataset (28k records).&amp;nbsp; I found PROC SQL took too long, so I tried to create a merge using a SAS hash object.&amp;nbsp; I wrote the following code to do that.&amp;nbsp; I wanted to verify that everything was merging OK, so I created a counter to increment every time there was a match.&amp;nbsp; I expected the counter to match the number of observations in the smaller dataset.&amp;nbsp; The dataset LiveVaxViolations has 27,542 observations, but when I run the code the counter has the value 28,428.&amp;nbsp; I don't know how that could be the case, nor do I really know how to figure out why.&amp;nbsp; Any ideas?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;data LiveVirusValidatedData (drop=LiveDoseViolations) ;&lt;/P&gt;&lt;P&gt;if _N_ = 1 &lt;/P&gt;&lt;P&gt;then do;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; declare hash h(dataset:'LiveVaxViolations') ;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; h.defineKey('cdcid','VaxRecordNum') ;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; h.defineData('DoseValidity');&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; h.defineDone();&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; length CDCID $30 VaxRecordNum 8 DoseValidity $15 ;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; call missing(DoseValidity) ;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; end;&lt;/P&gt;&lt;P&gt;set FullyDeDupedData end=lastob ;&lt;/P&gt;&lt;P&gt;rc = h.find() ;&lt;/P&gt;&lt;P&gt;if rc = 0 then LiveDoseViolations + 1 ;&lt;/P&gt;&lt;P&gt;if lastob then call symput('LiveDoseViolations',LiveDoseViolations) ;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The log is below:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;4491 +&lt;/P&gt;&lt;P&gt;4492 +data LiveVirusValidatedData 
(drop=LiveDoseViolations) ;&lt;/P&gt;&lt;P&gt;4493 +&lt;/P&gt;&lt;P&gt;4494 +if _N_ = 1&lt;/P&gt;&lt;P&gt;4495 +then do;&lt;/P&gt;&lt;P&gt;4496 +&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; declare hash h(dataset:'LiveVaxViolations') ;&lt;/P&gt;&lt;P&gt;4497 +&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; h.defineKey('cdcid','VaxRecordNum') ;&lt;/P&gt;&lt;P&gt;4498 +&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; h.defineData('DoseValidity');&lt;/P&gt;&lt;P&gt;4499 +&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; h.defineDone();&lt;/P&gt;&lt;P&gt;4500 +length CDCID $30 VaxRecordNum 8 DoseValidity $15 ;&lt;/P&gt;&lt;P&gt;4501 +&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; call missing(DoseValidity) ;&lt;/P&gt;&lt;P&gt;4502 +&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; end;&lt;/P&gt;&lt;P&gt;4503 +set FullyDeDupedData end=lastob ;&lt;/P&gt;&lt;P&gt;4504 +rc = h.find() ;&lt;/P&gt;&lt;P&gt;4505 +if rc = 0 then LiveDoseViolations + 1 ;&lt;/P&gt;&lt;P&gt;4506 +if lastob then call symput('LiveDoseViolations',LiveDoseViolations) ;&lt;/P&gt;&lt;P&gt;4507 +run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;NOTE: Numeric values have been converted to character values at the places given by: &lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; (Line):(Column).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 4506:49&amp;nbsp;&amp;nbsp; &lt;/P&gt;&lt;P&gt;NOTE: There were&lt;EM&gt;&lt;STRONG style="color: #ff00ff;"&gt; 27542&lt;/STRONG&gt;&lt;/EM&gt; observations read from the data set WORK.LIVEVAXVIOLATIONS.&lt;/P&gt;&lt;P&gt;NOTE: There were 69646012 observations read from the data set WORK.FULLYDEDUPEDDATA.&lt;/P&gt;&lt;P&gt;NOTE: The data set WORK.LIVEVIRUSVALIDATEDDATA has 69646012 observations and 86 variables.&lt;/P&gt;&lt;P&gt;NOTE: Compressing data set WORK.LIVEVIRUSVALIDATEDDATA decreased size by 73.23 percent. 
&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Compressed is 932110 pages; un-compressed would require 3482302 pages.&lt;/P&gt;&lt;P&gt;NOTE: DATA statement used (Total process time):&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; real time&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 6:41.41&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; cpu time&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 6:37.80&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;4508 +&lt;/P&gt;&lt;P&gt;4509 +%put NOTE: there were &amp;amp;LiveDoseViolations Records found with Live Virus Violations ;&lt;/P&gt;&lt;P&gt;NOTE: there were&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;EM&gt;&lt;STRONG style="color: #ff00ff;"&gt;&amp;nbsp; 28428&lt;/STRONG&gt;&lt;/EM&gt; Records found with Live Virus Violations&lt;/P&gt;&lt;P&gt;4510 +%put Note: This should match the # of observations in Work.LiveVaxViolations ;&lt;/P&gt;&lt;P&gt;Note: This should match the # of observations in Work.LiveVaxViolations&lt;/P&gt;&lt;P&gt;4511 +&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 02 Apr 2014 14:34:34 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Hash-merge-question/m-p/164585#M31804</guid>
      <dc:creator>spjcdc</dc:creator>
      <dc:date>2014-04-02T14:34:34Z</dc:date>
    </item>
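    <!--
A minimal sketch, not from the thread, of the same hash lookup instrumented to report two numbers: how many rows of the large table matched, and how many distinct keys those matches covered. If the first number is larger, some key matched more than one row. It assumes CDCID ($30) and VaxRecordNum (numeric) are the keys, as in the post above; the toy tables ToyViolations, ToyFullData and ToyValidated, and the names nRows, nKeys, seen, MatchedRows and MatchedKeys, are hypothetical. Substitute LiveVaxViolations and FullyDeDupedData to run it on the real data.

```sas
/* Hypothetical toy data: two violation keys; the large table repeats key A/1 */
data ToyViolations;
  length CDCID $30 VaxRecordNum 8 DoseValidity $15;
  input CDCID $ VaxRecordNum DoseValidity $;
  datalines;
A 1 Invalid
B 2 Invalid
;
run;

data ToyFullData;
  length CDCID $30 VaxRecordNum 8;
  input CDCID $ VaxRecordNum;
  datalines;
A 1
A 1
B 2
C 3
;
run;

/* Hash lookup that counts matched rows and distinct matched keys */
data ToyValidated(drop=rc nRows nKeys);
  /* Host variables must exist in the PDV before the hash is defined */
  length CDCID $30 VaxRecordNum 8 DoseValidity $15;
  if _n_ = 1 then do;
    declare hash h(dataset:'ToyViolations');
    h.defineKey('CDCID', 'VaxRecordNum');
    h.defineData('DoseValidity');
    h.defineDone();
    /* Second hash remembers which keys have already been matched */
    declare hash seen();
    seen.defineKey('CDCID', 'VaxRecordNum');
    seen.defineDone();
    call missing(DoseValidity);
  end;
  set ToyFullData end=lastob;
  rc = h.find();
  if rc = 0 then do;
    nRows + 1;                      /* incremented once per matching row */
    if seen.check() ne 0 then do;   /* first match for this key */
      rc = seen.add();
      nKeys + 1;
    end;
  end;
  if lastob then do;
    call symputx('MatchedRows', nRows);
    call symputx('MatchedKeys', nKeys);
  end;
run;

%put NOTE: &MatchedRows matched rows covering &MatchedKeys distinct keys;
```

With the toy data the step reports 3 matched rows covering 2 distinct keys, because A/1 occurs twice in ToyFullData. CALL SYMPUTX also stores the counts without the numeric-to-character conversion NOTE that CALL SYMPUT produced in the log above, and without the leading blanks visible in the %PUT output.
    -->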
    <item>
      <title>Re: Hash merge question</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Hash-merge-question/m-p/164586#M31805</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Sorry, I don't use hash objects; however, if you post your SQL code, there could be optimizations to be had on that side.&amp;nbsp; For example, create an intermediate dataset from the big one containing only the IDs found in the small one, then merge that on.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 02 Apr 2014 14:47:41 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Hash-merge-question/m-p/164586#M31805</guid>
      <dc:creator>RW9</dc:creator>
      <dc:date>2014-04-02T14:47:41Z</dc:date>
    </item>
    <item>
      <title>Re: Hash merge question</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Hash-merge-question/m-p/164587#M31806</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;@spjcdc,&lt;/P&gt;&lt;P&gt;The fact that your join/found count (&lt;EM&gt;&lt;STRONG style="color: #ff00ff;"&gt;28,428&lt;/STRONG&gt;&lt;/EM&gt;) is higher than your distinct count (&lt;EM&gt;&lt;STRONG style="color: #ff00ff;"&gt;27,542&lt;/STRONG&gt;&lt;/EM&gt;) means there are duplicate ('cdcid','VaxRecordNum') key combinations in your large table (WORK.FULLYDEDUPEDDATA) of 69,646,012 observations. The counter is incremented once for every matching row of the large table, so it can only exceed the number of keys loaded into the hash when some key matches more than one row.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Whether that's acceptable depends on your business operations and logic.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Hope this clarifies your results,&lt;/P&gt;&lt;P&gt;Ahmed&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 02 Apr 2014 15:04:42 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Hash-merge-question/m-p/164587#M31806</guid>
      <dc:creator>AhmedAl_Attar</dc:creator>
      <dc:date>2014-04-02T15:04:42Z</dc:date>
    </item>
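    <!--
One way to locate the duplicated keys described above is to group the large table by both key variables and keep the groups with more than one row. A minimal PROC SQL sketch follows, run on a hypothetical toy table (ToyFullData) so it is self-contained; DupKeys and nRows are illustrative names. On the real WORK.FULLYDEDUPEDDATA this is a full GROUP BY over 69,646,012 rows, so expect it to take a while.

```sas
/* Hypothetical toy data standing in for FullyDeDupedData */
data ToyFullData;
  length CDCID $30 VaxRecordNum 8;
  input CDCID $ VaxRecordNum;
  datalines;
A 1
A 1
B 2
C 3
;
run;

/* Keep every (CDCID, VaxRecordNum) combination that occurs more than once */
proc sql;
  create table DupKeys as
  select CDCID, VaxRecordNum, count(*) as nRows
  from ToyFullData
  group by CDCID, VaxRecordNum
  having count(*) > 1;
quit;

proc print data=DupKeys noobs;
run;
```

With the toy data, DupKeys holds one row (A, 1, 2). The same query pointed at LiveVaxViolations checks the small table as well; when a hash object is loaded with the dataset: tag, by default it keeps only the first row it reads for a duplicated key, so duplicates there would not inflate the counter.
    -->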
    <item>
      <title>Re: Hash merge question</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Hash-merge-question/m-p/164588#M31807</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Thanks Ahmed,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;There are not SUPPOSED to be duplicate values of CDCID and VaxRecordNum but apparently there are.&amp;nbsp; So that's not good news.&amp;nbsp; At least that gives me something to pursue.&amp;nbsp; Thanks.&amp;nbsp; &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Steve&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 02 Apr 2014 15:07:08 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Hash-merge-question/m-p/164588#M31807</guid>
      <dc:creator>spjcdc</dc:creator>
      <dc:date>2014-04-02T15:07:08Z</dc:date>
    </item>
    <item>
      <title>Re: Hash merge question</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Hash-merge-question/m-p/164589#M31808</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Thanks RW9.&amp;nbsp; I only know enough SQL to be dangerous.&amp;nbsp; Here's what I have.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Proc SQL ;&lt;/P&gt;&lt;P&gt;Create Table LiveVirusValidatedData as &lt;/P&gt;&lt;P&gt;Select F.*, L.DoseValidity &lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; from FullyDeDupedData F left join LiveVaxViolations L&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; on (F.VaxRecordNum = L.VaxRecordNum) &lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; order by CDCID, VaxDate, VaxGroup ;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; quit ;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Steve&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 02 Apr 2014 15:10:12 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Hash-merge-question/m-p/164589#M31808</guid>
      <dc:creator>spjcdc</dc:creator>
      <dc:date>2014-04-02T15:10:12Z</dc:date>
    </item>
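    <!--
The hash lookup in the original post keys on both CDCID and VaxRecordNum, while this query joins on VaxRecordNum alone. If the same VaxRecordNum can appear under more than one CDCID, the two versions can give different results. A minimal sketch of the left join keyed on both variables, assuming both columns exist in both tables as they do in the hash version; ToyViolations, ToyFullData and ToyJoined are hypothetical toy tables, and the sketch orders by CDCID and VaxRecordNum only because the toy data has no VaxDate or VaxGroup.

```sas
/* Hypothetical toy data: VaxRecordNum 1 appears under two CDCIDs */
data ToyViolations;
  length CDCID $30 VaxRecordNum 8 DoseValidity $15;
  input CDCID $ VaxRecordNum DoseValidity $;
  datalines;
A 1 Invalid
;
run;

data ToyFullData;
  length CDCID $30 VaxRecordNum 8;
  input CDCID $ VaxRecordNum;
  datalines;
A 1
B 1
;
run;

/* Left join on the full composite key, the same key the hash object uses */
proc sql;
  create table ToyJoined as
  select F.*, L.DoseValidity
  from ToyFullData as F
       left join ToyViolations as L
       on  F.CDCID = L.CDCID
       and F.VaxRecordNum = L.VaxRecordNum
  order by F.CDCID, F.VaxRecordNum;
quit;
```

Joining on VaxRecordNum alone would give row B/1 the DoseValidity of A/1; with both keys in the ON clause it stays missing.
    -->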
    <item>
      <title>Re: Hash merge question</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Hash-merge-question/m-p/164590#M31809</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Well, at a quick glance (as I am about to leave), see below: create an intermediate dataset, then use that to merge.&amp;nbsp; Also see whether there are any other restrictions you could put in place, such as summarizing or transposing the data. &lt;/P&gt;&lt;P&gt;proc sql;&lt;/P&gt;&lt;P&gt;&amp;nbsp; /* Create an intermediate dataset with only the relevant data, i.e. rows that have a match, and only the columns we need */&lt;BR /&gt;&amp;nbsp; create table inter as&lt;BR /&gt;&amp;nbsp; select&amp;nbsp; vaxrecordnum,&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; dosevalidity&lt;BR /&gt;&amp;nbsp; from&amp;nbsp;&amp;nbsp;&amp;nbsp; livevaxviolations&lt;BR /&gt;&amp;nbsp; where&amp;nbsp;&amp;nbsp; vaxrecordnum in (select distinct vaxrecordnum from fullydedupeddata);&lt;/P&gt;&lt;P&gt;&amp;nbsp; create table livevirusvalidateddata as &lt;BR /&gt;&amp;nbsp; select&amp;nbsp; f.*, &lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; l.dosevalidity &lt;BR /&gt;&amp;nbsp; from&amp;nbsp;&amp;nbsp;&amp;nbsp; fullydedupeddata f &lt;BR /&gt;&amp;nbsp; left join inter l&lt;BR /&gt;&amp;nbsp; on&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; f.vaxrecordnum = l.vaxrecordnum &lt;BR /&gt;&amp;nbsp; order by&amp;nbsp; cdcid, &lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; vaxdate, &lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; vaxgroup;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;BR /&gt; quit;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 02 Apr 2014 15:24:55 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Hash-merge-question/m-p/164590#M31809</guid>
      <dc:creator>RW9</dc:creator>
      <dc:date>2014-04-02T15:24:55Z</dc:date>
    </item>
  </channel>
</rss>

