<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: HASH vs. PROC COMPARE in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/HASH-vs-PROC-COMPARE/m-p/501563#M133739</link>
    <description>Are the differences significant?</description>
    <pubDate>Thu, 04 Oct 2018 16:12:49 GMT</pubDate>
    <dc:creator>Reeza</dc:creator>
    <dc:date>2018-10-04T16:12:49Z</dc:date>
    <item>
      <title>HASH vs. PROC COMPARE</title>
      <link>https://communities.sas.com/t5/SAS-Programming/HASH-vs-PROC-COMPARE/m-p/501549#M133731</link>
      <description>&lt;P&gt;Hello everybody,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I was trying to verify a hash-based method for comparing data set observations by key, and it revealed a somewhat odd inconsistency:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data cars;
set sashelp.cars;
run;

data cars2 (keep = Make Model compare_digest);
length compare_digest $ 32;
set sashelp.cars;
compare_digest = put(md5(catq(' ', Type, Origin, DriveTrain, Invoice)), $hex32.);
run;
 
/* source/compare lookup: hash lookup method  */ 
data 
     work.changed_obs
     work.new_obs
     work.same_obs;

   length source_digest $ 32;
   
   drop source_digest compare_digest ;
   
   if 0 then
      set work.cars2;
      
   if _N_ eq 1 then
   do;
      declare hash hct(dataset: 'work.cars2', hashexp: 20);
      hct.defineKey("Make", "Model");
      hct.defineData("Make", "Model", "compare_digest");
      hct.defineDone();
   end;
   
   set work.CARS (keep = Make Model Type Origin DriveTrain Invoice) end = eof ;
   
   /* create digest  */ 
   source_digest = put(md5(catq(' ', Type, Origin, DriveTrain, Invoice)), $hex32.);
   
   /* source/compare match  */
   if hct.find() eq 0 then
   do;
      /* source/compare match: no change detected  */
      if source_digest eq compare_digest then
         output work.same_obs;
      /* source/compare match: change detected  */
      else
         output work.changed_obs;
   end;
   /* source table: new records (key not found in the hash)  */
   else
      output work.new_obs;
run;


data cars_for_comp cars_for_comp2;
set sashelp.cars;
run;

proc sort data=work.cars_for_comp out=work.cars_for_comp_s;
   by Make Model;
run;
proc sort data=work.cars_for_comp2 out=work.cars_for_comp2_s;
   by Make Model;
run;

proc compare noprint base=work.cars_for_comp_s compare=work.cars_for_comp2_s out=diffs outnoeq;
   by Make Model;
run;
&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;The million-dollar question is:&lt;/P&gt;&lt;P&gt;Why does the hash method flag 3 observations as different between the two "car files", whereas PROC COMPARE gives the correct result, namely that there are no differences at all?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Cheers,&lt;/P&gt;&lt;P&gt;FK1&lt;/P&gt;</description>
      <pubDate>Thu, 04 Oct 2018 15:48:08 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/HASH-vs-PROC-COMPARE/m-p/501549#M133731</guid>
      <dc:creator>FK1</dc:creator>
      <dc:date>2018-10-04T15:48:08Z</dc:date>
    </item>
    <item>
      <title>Re: HASH vs. PROC COMPARE</title>
      <link>https://communities.sas.com/t5/SAS-Programming/HASH-vs-PROC-COMPARE/m-p/501563#M133739</link>
      <description>Are the differences significant?</description>
      <pubDate>Thu, 04 Oct 2018 16:12:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/HASH-vs-PROC-COMPARE/m-p/501563#M133739</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2018-10-04T16:12:49Z</dc:date>
    </item>
    <item>
      <title>Re: HASH vs. PROC COMPARE</title>
      <link>https://communities.sas.com/t5/SAS-Programming/HASH-vs-PROC-COMPARE/m-p/501570#M133741</link>
      <description>Well, there are 428 observations and 3 are not being identified correctly, so I would say it is not a large share. But I do not know the pattern for when matching works and when it does not, so to me it remains a matter of pure chance, which is not acceptable when creating quality-controlled processes.</description>
      <pubDate>Thu, 04 Oct 2018 16:28:34 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/HASH-vs-PROC-COMPARE/m-p/501570#M133741</guid>
      <dc:creator>FK1</dc:creator>
      <dc:date>2018-10-04T16:28:34Z</dc:date>
    </item>
    <item>
      <title>Re: HASH vs. PROC COMPARE</title>
      <link>https://communities.sas.com/t5/SAS-Programming/HASH-vs-PROC-COMPARE/m-p/501574#M133744</link>
      <description>&lt;P&gt;This is because there are 3 make/model duplicates in the original data set, so your hash object has only 425 data items.&lt;/P&gt;
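&lt;P&gt;The item count can also be checked directly. The minimal sketch below keys a hash object on Make and Model, as the question does, loads only those two variables from SASHELP.CARS, and writes NUM_ITEMS to the log. The hash name H, the variable N_ITEMS and the macro variable HASH_ITEMS are hypothetical names chosen for illustration.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Sketch: key a hash object on Make and Model, as in the question, */
/* and report how many data items it actually stores.               */
data _null_;
   /* define Make and Model in the PDV without reading any rows */
   if 0 then set sashelp.cars (keep = Make Model);
   declare hash h(dataset: 'sashelp.cars (keep = Make Model)');
   h.defineKey('Make', 'Model');
   h.defineDone();
   n_items = h.num_items;   /* a repeated key is not stored twice */
   put 'Hash data items: ' n_items;
   /* hypothetical macro variable, kept only for later checks */
   call symputx('hash_items', n_items);
   stop;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Comparing the logged count with the 428 observations in SASHELP.CARS shows how many rows were collapsed onto an already existing key.&lt;/P&gt;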
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;42  proc sort data=sashelp.cars out=test nodupkey; by make model;run;

NOTE: There were 428 observations read from the data set SASHELP.CARS.
&lt;EM&gt;&lt;STRONG&gt;NOTE: 3 observations with duplicate key values were deleted.&lt;/STRONG&gt;&lt;/EM&gt;
NOTE: The data set WORK.TEST has 425 observations and 15 variables.
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;This also means there is only one data item in the hash object for a given make/model, holding a single MD5 value. But the MD5 digest for those duplicates depends on&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Type, Origin, DriveTrain, Invoice&lt;/P&gt;
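&lt;P&gt;which no doubt have different values in the records that share a make/model. When a hash object is loaded from a data set, it keeps only the first record for each key by default and ignores later duplicates, so the second record of each duplicate pair is compared against the first record's digest and is reported as changed.&lt;/P&gt;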
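&lt;P&gt;A minimal sketch of one possible fix follows. It assumes, based on the duplicate pairs in SASHELP.CARS, that Make, Model and DriveTrain together identify a car, so it adds DriveTrain to the hash key. It also uses the DUPLICATE: 'ERROR' argument tag, so that a key that is still not unique is reported as an error in the log instead of being silently dropped. For brevity it reads SASHELP.CARS directly as the source table instead of WORK.CARS; this is a sketch under those assumptions, not a definitive implementation.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* ASSUMPTION: Make, Model and DriveTrain together form a unique key */
data work.cars2 (keep = Make Model DriveTrain compare_digest);
   length compare_digest $ 32;
   set sashelp.cars;
   compare_digest = put(md5(catq(' ', Type, Origin, DriveTrain, Invoice)), $hex32.);
run;

data work.changed_obs work.new_obs work.same_obs;
   length source_digest $ 32;
   drop source_digest compare_digest;
   /* puts Make, Model, DriveTrain and compare_digest in the PDV */
   if 0 then set work.cars2;
   if _n_ eq 1 then do;
      /* DUPLICATE: 'ERROR' reports a repeated key in the log instead */
      /* of silently keeping only the first record for that key       */
      declare hash hct(dataset: 'work.cars2', duplicate: 'error');
      hct.defineKey('Make', 'Model', 'DriveTrain');
      hct.defineData('compare_digest');
      hct.defineDone();
   end;
   /* source table read directly from SASHELP.CARS for brevity */
   set sashelp.cars (keep = Make Model Type Origin DriveTrain Invoice);
   source_digest = put(md5(catq(' ', Type, Origin, DriveTrain, Invoice)), $hex32.);
   if hct.find() eq 0 then do;
      if source_digest eq compare_digest then output work.same_obs;
      else output work.changed_obs;
   end;
   else output work.new_obs;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;If the key assumption holds, every row should land in WORK.SAME_OBS, and WORK.CHANGED_OBS and WORK.NEW_OBS should be empty, which matches the PROC COMPARE result.&lt;/P&gt;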
</description>
      <pubDate>Thu, 04 Oct 2018 16:35:01 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/HASH-vs-PROC-COMPARE/m-p/501574#M133744</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2018-10-04T16:35:01Z</dc:date>
    </item>
    <item>
      <title>Re: HASH vs. PROC COMPARE</title>
      <link>https://communities.sas.com/t5/SAS-Programming/HASH-vs-PROC-COMPARE/m-p/501606#M133763</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/114220"&gt;@FK1&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;Well, there are 428 observations and 3 are not being identified correctly, so I would say it is not a large share. But I do not know the pattern for when matching works and when it does not, so to me it remains a matter of pure chance, which is not acceptable when creating quality-controlled processes.&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
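&lt;P&gt;A listing like the one below can be reproduced with a short sketch. The NOUNIQUEKEY option of PROC SORT (available in SAS 9.3 and later) keeps only the observations whose BY values occur more than once; WORK.DUP_KEYS is a hypothetical output name.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Keep only observations whose Make/Model key is not unique */
proc sort data=sashelp.cars out=work.dup_keys nouniquekey;
   by Make Model;
run;

/* Show the variable that tells the duplicate pairs apart */
proc print data=work.dup_keys noobs;
   var Make Model DriveTrain;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Running a check like this before building a hash object makes the key assumption explicit, so matches no longer depend on chance.&lt;/P&gt;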
&lt;PRE&gt;Make           Model                                     DriveTrain

Infiniti        G35 4dr                                  All
Infiniti        G35 4dr                                  Rear
Mercedes-Benz   C240 4dr                                 All
Mercedes-Benz   C240 4dr                                 Rear
Mercedes-Benz   C320 4dr                                 All
Mercedes-Benz   C320 4dr                                 Rear


&lt;/PRE&gt;</description>
      <pubDate>Thu, 04 Oct 2018 17:25:18 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/HASH-vs-PROC-COMPARE/m-p/501606#M133763</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2018-10-04T17:25:18Z</dc:date>
    </item>
  </channel>
</rss>

