<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: md5 hash issue in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/md5-hash-issue/m-p/574522#M162359</link>
    <description>PS I'm using SAS V9.2</description>
    <pubDate>Thu, 18 Jul 2019 11:39:01 GMT</pubDate>
    <dc:creator>taupirho</dc:creator>
    <dc:date>2019-07-18T11:39:01Z</dc:date>
    <item>
      <title>md5 hash issue</title>
      <link>https://communities.sas.com/t5/SAS-Programming/md5-hash-issue/m-p/574396#M162292</link>
      <description>&lt;P&gt;Hi all, I'm a SAS newbie and need some help with the following issue. I'm experimenting with the md5 hash function and testing it on the sashelp.shoes dataset.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;In a data step, for each input observation, I create a new variable that is a concatenation of every field, using catx with a comma as the field separator. I then use that variable as input to the md5 function and, sure enough, I get a hash value back, which I print to the log using PUT. I repeat this for each input observation.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I then exported the shoes dataset to a CSV text file. The export enclosed the fields in double quotes and included $ signs and commas in the currency fields, which I removed manually from the CSV file, so visually the lines of the CSV looked exactly like the lines output from SAS. Next I wrote a little Python script to read each line of the CSV text file and calculate an md5 hash for it. Unfortunately, none of the hashes for the CSV file matched the hashes from SAS. Has anybody done something similar and, if so, can you tell me where I'm going wrong? I know the Python code is correct, as I checked the results using the built-in md5 checker in Windows.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;When I get into work again tomorrow I'll post some of the code I'm using, if that helps. Meanwhile, if anyone can help, that would be appreciated.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I'm using Enterprise Guide V4.2, I think.&lt;/P&gt;</description>
      <pubDate>Wed, 17 Jul 2019 22:54:25 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/md5-hash-issue/m-p/574396#M162292</guid>
      <dc:creator>taupirho</dc:creator>
      <dc:date>2019-07-17T22:54:25Z</dc:date>
    </item>
    <item>
      <title>Re: md5 hash issue</title>
      <link>https://communities.sas.com/t5/SAS-Programming/md5-hash-issue/m-p/574405#M162298</link>
      <description>&lt;P&gt;Simplify and ensure that SAS and Python MD5 functions return the same results (AFAIK they should but you seem to be wanting to verify that):&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data test;
   set sashelp.class (keep=name);
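   hash1=md5(name);          /* hashes the blank-padded value */
   hash2=md5(strip(name));   /* hashes the value without trailing blanks */

   /* md5 returns a 16-byte character value, so use the character format */
   hash1x=put(hash1,$hex32.);
   hash2x=put(hash2,$hex32.);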
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Also note that SAS variables are padded with spaces to the length of the variable.&amp;nbsp; Note the difference between md5(var) vs. md5(strip(var)).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Get SAS and Python to generate the same MD5 result for "Alfred", "Alice", "Bob", etc, then add complexity (concatenated columns) to the mix.&lt;/P&gt;
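&lt;P&gt;For the Python side of that comparison, a minimal sketch (not part of the original reply) could look like the one below. The helper name md5_hex_upper is hypothetical, and the sketch assumes the names are plain ASCII, so the choice of encoding does not change the bytes being hashed. The digest is upper-cased only so that it can be compared by eye with what the $hex32. format prints.&lt;/P&gt;

```python
import hashlib

def md5_hex_upper(text, encoding="utf-8"):
    """Return the MD5 digest of text as 32 uppercase hex characters."""
    return hashlib.md5(text.encode(encoding)).hexdigest().upper()

# Two of the names mentioned above; compare each result with hash2x,
# the SAS hash of the stripped value.
for name in ["Alfred", "Alice"]:
    print(name, md5_hex_upper(name))
```

&lt;P&gt;If these values agree with hash2x but not with hash1x, the difference is the trailing blanks rather than the MD5 implementation.&lt;/P&gt;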
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I don't know if this will help with your export issues?&amp;nbsp;&amp;nbsp;&lt;A href="https://github.com/scottbass/SAS/blob/master/Macro/export.sas" target="_blank"&gt;https://github.com/scottbass/SAS/blob/master/Macro/export.sas&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 18 Jul 2019 00:19:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/md5-hash-issue/m-p/574405#M162298</guid>
      <dc:creator>ScottBass</dc:creator>
      <dc:date>2019-07-18T00:19:49Z</dc:date>
    </item>
    <item>
      <title>Re: md5 hash issue</title>
      <link>https://communities.sas.com/t5/SAS-Programming/md5-hash-issue/m-p/574476#M162336</link>
      <description>&lt;P&gt;As promised, here is the SAS code I'm using and the first few output records I'm getting.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* concatenate all fields of a dataset and compute a checksum */
proc sql;
   select name into :varstr2 separated by ','
   from dictionary.columns
   where libname = "SASHELP" and
         memname = "SHOES";
quit;

data stuff(drop=check all);
   format check $hex32.;
   set sashelp.shoes end=end1;
   newvar2 = catx(',',&amp;amp;varstr2);
   all = catx(',',&amp;amp;varstr2);

   check = md5(all);
   put all;
   put check;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;PRE&gt;Africa,Boot,Addis Ababa,12,29761,191821,769
0F7503F59119E8248D89ED645F886871
Africa,Men's Casual,Addis Ababa,4,67242,118036,2284
8066D31E7C2A254EAB127C121B526DF7
Africa,Men's Dress,Addis Ababa,7,76793,136273,2433
653E4A1DF8B5708DF9C8B97587A1E981
Africa,Sandal,Addis Ababa,10,62819,204284,1861
D59E63E5319B4E3018F28D46A4CED9F9
Africa,Slipper,Addis Ababa,14,68641,279795,1771
1612FC1FE23B55078B7693ECE1E6D028&lt;/PRE&gt;
&lt;P&gt;Now here is the Python code and the same output records I'm getting for that:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-python"&gt;import hashlib

filename = "f:/test/shoes.csv"
md5_hash = hashlib.md5()
with open(filename,"r") as f:
    for x in f:
        result=hashlib.md5(x.encode('utf-8'))
        print (x)
        print(result.hexdigest())&lt;/CODE&gt;&lt;/PRE&gt;
&lt;PRE&gt;Africa,Boot,Addis Ababa,12,29761,191821,769

7001aaebd146b10aaed951cb692c6c4b
Africa,Men's Casual,Addis Ababa,4,67242,118036,2284

916a0c39554b70d691d03c71e8daa763
Africa,Men's Dress,Addis Ababa,7,76793,136273,2433

ea9e85e9843d3bb02206bc0ba7c3d5d4
Africa,Sandal,Addis Ababa,10,62819,204284,1861

5865cfc5d443b5a2e0038c573b5b6fb9
Africa,Slipper,Addis Ababa,14,68641,279795,1771

0226115fb928f326044ca43e186ae23a&lt;/PRE&gt;</description>
      <pubDate>Thu, 18 Jul 2019 07:51:01 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/md5-hash-issue/m-p/574476#M162336</guid>
      <dc:creator>taupirho</dc:creator>
      <dc:date>2019-07-18T07:51:01Z</dc:date>
    </item>
    <item>
      <title>Re: md5 hash issue</title>
      <link>https://communities.sas.com/t5/SAS-Programming/md5-hash-issue/m-p/574491#M162347</link>
      <description>&lt;P&gt;Update: I was thinking it might be something to do with newlines/linefeeds on the Python side, so I changed my code to look at just the first input string in isolation.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;import hashlib

# hash a single literal string, so no line terminator is involved
x="Africa,Boot,Addis Ababa,12,29761,191821,769"
result=hashlib.md5(x.encode('utf-8'))
print(x)
print(result.hexdigest())&lt;/CODE&gt;&lt;/PRE&gt;
&lt;PRE&gt;Africa,Boot,Addis Ababa,12,29761,191821,769
65d38fa13c098fc3959b1eb0c19b0427&lt;/PRE&gt;
&lt;P&gt;Hmmm, that still doesn't match the SAS version.&lt;/P&gt;</description>
      <pubDate>Thu, 18 Jul 2019 08:32:32 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/md5-hash-issue/m-p/574491#M162347</guid>
      <dc:creator>taupirho</dc:creator>
      <dc:date>2019-07-18T08:32:32Z</dc:date>
    </item>
    <item>
      <title>Re: md5 hash issue</title>
      <link>https://communities.sas.com/t5/SAS-Programming/md5-hash-issue/m-p/574503#M162350</link>
      <description>Yes, even though I was using catx which is supposed to strip spaces from beginning/end of strings, you still need that "strip" in the call to md5 function. Many thanks for your input.</description>
      <pubDate>Thu, 18 Jul 2019 09:15:39 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/md5-hash-issue/m-p/574503#M162350</guid>
      <dc:creator>taupirho</dc:creator>
      <dc:date>2019-07-18T09:15:39Z</dc:date>
    </item>
    <item>
      <title>Re: md5 hash issue</title>
      <link>https://communities.sas.com/t5/SAS-Programming/md5-hash-issue/m-p/574521#M162358</link>
      <description>&lt;P&gt;Just out of interest, does anyone have code that would calculate the md5 hash for a whole, potentially very large, dataset?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Since I can calculate the hash for each record, I wondered if I could just hash the hashes, as it were?&lt;/P&gt;</description>
      <pubDate>Thu, 18 Jul 2019 11:38:02 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/md5-hash-issue/m-p/574521#M162358</guid>
      <dc:creator>taupirho</dc:creator>
      <dc:date>2019-07-18T11:38:02Z</dc:date>
    </item>
    <item>
      <title>Re: md5 hash issue</title>
      <link>https://communities.sas.com/t5/SAS-Programming/md5-hash-issue/m-p/574522#M162359</link>
      <description>PS I'm using SAS V9.2</description>
      <pubDate>Thu, 18 Jul 2019 11:39:01 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/md5-hash-issue/m-p/574522#M162359</guid>
      <dc:creator>taupirho</dc:creator>
      <dc:date>2019-07-18T11:39:01Z</dc:date>
    </item>
    <item>
      <title>Re: md5 hash issue</title>
      <link>https://communities.sas.com/t5/SAS-Programming/md5-hash-issue/m-p/574527#M162361</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/281932"&gt;@taupirho&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;Yes, even though I was using catx which is supposed to strip spaces from beginning/end of strings, you still need that "strip" in the call to md5 function. Many thanks for your input.&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;catx removes leading and trailing blanks from the arguments you pass to the function, but that does not remove trailing blanks from the value stored in the result variable, simply because all character variables are padded with blanks to their defined length.&lt;/P&gt;
&lt;P&gt;From the docs:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;In a DATA step, if the CATX function returns a value to a variable that has not previously been assigned a length, that variable is given a length of 200 bytes.&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
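&lt;P&gt;To see the effect of that padding from the Python side, here is a rough sketch. The helper name md5_like_sas is hypothetical; it pads the encoded string with blanks to a fixed byte length before hashing, using the 200-byte default quoted above. Whether the padded digest reproduces the SAS output exactly also depends on the session encoding and the actual variable length, which this sketch simply assumes.&lt;/P&gt;

```python
import hashlib

def md5_like_sas(value, length=200, encoding="utf-8"):
    """Hash value after right-padding it with blanks to `length` bytes.

    Mimics hashing an unstripped fixed-length character variable;
    the 200-byte default is the CATX length quoted above.
    """
    raw = value.encode(encoding)
    # Truncate or pad so that exactly `length` bytes are hashed.
    padded = raw[:length].ljust(length, b" ")
    return hashlib.md5(padded).hexdigest().upper()

row = "Africa,Boot,Addis Ababa,12,29761,191821,769"
stripped = hashlib.md5(row.encode("utf-8")).hexdigest().upper()
padded = md5_like_sas(row)

print("stripped:", stripped)
print("padded:  ", padded)
```

&lt;P&gt;The two digests differ because the padded input carries trailing blanks. On the SAS side, the equivalent fix is to hash strip(all), as found earlier in the thread.&lt;/P&gt;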
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 18 Jul 2019 11:42:55 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/md5-hash-issue/m-p/574527#M162361</guid>
      <dc:creator>andreas_lds</dc:creator>
      <dc:date>2019-07-18T11:42:55Z</dc:date>
    </item>
  </channel>
</rss>

