BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
taupirho
Calcite | Level 5

Hi all I'm a SAS newbie and need some help with the following issue. I'm experimenting with the md5 hash function and testing it on the sashelp.shoes dataset.

 

In a datastep, for each input  observation,  I'm creating a new variable that is a concatenation of  each field using catx with comma as the field separator. I then use that variable as input to the md5 function and sure enough I get a hash value back which I print out to the log using PUT. I repeated this for each input line. However I then exported the shoes dataset to a CSV text  file.  The export enclosed the fields in double quotes and included $ and commas for the currency fields which I removed manually from the CSV file. So visually the the lines of the CSV looked exactly like the lines output from SAS. Next I wrote a little python script to read each line of the CSV  text file and calculate a md5 hash for that. Unfortunately none of the hashes for the CSV file matched the hashes from SAS. Has anybody done something similar and if so can you tell me where I'm going wrong. I know the python code is correct as I checked the results using the built-in md5 checker in windows.

 

When  I get into work again tomorrow I'll post some of the code I'm using if that helps. Meanwhile if anyone can help that would be appreciated

 

I'm using enterprise guide V4.2 I think

1 ACCEPTED SOLUTION

Accepted Solutions
ScottBass
Rhodochrosite | Level 12

Simplify and ensure that SAS and Python MD5 functions return the same results (AFAIK they should but you seem to be wanting to verify that):

 

data test;
   set sashelp.class (keep=name);
   hash1=md5(name);
   hash2=md5(strip(name));

   hash1x=put(hash1,hex32.);
   hash2x=put(hash2,hex32.);
run;

Also note that SAS variables are padded with spaces to the length of the variable.  Note the difference between md5(var) vs. md5(strip(var)).

 

Get SAS and Python to generate the same MD5 result for "Alfred", "Alice", "Bob", etc, then add complexity (concatenated columns) to the mix.

 

I don't know if this will help with your export issues?  https://github.com/scottbass/SAS/blob/master/Macro/export.sas


Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.

View solution in original post

7 REPLIES 7
ScottBass
Rhodochrosite | Level 12

Simplify and ensure that SAS and Python MD5 functions return the same results (AFAIK they should but you seem to be wanting to verify that):

 

data test;
   set sashelp.class (keep=name);
   hash1=md5(name);
   hash2=md5(strip(name));

   hash1x=put(hash1,hex32.);
   hash2x=put(hash2,hex32.);
run;

Also note that SAS variables are padded with spaces to the length of the variable.  Note the difference between md5(var) vs. md5(strip(var)).

 

Get SAS and Python to generate the same MD5 result for "Alfred", "Alice", "Bob", etc, then add complexity (concatenated columns) to the mix.

 

I don't know if this will help with your export issues?  https://github.com/scottbass/SAS/blob/master/Macro/export.sas


Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.
taupirho
Calcite | Level 5

As promised , here is the SAS code I'm using and the first few output records I'm getting.

 

/* concatenate all fields of a dataset and compute a checksum */

proc sql;

select name into :varstr2 separated by ','

from dictionary.columns

where libname = "SASHELP" and

memname = "SHOES";

quit;

 

 

data stuff(drop=check all);

format check $hex32.;

set sashelp.shoes end=end1;

newvar2 = catx(',',&varstr2);

all = catx(',',&varstr2);

 

check = md5(all);

put all;

put check;

run;

 

Africa,Boot,Addis Ababa,12,29761,191821,769
0F7503F59119E8248D89ED645F886871
Africa,Men's Casual,Addis Ababa,4,67242,118036,2284
8066D31E7C2A254EAB127C121B526DF7
Africa,Men's Dress,Addis Ababa,7,76793,136273,2433
653E4A1DF8B5708DF9C8B97587A1E981
Africa,Sandal,Addis Ababa,10,62819,204284,1861
D59E63E5319B4E3018F28D46A4CED9F9
Africa,Slipper,Addis Ababa,14,68641,279795,1771
1612FC1FE23B55078B7693ECE1E6D028

 

 

Now here is the python code and the same output records I'm getting for that:

 

import hashlib
 
filename = "f:/test/shoes.csv"
md5_hash = hashlib.md5()
with open(filename,"r") as f:
    for x in f:
        result=hashlib.md5(x.encode('utf-8'))
        print (x)
        print(result.hexdigest())

 

Africa,Boot,Addis Ababa,12,29761,191821,769

7001aaebd146b10aaed951cb692c6c4b
Africa,Men's Casual,Addis Ababa,4,67242,118036,2284

916a0c39554b70d691d03c71e8daa763
Africa,Men's Dress,Addis Ababa,7,76793,136273,2433

ea9e85e9843d3bb02206bc0ba7c3d5d4
Africa,Sandal,Addis Ababa,10,62819,204284,1861

5865cfc5d443b5a2e0038c573b5b6fb9
Africa,Slipper,Addis Ababa,14,68641,279795,1771

0226115fb928f326044ca43e186ae23a

taupirho
Calcite | Level 5

Update. I was thinking it might be something to do with newlines/linefeeds on the python side so changed my code to just look at the first input string in isolation.

import hashlib

x="Africa,Boot,Addis Ababa,12,29761,191821,769"
md5_hash = hashlib.md5()
result=hashlib.md5(x.encode('utf-8')) 
print (x)
print(result.hexdigest())

Africa,Boot,Addis Ababa,12,29761,191821,769
65d38fa13c098fc3959b1eb0c19b0427

Hmmm, still doesn't match with the SAS version
taupirho
Calcite | Level 5
Yes, even though I was using catx which is supposed to strip spaces from beginning/end of strings, you still need that "strip" in the call to md5 function. Many thanks for your input.
taupirho
Calcite | Level 5

Just out of interest, does anyone have code that would calculate the md5 hash for a whole - potentially very large , dataset ?

 

Since I can calculate the hash for each record I wondered if I could just hash the hashes as it were ?

taupirho
Calcite | Level 5
PS I'm using SAS V9.2
andreas_lds
Jade | Level 19

@taupirho wrote:
Yes, even though I was using catx which is supposed to strip spaces from beginning/end of strings, you still need that "strip" in the call to md5 function. Many thanks for your input.

catx remove leading and trailing blanks of the arguments you pass to function, but it will not strip trailing blanks from the value returned by the function - simply because all variables are padded with blanks.

From the docs:

In a DATA step, if the CATX function returns a value to a variable that has not previously been assigned a length, that variable is given a length of 200 bytes.

 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 9246 views
  • 1 like
  • 3 in conversation