SAS Programming

DATA Step, Macro, Functions and more
BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Nigel_Pain
Lapis Lazuli | Level 10

Not sure if this is the right board.

 

I was discussing hashing variables with a colleague recently. They were using SHA256 to anonymise data both in SAS and SQL Server. Interesting they got different results, which suggests a difference in the algorithm? In SAS it was called SHA_256, but in SQL Server it was SHA2_256 but my understanding is that these should be the same. The variables involved were the same length in each case.

 

It's not an area I know much about so any explanations would be gratefully received!

1 ACCEPTED SOLUTION

Accepted Solutions
Nigel_Pain
Lapis Lazuli | Level 10

It turns out that there is a distinction I didn't grasp between the SHA256 function and the HASHING function. The first produces the same results as SQL whereas the second (with the "SHA256" option) produces a hex representation of it. So what we were seeing (and I hadn't noticed) is that, when the results from SQL started "FCFF" those from the SAS HASHING function started "46434646". But the SAS SHA256 function's results did start "FCFF". It took some helpful advice from Anita at the SAS UK TS and some staring at code and output for me to realise that. Feel a bit stupid now!

View solution in original post

6 REPLIES 6
LinusH
Tourmaline | Level 20

I would also expect them do give the same result, so I would first rule out that there is something is different in the to environments.

Some kind of quick check list/quistionaire:

  • Is this true for all values, or only some?
  • How is the SQL Server values stored (column type)?
  • How do you do the comparison? Visual, or via programs?
  • For char values, make sure values are trimmed properly, sometimes trailing blanks can be stored.
  • What locales/collationg seqwuence do you use in SAS and SQL Server respectively?

If you can't explain the difference I suggest that you open a ticket to SAS tech support.

Data never sleeps
Nigel_Pain
Lapis Lazuli | Level 10

Thanks @LinusH 

  • It appears to be all values which are affected
  • I don't have direct access to SQL Server but I would think the variable is stored as varchar. Access for ODBC is translating it to char in SAS (50 characters).
  • I'm comparing with PROC COMPARE but I think my colleague was doing it visually.
  • I've tried trimming both in SAS and SQL.
  • I've asked the SQL DBAs about the SQL locale, in SAS it's EN_GB. Encoding is Latin1 (could it be that?)

Here's the code I used:

proc sql;
create table work.sashash as
select classname,hashing('sha256',trim(classname)) as hash format=$hex16000. length=8000
from am_awfnl.testclassname;
connect to odbc (dsn=am_awfnl);
create table work.sqlhash as
select * from connection to odbc (
select classname,hashbytes('SHA2_256',rtrim(classname)) as hash
from final.testclassname
);
disconnect from odbc;
quit;

proc sort data=work.sashash;
by classname;
run;

proc sort data=work.sqlhash;
by classname;
run;

proc compare base=work.sqlhash comp=work.sashash out=work.compare outall;
id classname;
var hash;
run;

Library AM_AWFNL maps to the "final" schema in the database. I applied the format and length to the SAS hash to match what came from the pass-through SQL. The values of classname were "3D", "P3" and "P5" (this was just a test table to try things). The hashed values were actually different lengths.

LinusH
Tourmaline | Level 20
Different encoding could affect text fileds (classname), but the values you you listad should not be transcoded differently. I think you've done a proper comparison, and my next step would be to contact tech support.
Data never sleeps
Nigel_Pain
Lapis Lazuli | Level 10

Thank you @LinusH and also to the legend that is @ChrisHemedinger for bring the hashing function to my attention. I'll go to TS and report back when I get an answer.

Nigel_Pain
Lapis Lazuli | Level 10

It turns out that there is a distinction I didn't grasp between the SHA256 function and the HASHING function. The first produces the same results as SQL whereas the second (with the "SHA256" option) produces a hex representation of it. So what we were seeing (and I hadn't noticed) is that, when the results from SQL started "FCFF" those from the SAS HASHING function started "46434646". But the SAS SHA256 function's results did start "FCFF". It took some helpful advice from Anita at the SAS UK TS and some staring at code and output for me to realise that. Feel a bit stupid now!

ChrisHemedinger
Community Manager

Check the SAS documentation for the full inventory of hashing functions, including the newer HASHING function itself which provides access to many of them in one function.

 

One possibility is that the SQL Server function is used with a key? That is supported by the sha256machex function in SAS.

 

data _null_;
   message = "The quick brown fox jumps over the lazy dog";
   md5 = hashing('md5',message); 
   sha1 = hashing('sha1',message);
   sha256 = hashing('sha256',message);
   sha256mac = sha256hmachex('mykey',message,0);
   put md5= / sha1= / sha256= / sha256mac=;
 run;

 

Register for SAS Innovate 2025!! The premier event for SAS users, May 6-9 in Orlando FL. Sign up now for the best deals!

sas-innovate-white.png

Special offer for SAS Communities members

Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.

 

View the full agenda.

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 3781 views
  • 6 likes
  • 3 in conversation