- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Not sure if this is the right board.
I was discussing hashing variables with a colleague recently. They were using SHA256 to anonymise data both in SAS and SQL Server. Interesting they got different results, which suggests a difference in the algorithm? In SAS it was called SHA_256, but in SQL Server it was SHA2_256 but my understanding is that these should be the same. The variables involved were the same length in each case.
It's not an area I know much about so any explanations would be gratefully received!
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
It turns out that there is a distinction I didn't grasp between the SHA256 function and the HASHING function. The first produces the same results as SQL whereas the second (with the "SHA256" option) produces a hex representation of it. So what we were seeing (and I hadn't noticed) is that, when the results from SQL started "FCFF" those from the SAS HASHING function started "46434646". But the SAS SHA256 function's results did start "FCFF". It took some helpful advice from Anita at the SAS UK TS and some staring at code and output for me to realise that. Feel a bit stupid now!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I would also expect them do give the same result, so I would first rule out that there is something is different in the to environments.
Some kind of quick check list/quistionaire:
- Is this true for all values, or only some?
- How is the SQL Server values stored (column type)?
- How do you do the comparison? Visual, or via programs?
- For char values, make sure values are trimmed properly, sometimes trailing blanks can be stored.
- What locales/collationg seqwuence do you use in SAS and SQL Server respectively?
If you can't explain the difference I suggest that you open a ticket to SAS tech support.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks @LinusH
- It appears to be all values which are affected
- I don't have direct access to SQL Server but I would think the variable is stored as varchar. Access for ODBC is translating it to char in SAS (50 characters).
- I'm comparing with PROC COMPARE but I think my colleague was doing it visually.
- I've tried trimming both in SAS and SQL.
- I've asked the SQL DBAs about the SQL locale, in SAS it's EN_GB. Encoding is Latin1 (could it be that?)
Here's the code I used:
proc sql;
create table work.sashash as
select classname,hashing('sha256',trim(classname)) as hash format=$hex16000. length=8000
from am_awfnl.testclassname;
connect to odbc (dsn=am_awfnl);
create table work.sqlhash as
select * from connection to odbc (
select classname,hashbytes('SHA2_256',rtrim(classname)) as hash
from final.testclassname
);
disconnect from odbc;
quit;
proc sort data=work.sashash;
by classname;
run;
proc sort data=work.sqlhash;
by classname;
run;
proc compare base=work.sqlhash comp=work.sashash out=work.compare outall;
id classname;
var hash;
run;
Library AM_AWFNL maps to the "final" schema in the database. I applied the format and length to the SAS hash to match what came from the pass-through SQL. The values of classname were "3D", "P3" and "P5" (this was just a test table to try things). The hashed values were actually different lengths.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you @LinusH and also to the legend that is @ChrisHemedinger for bring the hashing function to my attention. I'll go to TS and report back when I get an answer.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
It turns out that there is a distinction I didn't grasp between the SHA256 function and the HASHING function. The first produces the same results as SQL whereas the second (with the "SHA256" option) produces a hex representation of it. So what we were seeing (and I hadn't noticed) is that, when the results from SQL started "FCFF" those from the SAS HASHING function started "46434646". But the SAS SHA256 function's results did start "FCFF". It took some helpful advice from Anita at the SAS UK TS and some staring at code and output for me to realise that. Feel a bit stupid now!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Check the SAS documentation for the full inventory of hashing functions, including the newer HASHING function itself which provides access to many of them in one function.
One possibility is that the SQL Server function is used with a key? That is supported by the sha256machex function in SAS.
data _null_;
message = "The quick brown fox jumps over the lazy dog";
md5 = hashing('md5',message);
sha1 = hashing('sha1',message);
sha256 = hashing('sha256',message);
sha256mac = sha256hmachex('mykey',message,0);
put md5= / sha1= / sha256= / sha256mac=;
run;