I need to anonymize some data and decided to use the sha256 function to generate anonymous ID's. I made a lookup table for myself, with the original ID and the hashed ID, and an output table with only the hashed ID for external use. The point is that only I would be able to de-anonymize the data when necessary, and the externals can give me the hash so I can find the match in the lookup table. But now, when I search for one of the hashed ID's in the lookup table, it is not found, while I can see it is there. So I cannot make the match. I replicated this behaviour in a simple example:
data have;
input ID;
datalines;
62851
62852
62853
62854
;
run;
data lookup;
set have;
format ID_ano $hex64.;
ID_ano = sha256(ID);
run;
data out (drop=ID);
set have;
format ID_ano $hex64.;
ID_ano = sha256(ID);
run;
data find;
set lookup;
where strip(ID_ano) eq "5A3558265A673F3F5E2B3F3F3F523308633F463F733F3F3F3F3F183F3F3252";
run;
In the WHERE condition, the raw values is used. Instead of using STRIP (which is dangerous, as the value could contain a leading or trailing '20'x), use PUT:
where put(ID_ano,$hex64.) eq "5A3558265A673F3F5E2B3F3F3F523308633F463F733F3F3F3F3F183F3F3252";
In the WHERE condition, the raw values is used. Instead of using STRIP (which is dangerous, as the value could contain a leading or trailing '20'x), use PUT:
where put(ID_ano,$hex64.) eq "5A3558265A673F3F5E2B3F3F3F523308633F463F733F3F3F3F3F183F3F3252";
Thanks, did it work for you? My find is still empty, not sure why though:
data find;
set lookup;
where put(ID_ano,$hex64.) eq "5A3558265A673F3F5E2B3F3F3F523308633F463F733F3F3F3F3F183F3F3252";
run;
See this complete code:
data have;
input ID;
datalines;
62851
62852
62853
62854
;
run;
data lookup;
set have;
format ID_ano $hex64.;
ID_ano = sha256(ID);
put id_ano;
run;
data out (drop=ID);
set have;
format ID_ano $hex64.;
ID_ano = sha256(ID);
run;
data find;
set lookup;
where put(ID_ano,$hex64.) eq "5A3558265A67E5B4FF5E2B91A28D52330863D846D473A28C82F7D0188CC23252";
run;
Partial log:
96 data find; 97 set lookup; 98 where put(ID_ano,$hex64.) eq "5A3558265A67E5B4FF5E2B91A28D52330863D846D473A28C82F7D0188CC23252"; 99 run; NOTE: There were 1 observations read from the data set WORK.LOOKUP. WHERE PUT(ID_ano, $HEX64.)='5A3558265A67E5B4FF5E2B91A28D52330863D846D473A28C82F7D0188CC23252'; NOTE: The data set WORK.FIND has 1 observations and 2 variables.
I added a PUT statement here:
data lookup;
set have;
format ID_ano $hex64.;
ID_ano = sha256(ID);
put id_ano;
run;
and copied the string from the log.
It has been my experience that the display of binary strings across the IOM bridge (if you work with Enterprise Guide) is not always correct, as the binary value might be converted somewhere along the way, and the HEX format is applied after that on the client side. We have had issues with incorrectly displayed UUID's (which are 16 binary bytes).
The display in the log is always right, as the resulting text is not altered by the IOM bridge.
Hi @SarahDew,
@Kurt_Bremser's suspicion of a possible character conversion seems very plausible to me.
@SarahDew wrote:
Me: 5A3558265A673F3F5E2B3F3F3F523308633F463F733F3F3F3F3F183F3F3252
Note that the above hex string looks suspicious in three ways:
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.