Hi, I have two datasets, each with a list of IDs. For privacy reasons I need to encrypt the IDs before sharing the data with other users, who should then be able to use the encrypted ID to link the two datasets to each other. Any suggestions?
The variable ID has 10-digit character values. Thanks in advance.
data have;
  input id $ 1-10;
datalines;
0000352342
0029559722
0842397036
1129250328
2920541415
3466466703
4024548491
5229643020
6154709322
7498284887
8258157100
9052514674
9611158863
;
You can use the MD5 hashing function. Use the $HEX32. format to make the resulting 16-byte value readable.
Thanks a lot. Just a few questions though.
1. Would people who receive the data be able to reverse and find out the original value of ID?
2. Would I always get the same set of encrypted values when running this code? Is it possible to vary them, so that different recipients of the data would have the IDs encrypted differently?
data want;
  set have;
  /* MD5 returns 16 raw bytes; $HEX32. renders them as 32 hex characters */
  Encrypted_ID = put(md5(id), $hex32.);
run;
ID Encrypted_ID
0000352342 DD392A830FAC59CD8B6EE0F091D4DB4A
0029559722 0AA9AB1440D9057354E5451BDFB6C690
0842397036 EAC6A4A7F880A7169F4FBCBC86BFC175
...
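Because the same ID always yields the same digest, the hashed column can serve as the join key between the two shared datasets. Here is a minimal sketch of that linkage, assuming two hypothetical datasets `have_a` and `have_b` (the dataset names and the `visits` and `score` variables are made up for illustration), each holding the 10-character `id`:

```sas
/* Hypothetical inputs: two datasets that share some IDs */
data have_a;
  input id $ 1-10 visits;
datalines;
0000352342 3
0029559722 1
0842397036 7
;

data have_b;
  input id $ 1-10 score;
datalines;
0029559722 88
0842397036 92
1129250328 75
;

/* Replace the clear-text ID with its MD5 digest in each dataset */
data share_a;
  set have_a;
  length Encrypted_ID $32;
  Encrypted_ID = put(md5(id), $hex32.);
  drop id;
run;

data share_b;
  set have_b;
  length Encrypted_ID $32;
  Encrypted_ID = put(md5(id), $hex32.);
  drop id;
run;

/* Recipients link the two datasets on the digest alone */
proc sql;
  create table linked as
  select a.Encrypted_ID, a.visits, b.score
  from share_a as a
       inner join share_b as b
       on a.Encrypted_ID = b.Encrypted_ID;
quit;
```

Two of the sample IDs appear in both inputs, so `linked` should contain two rows. Note that the ID has to be stored identically in both datasets (same length, same padding) for the digests to match, since trailing blanks are part of the hashed value.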
MD5 is not encryption but hashing. The same source value will always return the same hash value. ...It's not impossible but very hard to reverse the hash value back to a clear-text value.
Just concatenate something user-specific to the source string to create different sets of hash values.
data sample;
  clear_text='12345';
  length digest_1 digest_2 $32;
  /* HASHING returns the digest as a hex string; a different prefix yields a different digest */
  digest_1=hashing('md5',catx('|','user1',clear_text));
  digest_2=hashing('md5',catx('|','user2',clear_text));
run;
proc print data=sample;
run;
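Applied to the `have` dataset from the question, the same idea might look like the sketch below. The macro variables `recipient` and `secret` are hypothetical names introduced here, and the sketch assumes the secret stays with the data owner: a prefix the recipient knows or can guess (such as `user1`) varies the output but does not stop them from rebuilding a lookup table.

```sas
/* Hypothetical values: one secret per recipient, never shared with them */
%let recipient = user1;
%let secret    = 7f3c-change-me;

data want_&recipient;
  set have;
  length Encrypted_ID $32;
  /* Same scheme as in the sample step: prefix and ID joined with '|', then hashed */
  Encrypted_ID = hashing('md5', catx('|', "&secret", id));
  drop id;
run;
```

Every dataset sent to the same recipient needs to be hashed with the same secret, otherwise the digests no longer match and the linkage is lost.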
MD5 will always result in the same hash for the same source.
Since it gives a finite number of results (2^128) for any kind of input, it is not reversible. Given some parameters for the source, though (like length 10 and "only digits"), one can easily create a reference table for all the 10^10 possible sources.
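To make the reference-table point concrete, here is a minimal sketch. It assumes the recipient knows the IDs are 10-digit strings and that plain, unsalted MD5 was used, as in the `want` dataset above. To keep the run short it only enumerates the first million candidates; the `lookup` and `cracked` names are made up for illustration, and covering all 10^10 values would need the full loop and far more storage.

```sas
/* Build digest / clear-text pairs for candidate IDs 0000000000 to 0000999999 */
data lookup;
  length id $10 Encrypted_ID $32;
  do n = 0 to 999999;
    id = put(n, z10.);                      /* zero-padded 10-digit string */
    Encrypted_ID = put(md5(id), $hex32.);
    output;
  end;
  drop n;
run;

/* Join the shared digests against the table to recover the original IDs */
proc sql;
  create table cracked as
  select w.Encrypted_ID, l.id as recovered_id
  from want as w
       inner join lookup as l
       on w.Encrypted_ID = l.Encrypted_ID;
quit;
```

With the sample data only `0000352342` falls inside the enumerated range, so that is the only ID this shortened table can recover; extending the loop to the full range would recover all of them.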