Good afternoon.
I wonder if someone might be able to answer a question on a hash values? I am very new to this concept, sorry.
We are having a discussion on the security of hash values and I came up with a question.
The question is if I have a hashed value (and I know the original value) and a table of many other values (that does contain the original unhashed value someplace in the many table) can I then hash all values in the "many" table and then compare the original hash value with the hashed values from the "many" table to find the matched value?
For some reason I cannot get SAS to match the values an pull out the orignal variable value.
Thank you so much for any help.
Jeff
Example Code:
data test_code;
format hash hash_copy hash_diff $hex64.;
hash = sha256(ss_number);
hash_copy = sha256("fox");
hash_diff = sha256("fox");
if(hash=hash_copy) then
test_copy = 'T';
else test_copy = 'F';
if(hash=hash_diff) then
test_diff = 'T';
else test_diff = 'F';
run;
Original value - fox
Hash Value - 05A8475FEEBBFEA685ECE06045BE63722EDC61BB2ADE8060A9F03AED6D49F454
Hash Copy - 776CB326AB0CD5F0A974C1B9606044D8485201F2DB19CF8E3749BDEE5F36E2
Hash Diff - 776CB326AB0CD5F0A974C1B9606044D8485201F2DB19CF8E3749BDEE5F36E2
Many table
hen -
Hash Value - 05A8475FEEBBFEA685ECE06045BE63722EDC61BB2ADE8060A9F03AED6D49F454
Hash Copy - D44A08DE3981889DAE0F7BA3980EA9FD24768279FC4EC2FBBDBCF3BF9FE65C22
Hash Diff - D44A08DE3981889DAE0F7BA3980EA9FD24768279FC4EC2FBBDBCF3BF9FE65C22
pig -
Hash Value - 05A8475FEEBBFEA685ECE06045BE63722EDC61BB2ADE8060A9F03AED6D49F454
Hash Copy - F0B8C9D84DD2B877E0B952130B73E218106FEC04C23852271D390213A1FF96F4
Hash Diff - F0B8C9D84DD2B877E0B952130B73E218106FEC04C23852271D390213A1FF96F4
fox -
Hash Value - 05A8475FEEBBFEA685ECE06045BE63722EDC61BB2ADE8060A9F03AED6D49F454
Hash Copy - 776CB326AB0CD5F0A974C1B9606044D8485201F2DB19CF8E3749BDEE5F36E2
Hash Diff - 776CB326AB0CD5F0A974C1B9606044D8485201F2DB19CF8E3749BDEE5F36E2
horse -
Hash Value - 05A8475FEEBBFEA685ECE06045BE63722EDC61BB2ADE8060A9F03AED6D49F454
Hash Copy - FD62862B6DC213BEE77C2BADD6311528253C6CB3107E03C16051AA15584ECA1C
Hash Diff - FD62862B6DC213BEE77C2BADD6311528253C6CB3107E03C16051AA15584ECA1C
That makes a lot of sense then, that general approach actually how passwords are validated in most cases. You should be able to match everything up, but like I mentioned previously you need to ensure everything is hashed identically. Also if security is the driving concern, you're best off doing as @LinusH recommended and adding a salt to the plain text prior to hashing, otherwise you're much more vulnerable to brute-force attacks trying to determine the plaintext value from the hash value. GPU's today can hash at a rate of billions of hashes a second on older algorthims like MD5 and SHA1, and there's a form of attack that uses sometihng called rainbow tables that have precomputed plaintext values for various hashed values, but using a salt makes those approaches much harder. There are also alternative hashing algorithms out there specifically designed to minimize the effectiveness of GPU or other brute-force attacks, but I don't think they're implemented in Base SAS. You can also just iterate the hash multiple times to somewhat increase security, or at least increase the cost to brute-force it, but you need to make sure you doing it exactly the same on both sides or they won't match up.
Apologies.
Line 3 of the example code should read: hash = sha256("fox");
Jeff
If you click on the "cog" or "gear" icon in you post there should be an option to "Edit Reply". You can click there to make corrections or additions needed. Then people won't have to find your post with the correction. Which can be fun with this forum as we have miminal control over the order posts appear to readers and that correction could end up below many posts. Some of which may ask about that value.
Thank you for the advice.
I'm not sure I'm 100% following your question, let me see if I got this. You have a table (call it TABLE_A) that contains (at least) 2 columns, one of the columns being some text or numeric values (we'll call it column PLAIN_A) and another column which contains the hashed value of contents of column A (call it column HASH_A). Is your question can you lookup values in that table using either PLAIN_A or HASH_A as a key? If so, then yes if you know one of the values in the columns.
You need to be careful with hash functions though since they're very sensitive to how data is fed to them, make sure everything is (or isn't) quoted identically. Any change in the input to the hash function will (by design) completely change the hash value.
You stated it better than I did. Yes, I have two tables. Table A has a value PLAIN_A, and hash value HASH_A. Table B has many vaues PLAIN_B. Can I use the same hash function to hash all vaues in Table B by creating HASH_B and then compare HASH_A and HASH_B to find the PLAIN_B value in Table B?
Yes that should work, but if you have the plain values for both columns in both tables, why not just match up PLAIN_A to PLAIN_B directly? You wouldn't need to worry about hashing anything, you could just use a standard PROC SQL JOIN. You could do a join on the hashed values too once you created HASH_B, but again that seems like it's just an extra step that isn't really providing any value.
What problem are you actually trying to solve? Hashing is a great tool to have handy, but I'm not sure it's doing you any favors in this case, unless you're just trying to learn how to use it for its own sake.
That makes a lot of sense then, that general approach actually how passwords are validated in most cases. You should be able to match everything up, but like I mentioned previously you need to ensure everything is hashed identically. Also if security is the driving concern, you're best off doing as @LinusH recommended and adding a salt to the plain text prior to hashing, otherwise you're much more vulnerable to brute-force attacks trying to determine the plaintext value from the hash value. GPU's today can hash at a rate of billions of hashes a second on older algorthims like MD5 and SHA1, and there's a form of attack that uses sometihng called rainbow tables that have precomputed plaintext values for various hashed values, but using a salt makes those approaches much harder. There are also alternative hashing algorithms out there specifically designed to minimize the effectiveness of GPU or other brute-force attacks, but I don't think they're implemented in Base SAS. You can also just iterate the hash multiple times to somewhat increase security, or at least increase the cost to brute-force it, but you need to make sure you doing it exactly the same on both sides or they won't match up.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.