Hi All,
I want to anonymize a few name fields, and the requirement is that the new anonymized names look real.
For example, DAVID can be changed to something like "CHARLES" (it looks real) but not "CFSDJN".
Is there a way to achieve this in SAS?
If not, can it be done with any other tools or technologies?
2. Similarly, some fields contain chars+digits. How can I anonymize them?
The new anonymized value should also have the same pattern (chars and digits keeping their positions),
so that end users will not know the values are anonymized. We also need to maintain integrity across all tables containing these fields, so that the values can still be used for joins, primary keys etc.
Note: The anonymized values should be the same on every run.
Funny requirement, is this a way to create test data without inventing data from scratch?
It's hard to invent "real" names etc. on the fly, so I would try to pick a random name from your existing base table.
Then create a translation table, with original source value (or id) and new random name. If the same person pops up again, pick from the translation table.
If the person occurs as new, make a new translation etc.
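The translation-table idea above can be sketched outside SAS as follows. This is a minimal illustration in Python, not a finished tool: the function name `build_translation`, the `pool` of replacement names and the fixed `seed` are all assumptions made for the example, and in practice the table would have to be stored between runs (for instance as a dataset) so that earlier translations are reused.

```python
import random

def build_translation(names, pool, table=None, seed=0):
    """Return a dict mapping each original name to a replacement from `pool`.

    Entries already present in `table` are reused, so a name seen on an
    earlier run keeps the replacement it was given then.
    """
    table = dict(table or {})
    rng = random.Random(seed)  # hypothetical fixed seed, for repeatable picks
    for name in names:
        if name not in table:
            # Avoid mapping a name to itself when the pool offers an alternative.
            choices = [p for p in pool if p != name] or list(pool)
            table[name] = rng.choice(choices)
    return table

# Hypothetical pool; the post suggests drawing it from the existing base table.
pool = ["CHARLES", "MARIA", "DAVID", "ANNA", "PETER"]

first_run = build_translation(["DAVID", "ANNA", "DAVID"], pool)
# A later run passes the saved table back in, so DAVID keeps the same new name.
second_run = build_translation(["DAVID", "JOHN"], pool, table=first_run)
```

Note that two different people may receive the same replacement name here, which is fine for a name column but not for a key column.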
About your other types of data, please exemplify, but I guess you could use the same technique described above for those as well.
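For chars+digits fields, the same translation-table technique might look like the sketch below (Python again, purely illustrative). The helper names `random_like` and `mask_keys`, the retry limit of 1000 and the fixed seed are assumptions, and the code assumes plain ASCII letters and digits. Each replacement keeps a letter where the original has a letter and a digit where it has a digit, and replacements are kept unique so the column can still serve as a key.

```python
import random
import string

def random_like(value, rng):
    """Return a random string with a letter/digit wherever `value` has one."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append(rng.choice(string.digits))
        elif ch.isalpha():
            letters = string.ascii_uppercase if ch.isupper() else string.ascii_lowercase
            out.append(rng.choice(letters))
        else:
            out.append(ch)  # keep separators such as '-' in place
    return "".join(out)

def mask_keys(values, table=None, seed=0):
    """Extend `table` (original -> masked) with unique, same-pattern values."""
    table = dict(table or {})
    used = set(table.values())
    rng = random.Random(seed)  # hypothetical fixed seed
    for value in values:
        if value in table:
            continue  # already translated on this or an earlier run
        for _ in range(1000):
            candidate = random_like(value, rng)
            if candidate not in used and candidate != value:
                break
        else:
            raise ValueError(f"no free replacement found for {value!r}")
        table[value] = candidate
        used.add(candidate)
    return table

masked = mask_keys(["AB123", "CD-456", "AB123"])
```

Applying the one saved table to every table that carries the field gives the same masked value everywhere, which is what keeps joins working.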
Also, I don't know what emphasis you put on the term "anonymisation". According to GDPR, personal data can be deemed anonymous only if it's impossible to revert to the original value/identity.
If you scramble values for users, it is classified as pseudonymisation.
If you want strict anonymisation given your requirements, it will be harder to achieve.
Without the requirement for lifelike values it is much easier. You can use a hashing function in a mapping expression, like
md5(your_name_column)
If you really want to hide the original value, concatenate a "salt" to your value before hashing. A salt is a constant value that is kept secret from users, so they can't recover the original value using brute-force techniques.
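As a rough illustration of salting before hashing, here is a small Python sketch using the standard `hashlib` module. The `SALT` value and the function name `salted_md5` are placeholders invented for this example. MD5 is used only to mirror the expression above.

```python
import hashlib

# Hypothetical secret; in practice it must be stored where users cannot read it.
SALT = "replace-with-a-secret-value"

def salted_md5(value, salt=SALT):
    """Hex MD5 digest of salt + value; same input and salt give the same digest."""
    return hashlib.md5((salt + value).encode("utf-8")).hexdigest()

digest = salted_md5("DAVID")
```

Because the digest depends only on the value and the salt, the same original value hashes identically in every table and on every run, so the hashed column can still be used for joins. `hashlib.sha256` could be swapped in the same way.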
If you want to create values that look meaningful but are in fact anonymized, and you want to do it right, you need a team of Computer Science PhDs, not one person doing it as a side project.
There are commercial products that do what you want:
https://www.softwaretestinghelp.com/data-masking-tools/
I'm curious to know why you need to keep your data "real" when it won't be. If you want to retain the ability to join on the anonymised variables, they will need to have the same level of uniqueness as the original. I don't know of any easy way of producing real-looking values while still retaining the same uniqueness. You would be better off using the SAS MD5 function to ensure your recoded values retain their uniqueness and so can still be used for joining purposes.
To LinusH's point about making data from scratch - we have a basic macro that can help: mp_makedata()
Currently it just adds random values, but in the future it will be updated to provide relevant data based on the primary key, and formats applied. If you'd like to see it extended just raise an issue.
We support data hashing as part of SAS code (MD5, SHA1, 256, 384, 512, CRC32)
and SAS Federation Server.
You can also refer to this document focusing on the GDPR use case
Lastly, within the SAS QKB CI 32 we have been introducing specific definitions for data masking: