Biswadas
Calcite | Level 5

Hi All,

I want to anonymize a few name fields, and the requirement is that the new anonymized names look real.

For example, DAVID can be changed to something like "CHARLES" (which looks real) but not "CFSDJN".

Is there a way to achieve this in SAS?

If not, can it be done with any other tool or technology?

2. Similarly, some fields contain chars+digits. How can I anonymize them?

The new anonymized value should also have the same pattern (chars and digits keeping their positions), so that end users will not know the values are anonymized. We also need to maintain integrity across all tables containing these fields, so that the values can still be used for joins, primary keys, etc.

Note: The anonymized values should be the same for every new run.

11 REPLIES 11
LinusH
Tourmaline | Level 20

Funny requirement, is this a way to create test data without inventing data from scratch?

It's hard to invent "real" names etc. on the fly, so I would try to pick a random name from your existing base table.

Then create a translation table with the original source value (or id) and the new random name. If the same person pops up again, pick the name from the translation table.

If the person occurs as new, make a new translation, and so on.

About your other types of data, please give examples, but I guess you could use the same technique described above for those as well.
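
A minimal sketch of the translation-table idea above, written in Python rather than SAS purely for illustration. The name pool, the `SECRET_KEY` value, and the functions `keyed_rank` and `extend_translation_table` are all hypothetical. Instead of a random generator, the sketch orders names by a keyed hash so that the result is reproducible, and it assumes the pool holds at least as many names as there are distinct originals.

```python
import hashlib
import hmac

SECRET_KEY = b"change-me"  # hypothetical secret; keep it away from test users


def keyed_rank(value, key=SECRET_KEY):
    """Stable, key-dependent sort key, used instead of a random generator."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def extend_translation_table(table, originals, pool, key=SECRET_KEY):
    """Add a replacement for every original not yet in `table`.

    Existing entries are never changed, so a persisted table yields the
    same replacement on every run.
    """
    used = set(table.values())
    # Pool names get a different rank prefix so that a pool equal to the
    # originals does not simply map every name to itself.
    free = [n for n in sorted(set(pool), key=lambda v: keyed_rank("pool:" + v, key))
            if n not in used]
    new_values = sorted(set(originals) - set(table),
                        key=lambda v: keyed_rank(v, key))
    if len(free) < len(new_values):
        raise ValueError("name pool is too small for a one-to-one mapping")
    for original, replacement in zip(new_values, free):
        table[original] = replacement
    return table


pool = ["CHARLES", "ANNA", "PETER", "LAURA", "SIMON"]
table = extend_translation_table({}, ["DAVID", "MARY", "DAVID", "JOHN"], pool)
first_run = dict(table)
# A later load with one new person only adds a row; old rows stay as they were.
table = extend_translation_table(table, ["MARY", "ELENA"], pool)
masked = [table[name] for name in ["DAVID", "MARY", "DAVID", "JOHN", "ELENA"]]
```

In practice the table would be stored (for example as a SAS data set) and reloaded before each run. If the pool is drawn from the existing base table, a name can occasionally be assigned to itself, which may or may not be acceptable.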

Data never sleeps
Biswadas
Calcite | Level 5
Thanks for your reply.
I was expecting real-looking names so that the data would be a bit meaningful for test users.

If I set this requirement aside (suppose I only want anonymization and am ready to accept any translated name), what is the best way to achieve this in SAS DI?

Similarly, how can I get a similar conversion for my second requirement? For example, the social security number GF123456XY should be changed to something like YZ987643AB (keeping the same format: 2 letters, 6 digits, 2 letters).

Could you please suggest?
LinusH
Tourmaline | Level 20
As already suggested, create a mapping table that maps each existing value to a random value picked from your existing data. Any other approach that keeps the data looking like real data is probably much more cumbersome.
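
For the pattern-preserving case (2 letters, 6 digits, 2 letters), a mapping table can also be filled with generated values instead of values picked from existing data. The following is a hedged Python sketch, not a SAS DI transformation: `SECRET_KEY`, `mask_keep_pattern`, and `build_mapping` are hypothetical names, and the collision retry is an assumption added here so that distinct inputs keep distinct outputs and remain usable as join keys.

```python
import hashlib
import hmac
import string

SECRET_KEY = b"change-me"  # hypothetical secret


def mask_keep_pattern(value, key=SECRET_KEY, attempt=0):
    """Replace each ASCII letter with a letter and each digit with a digit.

    The replacement characters come from an HMAC of the value, so the same
    input and key always give the same output. Other characters are kept.
    """
    stream = hmac.new(key, f"{attempt}:{value}".encode("utf-8"),
                      hashlib.sha256).digest()
    out = []
    for i, ch in enumerate(value):
        b = stream[i % len(stream)]  # bytes repeat for values over 32 chars
        if ch in string.digits:
            out.append(string.digits[b % 10])
        elif ch in string.ascii_uppercase:
            out.append(string.ascii_uppercase[b % 26])
        elif ch in string.ascii_lowercase:
            out.append(string.ascii_lowercase[b % 26])
        else:
            out.append(ch)
    return "".join(out)


def build_mapping(values, key=SECRET_KEY):
    """Build original -> masked, retrying when two originals would collide."""
    mapping, used = {}, set()
    for value in sorted(set(values)):
        attempt = 0
        candidate = mask_keep_pattern(value, key, attempt)
        while candidate in used:
            attempt += 1
            candidate = mask_keep_pattern(value, key, attempt)
        mapping[value] = candidate
        used.add(candidate)
    return mapping


ids = ["GF123456XY", "AB000001CD", "GF123456XY"]
mapping = build_mapping(ids)
masked_ids = [mapping[v] for v in ids]
```

Because a retry depends on which other values are present, the result is only guaranteed to be identical across runs for the same set of inputs; persisting the mapping table, as suggested above, avoids that dependency.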
Data never sleeps
LinusH
Tourmaline | Level 20

Also, I don't know how strictly you mean the term "anonymisation". According to GDPR, personal data can be deemed anonymous only if it's impossible to revert to the original value/identity.

If you scramble values but can still map them back to the individuals, it is classified as pseudonymisation.

If you want strict anonymisation given your requirements, it will be harder to achieve.

Data never sleeps
Biswadas
Calcite | Level 5
Thanks for your reply.
Actually, I didn't want to trace back to the original values, but only
wanted to ensure that the anonymised data looks a bit meaningful.

Could you please suggest how to achieve normal anonymization in SAS DI
(without expecting the real-like values)?
LinusH
Tourmaline | Level 20

Without real-life-like values it is much easier. You can use a hashing function in a mapping expression, like

md5(your_name_column)

If you really want to hide the original value, concatenate a "salt" to your value before hashing. A salt here is a constant value that is kept secret from users, so they can't recover the original value by hashing guesses with brute-force techniques.
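
As an illustration of the salted-hash idea, here is a small Python sketch; it is analogous to, not a replacement for, the `md5(your_name_column)` expression above. The `SALT` value and the function name `salted_md5` are hypothetical, and the hex output format is a choice made here for readability.

```python
import hashlib

SALT = "s3cret-salt"  # hypothetical constant, kept secret from end users


def salted_md5(value, salt=SALT):
    """Return the MD5 of salt + value as a 32-character uppercase hex string."""
    return hashlib.md5((salt + value).encode("utf-8")).hexdigest().upper()


# The same input always gives the same token, so the column stays joinable.
customers = [salted_md5(name) for name in ["DAVID", "MARY"]]
orders = [salted_md5(name) for name in ["MARY", "DAVID", "DAVID"]]
```

The same salt must be used for every table that takes part in a join; changing the salt changes every token.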

Data never sleeps
JackHamilton
Lapis Lazuli | Level 10

If you want to create values that look meaningful but are in fact anonymized, and you want to do it right, you need a team of Computer Science PhDs, not one person doing it as a side project.

 

There are commercial products that do what you want:

https://www.softwaretestinghelp.com/data-masking-tools/

 

 

 

 

SASKiwi
PROC Star

I'm curious to know why you need your data to look "real" when it won't be. If you want to retain the ability to join on the anonymised variables, then they will need to have the same level of uniqueness as the originals. I don't know of any easy way of generating realistic values while still retaining that uniqueness. You would be better off using the SAS MD5 function to ensure your recoded values retain their uniqueness and so can still be used for joining purposes.

AllanBowe
Barite | Level 11

To LinusH's point about making data from scratch - we have a basic macro that can help:  mp_makedata()

Currently it just adds random values, but in the future it will be updated to provide relevant data based on the primary key, and formats applied.  If you'd like to see it extended just raise an issue.

/Allan
SAS Challenges - SASensei
MacroCore library for app developers
SAS networking events (BeLux, Germany, UK&I)

Data Workflows, Data Contracts, Data Lineage, Drag & drop excel EUCs to SAS 9 & Viya - Data Controller
DevOps and AppDev on SAS 9 / Viya / Base SAS - SASjs
VincentRejany
SAS Employee

We support data hashing as part of SAS code (MD5, SHA1, SHA256, SHA384, SHA512, CRC32):

https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/lefunctionsref/n0b8k12x6fdw4ln1snlsqrxd53gy.h...

and SAS Federation Server

https://go.documentation.sas.com/doc/en/fedsrvmgrcdc/4.4/fedsrvmgrug/n1s7wae3mndbnmn1145y1lezasd8a.h...

 

You can also refer to this document focusing on the GDPR use case

https://communities.sas.com/t5/SAS-Communities-Library/SAS-Federation-Server-for-GDPR-Data-Masking/t...

 

Lastly, within the SAS QKB CI 32 we have introduced specific definitions for data masking:

[Screenshot: data masking definitions in SAS QKB CI 32]

 

 

Discussion stats
  • 11 replies
  • 2772 views
  • 1 like
  • 6 in conversation